I have a CSV containing the following data:
date,datetime,year,month,date,value,name
20170430,2017-04-30 18:30:00,2017,04,30,NaN,A1
20170501,2017-05-01 18:30:00,2017,05,01,121.2,A1
20170430,2018-02-07 18:30:00,2018,02,07,1.23,B1
20170501,2017-07-10 18:30:00,2017,07,10,42.2,C1
20170430,2017-04-30 18:30:00,2017,04,30,32.1,C1
I need the result shown below, where the values for A1, B1 and C1 are split out into separate columns:
date,datetime,year,month,date,A1,B1,C1
20170430,2017-04-30 18:30:00,2017,04,30,NaN,1.23,32.1
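20170501,2017-05-01 18:30:00,2017,05,01,121.2,NaN,42.2
I tried the pandas pivot method with the date as the index and the name as the columns, but got the error below. That is expected, because A1 and C1 have multiple entries: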
ValueError: Index contains duplicate entries, cannot reshape
import pandas as pd
df = pd.read_csv("D:/datagenicAPI/finalCSV.csv")
print(df)
df1 = df.pivot(index="date", columns="name")
df1.to_csv("d:/datagenicAPI/test1.csv", sep=",")
I need the values split into separate columns. How can I do this with pandas?
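Before reshaping, it can help to see which rows actually collide. The following is a minimal sketch, not the asker's real file: the three-row CSV is hypothetical data made up so that one (date, name) pair appears twice, and the names `csv_text`, `dupes` and `error_message` are illustrative only.

```python
import io
import pandas as pd

# Hypothetical data: the (date, name) pair (20170430, "A1") appears twice.
csv_text = """date,value,name
20170430,1.0,A1
20170430,2.0,A1
20170501,3.0,B1
"""
df = pd.read_csv(io.StringIO(csv_text))

# keep=False flags every row that shares its (date, name) pair with another row.
dupes = df[df.duplicated(["date", "name"], keep=False)]
print(dupes)

# pivot() has no way to choose between the two colliding values, so it raises.
error_message = None
try:
    df.pivot(index="date", columns="name", values="value")
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

If `dupes` is not empty, `pivot` cannot be used as-is; the colliding rows have to be aggregated or removed first.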
Answered 2018-05-30 23:10:48
Load the sample df:
import io
import pandas as pd
s = """
date,datetime,year,month,date,value,name
20170430,2017-04-30 18:30:00,2017,04,30,NaN,A1
20170501,2017-05-01 18:30:00,2017,05,01,121.2,A1
20170430,2018-02-07 18:30:00,2018,02,07,1.23,B1
20170501,2017-07-10 18:30:00,2017,07,10,42.2,C1
20170430,2017-04-30 18:30:00,2017,04,30,32.1,C1
"""
df = pd.read_csv(io.StringIO(s))
Using pivot_table and reset_index, you get:
df.pivot_table('value', ['date', 'datetime', 'year', 'month', 'date.1'], 'name').reset_index()
name date datetime year month date.1 A1 B1 C1
0 20170430 2017-04-30 18:30:00 2017 4 30 NaN NaN 32.1
1 20170430 2018-02-07 18:30:00 2018 2 7 NaN 1.23 NaN
2 20170501 2017-05-01 18:30:00 2017 5 1 121.2 NaN NaN
3 20170501 2017-07-10 18:30:00 2017 7 10 NaN NaN 42.2
Note that df contains a column that pandas named 'date.1', because your sample has two columns named 'date'.
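The output above has four rows because datetime, year, month and date.1 differ between rows that share the same date. If the goal is the two-row layout from the question, one possible variation is to index on the first date column only. This is a sketch under that assumption: it drops the other descriptive columns, and `aggfunc="first"` is my choice for how to resolve any repeated (date, name) pairs, not something the question specifies.

```python
import io
import pandas as pd

csv_text = """date,datetime,year,month,date,value,name
20170430,2017-04-30 18:30:00,2017,04,30,NaN,A1
20170501,2017-05-01 18:30:00,2017,05,01,121.2,A1
20170430,2018-02-07 18:30:00,2018,02,07,1.23,B1
20170501,2017-07-10 18:30:00,2017,07,10,42.2,C1
20170430,2017-04-30 18:30:00,2017,04,30,32.1,C1
"""
# pandas renames the second "date" column to "date.1" on load.
df = pd.read_csv(io.StringIO(csv_text))

# Index on the first date column only, so each date yields exactly one row.
# aggfunc="first" keeps the first non-null value per (date, name) pair.
result = df.pivot_table(values="value", index="date", columns="name", aggfunc="first")
print(result.reset_index())
```

On the sample data this gives one row per date with columns A1, B1 and C1, and NaN wherever a name has no value for that date.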
Answered 2018-05-30 22:26:31
I think this takes two steps: drop_duplicates plus unstack, then concat the results.
s=df.drop_duplicates('date').iloc[:,:4]
pd.concat([s.set_index('date'),df.set_index(['date','name']).value.unstack()],axis=1)
Out[339]:
datetime year month A1 B1 C1
date
20170430 2017-04-30 18:30:00 2017 4 NaN 1.23 32.1
20170501 2017-05-01 18:30:00 2017 5 121.2 NaN 42.2
https://stackoverflow.com/questions/50607058
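The drop_duplicates + unstack idea can also be written out step by step. The sketch below is a variation rather than the answer's exact code: it inserts a `groupby(...).first()` before `unstack`, on the assumption that keeping the first non-null value is acceptable when a (date, name) pair repeats, since a plain `unstack` raises on such duplicates. The names `meta`, `wide` and `result` are illustrative.

```python
import io
import pandas as pd

csv_text = """date,datetime,year,month,date,value,name
20170430,2017-04-30 18:30:00,2017,04,30,NaN,A1
20170501,2017-05-01 18:30:00,2017,05,01,121.2,A1
20170430,2018-02-07 18:30:00,2018,02,07,1.23,B1
20170501,2017-07-10 18:30:00,2017,07,10,42.2,C1
20170430,2017-04-30 18:30:00,2017,04,30,32.1,C1
"""
df = pd.read_csv(io.StringIO(csv_text))

# Step 1: one row of descriptive columns per date (the first occurrence wins).
meta = df.drop_duplicates("date").set_index("date")[["datetime", "year", "month"]]

# Step 2: one column per name. first() collapses repeated (date, name) pairs
# (assumption: the first non-null value is the one to keep).
wide = df.groupby(["date", "name"])["value"].first().unstack("name")

# Step 3: align both pieces on the shared date index.
result = pd.concat([meta, wide], axis=1)
print(result)
```

Note that the datetime, year and month shown for each date come from whichever row appears first for that date, so they may not match every value in the row.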