例如,我的df包含以下数据:
部件,Res_val,res_date,id,mass,start,sec_id (列)及以下是值。
Part,res_value,res_date,id,sample_number,start,sec_id
ABC1,4,01/10/2022 15:15:15,GKK123,1,2022-10-03 19:35:14,AHJ234
ABC2,2,01/07/2022 11:27:43,GKK123,1,2022-10-03 19:35:14,AHJ234
ABC3,3,01/06/2022 08:12:39,GKK123,1,2022-10-03 19:35:14,AHJ234
ABC1,4,01/06/2022 08:12:39,GKK123,1,2022-10-03 19:35:14,AHJ234
ABC2,5,01/10/2022 15:15:14,GKK123,1,2022-10-03 19:35:14,AHJ234
ABC3,2,01/11/2022 17:28:56,GKK123,1,2022-10-03 19:35:14,AHJ234
ABC1,2,01/11/2022 17:28:56,GKK123,1,2022-10-03 19:35:14,AHJ234
ABC1,4,01/10/2022 15:15:15,GKK122,1,2022-10-03 19:35:14,AHJ233
ABC2,10,01/07/2022 11:27:43,GKK122,1,2022-10-03 19:35:14,AHJ233
ABC3,3,01/06/2022 08:12:39,GKK122,1,2022-10-03 19:35:14,AHJ233
ABC1,4,01/06/2022 08:12:39,GKK122,1,2022-10-03 19:35:14,AHJ233
ABC2,5,01/10/2022 15:15:14,GKK122,1,2022-10-03 19:35:14,AHJ233
ABC3,2,01/11/2022 17:28:56,GKK122,1,2022-10-03 19:35:14,AHJ233
ABC1,2,01/11/2022 17:28:56,GKK122,1,2022-10-03 19:35:14,AHJ233结果:
ABC1_result,ABC2_result,res_date,id,sample_number,start,sec_id
4,2,01/10/2022 15:15:15,GKK123,1,2022-10-03 19:35:14,AHJ234
4,10,01/10/2022 15:15:15,GKK122,1,2022-10-03 19:35:14,AHJ233我想根据时间选择最近的行,我想检查sec_id是否包含ABC1或ABC2,然后我将分别对该行使用res_value,否则该列将为null。
因此输出应该类似于: ABC1_result、ABC2_result、res_date、id、sample_number、start、sec_id
4,2,01/10/2022 15:15:15,GKK123,1,2022-10-03 19:35:14,AHJ234
4,10,01/10/2022 15:15:15,GKK122,1,2022-10-03 19:35:14,AHJ233发布于 2022-10-03 12:06:18
如果需要测试Part,如果匹配列表,然后需要旋转使用:
L = ['ABC1','ABC3']
df1 = df[df['Part'].isin(L)].set_index(['Part'], append=True)['Res_val'].unstack().add_prefix('Res_val_')
df = pd.concat([df1, df.drop(['Part','Res_val'], axis=1)], axis=1)
print (df)
Res_val_ABC1 Res_val_ABC3 res_date id mass \
0 1.0 NaN 09/11/2002 16:14:12 A458 5
1 NaN NaN 05/04/2002 10:23:43 A352 6
2 NaN 6.0 05/06/2002 09:18:39 B425 7
start sec_id
0 2002-16-07 23:35:14 530HQ
1 2002-14-05 12:56:14 2345T
2 2002-14-03 12:12:56 25F2 如果需要按日期时间计算每个列表的最高值,请使用:
df['res_date'] = pd.to_datetime(df['res_date'], dayfirst=True)
L = ['ABC1','ABC2']
df1 = df[df['Part'].isin(L)].sort_values(['sec_id','res_date']).drop_duplicates(['Part','sec_id'])
print (df1)
Part res_value res_date id sample_number \
10 ABC1 4 2022-06-01 08:12:39 GKK122 1
8 ABC2 10 2022-07-01 11:27:43 GKK122 1
3 ABC1 4 2022-06-01 08:12:39 GKK123 1
1 ABC2 2 2022-07-01 11:27:43 GKK123 1
start sec_id
10 2022-10-03 19:35:14 AHJ233
8 2022-10-03 19:35:14 AHJ233
3 2022-10-03 19:35:14 AHJ234
1 2022-10-03 19:35:14 AHJ234 因为每个组的sec_id值是相同的,所以可以旋转如下:
df2 = (df1.pivot(index=['id','start','sample_number','sec_id'],
columns='Part',
values='res_value')
.add_suffix('_res_value')
.reset_index())
print (df2)
Part id start sample_number sec_id ABC1_res_value \
0 GKK122 2022-10-03 19:35:14 1 AHJ233 4
1 GKK123 2022-10-03 19:35:14 1 AHJ234 4
Part ABC2_res_value
0 10
1 2 https://stackoverflow.com/questions/73934993
复制相似问题