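I have a long-format DataFrame holding values for many animals under various conditions over time. I want to bin the time axis so that values are averaged over several time points, while keeping animals and conditions separate.
I tried a long chain of unstack, groupby and stack operations, but I suspect there is a more concise way to do this?
Basically, I want to go from the table on the left to the table on the right: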

import pandas as pd
import numpy as np
time=np.array([1,2,1,2,3,4,3,4,5,6,5,6,7,8,7,8])
animal=np.array([1,1,2,2,1,1,2,2,1,1,2,2,1,1,2,2])
condition=np.array(['a','b','a','b','a','b','a','b','a','b','a','b','a','b','a','b'])
val=np.random.random(16)
df=pd.DataFrame({'time':time,'animal':animal,'condition':condition,'val':val})
Answered 2017-02-03 19:44:56
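The same input can be built more compactly, and reproducibly, from its repeating pattern: each pair of consecutive time points is measured for animal 1 and then animal 2, with condition `a` at odd times and `b` at even times. The sketch below is only a convenience for experimenting; it assumes NumPy 1.17 or later for `np.random.default_rng`, and the seed `42` is arbitrary, so the `val` column will not match the numbers printed in the answer.

```python
import numpy as np
import pandas as pd

# Seeded generator so every run produces the same 'val' column (seed is arbitrary)
rng = np.random.default_rng(42)

# [[1, 2], [3, 4], [5, 6], [7, 8]] -> each row repeated twice -> flattened
time = np.repeat(np.arange(1, 9).reshape(4, 2), 2, axis=0).ravel()
# [1, 1, 2, 2] repeated for the four time-point pairs
animal = np.tile(np.repeat([1, 2], 2), 4)
# 'a' at odd times, 'b' at even times
condition = np.tile(['a', 'b'], 8)
val = rng.random(16)

df = pd.DataFrame({'time': time, 'animal': animal, 'condition': condition, 'val': val})
print(df.head())
```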
I think you need pd.cut combined with groupby:
bins = [0, 4, 9]
labels=['1-4','5-8']
df['bin'] = pd.cut(df['time'], bins=bins, labels=labels)
print (df)
animal condition time val bin
0 1 a 1 0.394700 1-4
1 1 b 2 0.492167 1-4
2 2 a 1 0.402880 1-4
3 2 b 2 0.354298 1-4
4 1 a 3 0.500614 1-4
5 1 b 4 0.445177 1-4
6 2 a 3 0.090433 1-4
7 2 b 4 0.273563 1-4
8 1 a 5 0.943477 5-8
9 1 b 6 0.026545 5-8
10 2 a 5 0.039999 5-8
11 2 b 6 0.283140 5-8
12 1 a 7 0.582344 5-8
13 1 b 8 0.990893 5-8
14 2 a 7 0.992642 5-8
15 2 b 8 0.993117 5-8
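The bin edges `[0, 4, 9]` work because `pd.cut` builds right-closed intervals by default, so they are `(0, 4]` and `(4, 9]`, and time 4 falls into the first bin. A small sketch of that behaviour on the time values alone, including what happens with `right=False` (the labels `'0-3'` and `'4-8'` are illustrative only):

```python
import pandas as pd

time = pd.Series([1, 2, 3, 4, 5, 6, 7, 8])

# Default right=True: intervals (0, 4] and (4, 9]
binned = pd.cut(time, bins=[0, 4, 9], labels=['1-4', '5-8'])
print(binned.tolist())
# ['1-4', '1-4', '1-4', '1-4', '5-8', '5-8', '5-8', '5-8']

# Without labels, the categories are the Interval objects themselves
intervals = pd.cut(time, bins=[0, 4, 9])
print(intervals.cat.categories)

# right=False: intervals [0, 4) and [4, 9), which moves time == 4 into the second bin
left_closed = pd.cut(time, bins=[0, 4, 9], right=False, labels=['0-3', '4-8'])
print(left_closed.tolist())
# ['0-3', '0-3', '0-3', '4-8', '4-8', '4-8', '4-8', '4-8']
```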
print (df.groupby(['bin','animal','condition'], as_index=False).val.mean())
bin animal condition val
0 1-4 1 a 0.447657
1 1-4 1 b 0.468672
2 1-4 2 a 0.246657
3 1-4 2 b 0.313931
4 5-8 1 a 0.762911
5 5-8 1 b 0.508719
6 5-8 2 a 0.516320
7 5-8 2 b 0.638129
A solution without creating a new column:
print (df.groupby([pd.cut(df['time'],
                          bins=[0, 4, 9],
                          labels=['1-4','5-8']), 'animal','condition'])
         .val.mean().reset_index())
https://stackoverflow.com/questions/42023442
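Two details are worth knowing when running this on a recent pandas. Because the `pd.cut` result is an unnamed-by-us Series derived from `time`, the bin column in the output is called `time`; renaming the cut Series gives it a clearer name. And grouping by a categorical key triggers a FutureWarning in pandas 2.1+ unless `observed` is passed explicitly; `observed=True` (available since pandas 0.23) keeps only the bin/animal/condition combinations that actually occur. The sketch below applies both, on seeded data, and compares against an alternative that is not from the answer: fixed-width bins computed with integer division, which only works because the bins here are equally wide (4 time points).

```python
import numpy as np
import pandas as pd

# Seeded reconstruction of the question's data (seed is arbitrary)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'time': np.repeat(np.arange(1, 9).reshape(4, 2), 2, axis=0).ravel(),
    'animal': np.tile(np.repeat([1, 2], 2), 4),
    'condition': np.tile(['a', 'b'], 8),
    'val': rng.random(16),
})

# The answer's approach, with the cut Series renamed and observed=True set explicitly
bins = pd.cut(df['time'], bins=[0, 4, 9], labels=['1-4', '5-8']).rename('bin')
by_cut = (df.groupby([bins, 'animal', 'condition'], observed=True)['val']
            .mean()
            .reset_index())

# Alternative for equal-width bins: time 1-4 -> 0, time 5-8 -> 1
by_floor = (df.assign(bin=(df['time'] - 1) // 4)
              .groupby(['bin', 'animal', 'condition'], as_index=False)['val']
              .mean())

print(by_cut)
print(by_floor)
```

`pd.cut` remains the more general choice, since it accepts arbitrary, unequal bin edges and attaches readable labels.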