I have been trying to implement the following plyr chain in Python:
# Data
data_L1
X Y r2 contact_id acknowledge_issues
a c 100 xyzx 0
b d 100 fsdjkfl 0
a c 80 ejrkl 20
b d 60 fdsdl 40
b d 80 gsdkf 20
# Transformation
test <- ddply(data_L1,
.(X,Y),
summarize,
avg_r2 = mean(r2),
tickets = length(unique(contact_id)),
er_ai =length(acknowledge_issues[which(acknowledge_issues>0)])/length(acknowledge_issues)
)
# Output
test
X Y avg_r2 tickets er_ai
a c 90 2 0.5
b d 80 3 0.6667
However, I have only gotten this far in Python:
test = data_L1.groupby(['X','Y']).agg({'r2': 'mean', 'contact_id' : 'count'})
I don't know how to create the variable er_ai in Python. Do you have any suggestions for a solution using pandas or another library?
Posted on 2017-10-22 12:20:30
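Use nunique to count the unique contact_id values, and for er_ai take the mean of the boolean condition (x > 0), which gives the fraction of rows in each group where acknowledge_issues is positive: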
cols = {'r2':'avg_r2', 'contact_id':'tickets', 'acknowledge_issues':'er_ai'}
test = (data_L1.groupby(['X','Y'], as_index=False)
.agg({'r2': 'mean',
'contact_id' : 'nunique',
'acknowledge_issues': lambda x: (x>0).mean()})
.rename(columns=cols))
print (test)
X Y tickets er_ai avg_r2
0 a c 2 0.500000 90
1 b d 3 0.666667 80
https://stackoverflow.com/questions/46873846
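For readers who want to run the answer end to end, here is a minimal self-contained sketch. It rebuilds data_L1 from the table in the question and uses pandas named aggregation in place of the dict plus rename step. That swap is my assumption, not part of the original answer, and it needs pandas 0.25 or newer; the aggregation functions themselves (mean, nunique, and the mean of a boolean mask) are the same as above.

```python
import pandas as pd

# Sample data copied from the question.
data_L1 = pd.DataFrame({
    "X": ["a", "b", "a", "b", "b"],
    "Y": ["c", "d", "c", "d", "d"],
    "r2": [100, 100, 80, 60, 80],
    "contact_id": ["xyzx", "fsdjkfl", "ejrkl", "fdsdl", "gsdkf"],
    "acknowledge_issues": [0, 0, 20, 40, 20],
})

# Named aggregation (assumes pandas >= 0.25): each keyword is an output
# column, mapped to a (source column, aggregation function) pair.
test = (
    data_L1.groupby(["X", "Y"], as_index=False)
    .agg(
        avg_r2=("r2", "mean"),
        tickets=("contact_id", "nunique"),
        # Mean of a boolean Series is the share of True values.
        er_ai=("acknowledge_issues", lambda s: (s > 0).mean()),
    )
)
print(test)
```

With named aggregation the output columns come back in the order they are declared (avg_r2, tickets, er_ai), so no separate rename is needed.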