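I have an assignment that requires me to work out whether the scores of students in several classes are above 0.2, based on picking one or more reference students in each class that has a reference score.
Here is an example DataFrame: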
import pandas as pd

df = pd.DataFrame({'student' : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'class' : [1, 1, 1, 2, 2, 2, 2, 2, 2, 2],
                   'type' : ['top', 'top', 'low', 'mid', 'mid', 'mid', 'low', 'low', 'low', 'low'],
                   'score' : [1, .8, .3, .7, .7, .6, .1, .2, .1, .1]})
df
The algorithm should include the following rules
So the final result would be:
df2 = pd.DataFrame({'student' : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                    'class' : [1, 1, 1, 2, 2, 2, 2, 2, 2, 2],
                    'type' : ['top', 'top', 'low', 'mid', 'mid', 'mid', 'low', 'low', 'low', 'low'],
                    'score' : [1, .8, .3, .7, .6, .6, .1, .2, .1, .1],
                    'outcome' : ['no', 'ref', 'yes', 'no', 'ref', 'ref', 'yes', 'yes', 'yes', 'yes']})
df2
I have some basic knowledge of pandas, but I think this problem is too complex for me. Do you have any ideas on how to do this?
Posted on 2018-08-09 12:55:54
def final_output(df):
    # Group the rows by class and type
    groups = df.groupby(['class', 'type'])
    # cl maps each class to the score of its reference student
    cl = {}
    # groupby iterates keys in sorted order, so a class's 'top' group
    # is visited after its 'mid' group and overwrites it
    for name, group in groups:
        if 'top' in name[1]:
            cl[name[0]] = group['score'].min()
        elif 'mid' in name[1]:
            cl[name[0]] = group['score'].min()
    # Assign the reference score to every student of the respective class
    df['refer_score'] = df['class'].apply(lambda x: cl[x])
    # Absolute difference between the reference score and the student's own score
    df['diff'] = df.apply(lambda x: abs(x['refer_score'] - x['score']), axis=1)
    df['final_outcome'] = df['diff'].apply(lambda x: 'yes' if x > 0.2 else 'ref' if x == 0.0 else 'no')
    return df

output = final_output(df2)
https://stackoverflow.com/questions/51766103
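For larger frames, the same idea can probably be written without the Python-level loop and the row-wise apply calls. The following is only a sketch under some assumptions: the name final_output_vectorized and the threshold parameter are made up for illustration, and the rule that the lowest 'top' score is the reference, with the lowest 'mid' score as a fallback, is inferred from the behaviour of the code above rather than from the rules in the question. Unlike final_output, it works on a copy and leaves the input frame unchanged.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    'student': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'class': [1, 1, 1, 2, 2, 2, 2, 2, 2, 2],
    'type': ['top', 'top', 'low', 'mid', 'mid', 'mid', 'low', 'low', 'low', 'low'],
    'score': [1, .8, .3, .7, .6, .6, .1, .2, .1, .1],
    'outcome': ['no', 'ref', 'yes', 'no', 'ref', 'ref', 'yes', 'yes', 'yes', 'yes'],
})


def final_output_vectorized(df, threshold=0.2):
    """Hypothetical vectorized variant of final_output; returns a new DataFrame."""
    out = df.copy()
    # Assumption: the reference score of a class is its lowest 'top' score,
    # or its lowest 'mid' score when the class has no 'top' student.
    top_min = out.loc[out['type'] == 'top'].groupby('class')['score'].min()
    mid_min = out.loc[out['type'] == 'mid'].groupby('class')['score'].min()
    ref = top_min.combine_first(mid_min)
    # Classes with neither 'top' nor 'mid' students get NaN here
    # and fall through to 'no' below.
    out['refer_score'] = out['class'].map(ref)
    out['diff'] = (out['refer_score'] - out['score']).abs()
    out['final_outcome'] = np.select(
        [out['diff'] > threshold, out['diff'] == 0],
        ['yes', 'ref'],
        default='no',
    )
    return out


result = final_output_vectorized(df2)
```

On the df2 frame from the question, the final_outcome column of result can be compared against the expected outcome column, for example with (result['final_outcome'] == result['outcome']).all().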