I have a dataframe like this (code to create a sample dataframe):
import numpy as np
import pandas as pd

# Sample data: one language per row plus list-valued columns (NaN marks missing values)
df = pd.DataFrame({'language': ['ruby', 'ruby', 'ruby', np.nan, 'ruby'],
                   'top_lang_owned': [['ruby', 'javascript', 'go'],
                                      ['ruby', 'coffeescript'],
                                      ['javascript', 'coffeescript'],
                                      ['ruby', 'shell', 'go'],
                                      np.nan],
                   'top_lang_watched': [['ruby', 'go'],
                                        ['javascript'],
                                        np.nan,
                                        ['ruby', 'shell'],
                                        np.nan]})
df
  language  top_lang_owned           top_lang_watched
0 ruby      ruby,javascript,go       ruby,go
1 ruby      ruby,coffeescript        javascript
2 ruby      javascript,coffeescript  NaN
3 NaN       ruby,shell,go            ruby,shell
4 ruby      NaN                      NaN
dataframe.info():
RangeIndex: 5 entries, 0 to 4
Data columns (total 2 columns):
language          4 non-null object
top_lang_owned    4 non-null object
dtypes: object(2)
memory usage: 208.0+ bytes
I want to add a field that compares the values of two fields. (pseudocode)
if ("language" is in "top_lang_owned")
then new_field = 1 otherwise new_field = 0
For example, the desired output should look like this:
  language  top_lang_owned           top_lang_watched  is_owned  is_watched
0 ruby      ruby,javascript,go       ruby,go           1         1
1 ruby      ruby,coffeescript        javascript        1         0
2 ruby      javascript,coffeescript  NaN               0         0
3 NaN       ruby,shell,go            ruby,shell        NaN       NaN
4 ruby      NaN
Posted 2020-02-25 13:16:54
You can certainly do this. Here is code you might want to try:
Edit:
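def func(x):
    # 1 if the row's language appears in its list of owned languages, else 0
    if x.language in x.top_lang_owned:
        return 1
    return 0

# Apply only to rows without any NaN; the skipped rows end up as NaN in the new column
df['is_in_lang'] = df[~df.isna().any(axis=1)].apply(func, axis=1)
Output: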
    id language                    top_lang_owned  is_in_lang
0   21     ruby            [ruby, javascript, go]           1
1   25     ruby  [javascript, ruby, coffeescript]           1
2   38     ruby        [javascript, coffeescript]           0
3  108      NaN                 [ruby, shell, go]         NaN
4  173     ruby                               NaN         NaN
Posted 2020-02-25 15:12:18
You can filter out the rows containing NA and apply the condition to the rest:
df['is_in_lang'] = df[~df.isna().any(axis=1)].apply(lambda x: 1 if x['language'] in x['top_lang_owned'] else 0, axis=1)
  language                    top_lang_owned  is_in_lang
0     ruby            [ruby, javascript, go]         1.0
1     ruby  [javascript, ruby, coffeescript]         1.0
2     ruby        [javascript, coffeescript]         0.0
3      NaN                 [ruby, shell, go]         NaN
4     ruby                               NaN         NaN
https://stackoverflow.com/questions/60395408
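Neither answer above produces both fields from the question, and filtering with df.isna().any(axis=1) also drops rows where an unrelated column is NaN (row 2 of the three-column sample would come out as NaN rather than 0). Here is a minimal sketch that checks each list column separately. The helper name membership_flag is hypothetical, not part of pandas. The sketch assumes that a missing list counts as "not in the list" (0), as row 2 of the desired output suggests, and that a missing language yields <NA>.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'language': ['ruby', 'ruby', 'ruby', np.nan, 'ruby'],
    'top_lang_owned': [['ruby', 'javascript', 'go'],
                       ['ruby', 'coffeescript'],
                       ['javascript', 'coffeescript'],
                       ['ruby', 'shell', 'go'],
                       np.nan],
    'top_lang_watched': [['ruby', 'go'],
                         ['javascript'],
                         np.nan,
                         ['ruby', 'shell'],
                         np.nan],
})


def membership_flag(frame, value_col, list_col):
    """Hypothetical helper: flag whether value_col appears in list_col, row by row."""
    flags = []
    for value, langs in zip(frame[value_col], frame[list_col]):
        if pd.isna(value):
            # Assumption: nothing to compare when the language itself is missing.
            flags.append(pd.NA)
        elif isinstance(langs, list):
            flags.append(int(value in langs))
        else:
            # Assumption: a missing list means the language is not in it.
            flags.append(0)
    # Nullable integer dtype keeps 1/0 as integers alongside <NA>.
    return pd.Series(flags, index=frame.index, dtype='Int64')


df['is_owned'] = membership_flag(df, 'language', 'top_lang_owned')
df['is_watched'] = membership_flag(df, 'language', 'top_lang_watched')
print(df[['language', 'is_owned', 'is_watched']])
```

The nullable Int64 dtype is a design choice: a plain float column would show 1.0/0.0/NaN, as in the second answer. Under the stated assumptions row 4 gets 0 in both columns, unlike the answers above, which leave it as NaN; swap the final branch to append pd.NA if that behavior is preferred.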