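This question is specific to columns of data in a pandas.DataFrame.
The approach depends on whether the values in the column are of str, dict, or list type.
It addresses how to deal with the NaN values when df.dropna().reset_index(drop=True) is not a valid option.

Case 1
For a column of str type, the values in the column must be converted to dict type with ast.literal_eval before using .json_normalize.

import numpy as np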
import pandas as pd
from ast import literal_eval
df = pd.DataFrame({'col_str': ['{"a": "46", "b": "3", "c": "12"}', '{"b": "2", "c": "7"}', '{"c": "11"}', np.nan]})
col_str
0 {"a": "46", "b": "3", "c": "12"}
1 {"b": "2", "c": "7"}
2 {"c": "11"}
3 NaN
type(df.iloc[0, 0])
[out]: str
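
For reference, a minimal sketch of the intended conversion on the same values without the NaN row, so the expected result is visible before the failure. The name clean is made up for illustration, and .tolist() is used only so that json_normalize receives a plain list of dicts.

```python
from ast import literal_eval

import pandas as pd

# Hypothetical frame: same str-typed values as above, but no NaN row
clean = pd.DataFrame({'col_str': ['{"a": "46", "b": "3", "c": "12"}',
                                  '{"b": "2", "c": "7"}',
                                  '{"c": "11"}']})

# Each str is parsed into a dict
parsed = clean['col_str'].apply(literal_eval)

# Keys become column headers; keys missing from a row become NaN
normalized = pd.json_normalize(parsed.tolist())
print(normalized)
```
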
df.col_str.apply(literal_eval)

Error:
df.col_str.apply(literal_eval) results in ValueError: malformed node or string: nan

Case 2
For a column of dict type, use pandas.json_normalize to convert the keys to column headers and the values to rows.

df = pd.DataFrame({'col_dict': [{"a": "46", "b": "3", "c": "12"}, {"b": "2", "c": "7"}, {"c": "11"}, np.nan]})
col_dict
0 {'a': '46', 'b': '3', 'c': '12'}
1 {'b': '2', 'c': '7'}
2 {'c': '11'}
3 NaN
type(df.iloc[0, 0])
[out]: dict
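
A quick way to see which rows would trip json_normalize is to test each cell with isinstance. This is only a sketch; df_demo, is_dict and bad_rows are illustrative names, not part of the code above.

```python
import numpy as np
import pandas as pd

# Hypothetical copy of the example data, including the NaN row
df_demo = pd.DataFrame({'col_dict': [{"a": "46", "b": "3", "c": "12"},
                                     {"b": "2", "c": "7"},
                                     {"c": "11"},
                                     np.nan]})

# True where the cell holds a dict, False for NaN (a float) or anything else
is_dict = df_demo['col_dict'].apply(lambda v: isinstance(v, dict))

# Index labels of the rows that do not hold a dict
bad_rows = df_demo.index[~is_dict].tolist()
print(bad_rows)
```
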
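pd.json_normalize(df.col_dict)

Error:
pd.json_normalize(df.col_dict) results in AttributeError: 'float' object has no attribute 'items'

Case 3
For a column of str type, with the dicts inside a list.
To normalize the column: apply literal_eval, because explode does not work on str type; explode the column to put each dict on its own row; then normalize the column.
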
df = pd.DataFrame({'col_str': ['[{"a": "46", "b": "3", "c": "12"}, {"b": "2", "c": "7"}]', '[{"b": "2", "c": "7"}, {"c": "11"}]', np.nan]})
col_str
0 [{"a": "46", "b": "3", "c": "12"}, {"b": "2", "c": "7"}]
1 [{"b": "2", "c": "7"}, {"c": "11"}]
2 NaN
type(df.iloc[0, 0])
[out]: str
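
Sketch of the intended pipeline for this case on data without NaN: literal_eval turns each str into a list of dicts, explode puts each dict on its own row, and json_normalize turns the keys into columns. The names clean, exploded and normalized are made up for illustration.

```python
from ast import literal_eval

import pandas as pd

# Hypothetical frame: same str-typed lists as above, but no NaN row
clean = pd.DataFrame({'col_str': ['[{"a": "46", "b": "3", "c": "12"}, {"b": "2", "c": "7"}]',
                                  '[{"b": "2", "c": "7"}, {"c": "11"}]']})

# 1. str -> list of dicts
clean['col_str'] = clean['col_str'].apply(literal_eval)

# 2. one dict per row (ignore_index needs pandas 1.1 or later)
exploded = clean.explode('col_str', ignore_index=True)

# 3. keys -> column headers, values -> rows
normalized = pd.json_normalize(exploded['col_str'].tolist())
print(normalized)
```
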
df.col_str.apply(literal_eval)

Error:
df.col_str.apply(literal_eval) results in ValueError: malformed node or string: nan

Posted on 2020-09-13 23:59:19

For the dummy data here, or when working with data where none of the other columns matter, there is always this option:
df = df.dropna().reset_index(drop=True)

Tested with python 3.10, pandas 1.4.3

Case 1
Since the column contains str type, fill the NaN with '{}' (a str).

import numpy as np
import pandas as pd
from ast import literal_eval
df = pd.DataFrame({'col_str': ['{"a": "46", "b": "3", "c": "12"}', '{"b": "2", "c": "7"}', '{"c": "11"}', np.nan]})
col_str
0 {"a": "46", "b": "3", "c": "12"}
1 {"b": "2", "c": "7"}
2 {"c": "11"}
3 NaN
type(df.iloc[0, 0])
[out]: str
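
The NaN can also be handled inside the parsing step instead of with a separate fill. A minimal sketch, assuming a made-up helper parse_or_empty and a separate frame demo so that df is left untouched:

```python
from ast import literal_eval

import numpy as np
import pandas as pd


def parse_or_empty(value):
    """Hypothetical helper: parse a str with literal_eval, return {} for anything else (e.g. NaN)."""
    return literal_eval(value) if isinstance(value, str) else {}


demo = pd.DataFrame({'col_str': ['{"a": "46", "b": "3", "c": "12"}',
                                 '{"b": "2", "c": "7"}',
                                 '{"c": "11"}',
                                 np.nan]})

# Every cell becomes a dict; the NaN row becomes an empty dict
parsed = demo['col_str'].apply(parse_or_empty)

# The empty dict yields a row of NaN after normalizing
normalized = pd.json_normalize(parsed.tolist())
print(normalized)
```
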
# fillna
df.col_str = df.col_str.fillna('{}')
# convert the column to dicts
df.col_str = df.col_str.apply(literal_eval)
# use json_normalize
df = df.join(pd.json_normalize(df.pop('col_str')))
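# display(df)
     a    b    c
0   46    3   12
1  NaN    2    7
2  NaN  NaN   11
3  NaN  NaN  NaN

Case 2
As of at least pandas 1.3.4, pd.json_normalize(df.col_dict) works without issue, at least for this simple example.
Since the column contains dict type, not str type, fill the NaN with {} (not a str).
This needs to be done with a dict comprehension, since fillna({}) does not work.
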
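
If the dict-comprehension form of fillna feels awkward, an alternative (swapped in here, not the approach used in this answer) is to replace every non-dict cell with {} through apply. Sketch with an extra made-up id column to show that no rows are lost, unlike with dropna:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the dict column plus another column that must be kept
demo = pd.DataFrame({'id': [10, 11, 12, 13],
                     'col_dict': [{"a": "46", "b": "3", "c": "12"},
                                  {"b": "2", "c": "7"},
                                  {"c": "11"},
                                  np.nan]})

# Keep dicts as they are, replace anything else (here: NaN) with an empty dict
demo['col_dict'] = demo['col_dict'].apply(lambda v: v if isinstance(v, dict) else {})

# Normalize and join back on the default integer index; all four rows survive
result = demo.join(pd.json_normalize(demo.pop('col_dict').tolist()))
print(result)
```
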
df = pd.DataFrame({'col_dict': [{"a": "46", "b": "3", "c": "12"}, {"b": "2", "c": "7"}, {"c": "11"}, np.nan]})
col_dict
0 {'a': '46', 'b': '3', 'c': '12'}
1 {'b': '2', 'c': '7'}
2 {'c': '11'}
3 NaN
type(df.iloc[0, 0])
[out]: dict
# fillna
df.col_dict = df.col_dict.fillna({i: {} for i in df.index})
# use json_normalize
df = df.join(pd.json_normalize(df.pop('col_dict')))
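# display(df)
     a    b    c
0   46    3   12
1  NaN    2    7
2  NaN  NaN   11
3  NaN  NaN  NaN

Case 3
Fill the NaNs with '[]' (a str).
literal_eval will then work on the column.
.explode can then be used on the column to separate the dict values into rows.
After that, the NaNs need to be filled with {} (not a str).
Then the column can be normalized.
If the column holds lists of dicts that are not str type, skip to .explode.

df = pd.DataFrame({'col_str': ['[{"a": "46", "b": "3", "c": "12"}, {"b": "2", "c": "7"}]', '[{"b": "2", "c": "7"}, {"c": "11"}]', np.nan]})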
col_str
0 [{"a": "46", "b": "3", "c": "12"}, {"b": "2", "c": "7"}]
1 [{"b": "2", "c": "7"}, {"c": "11"}]
2 NaN
type(df.iloc[0, 0])
[out]: str
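
For the variant where the column already holds list objects rather than str, the '[]' fill and literal_eval are not needed and the process can start at explode. Sketch under that assumption; df_lists and col_list are made-up names, and the non-dict cells are replaced through apply rather than fillna:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: real lists of dicts (not str), plus a NaN row
df_lists = pd.DataFrame({'col_list': [[{"a": "46", "b": "3", "c": "12"}, {"b": "2", "c": "7"}],
                                      [{"b": "2", "c": "7"}, {"c": "11"}],
                                      np.nan]})

# One dict per row; the NaN row stays as a single NaN row
exploded = df_lists.explode('col_list', ignore_index=True)

# Replace the remaining NaN with an empty dict so every cell is a dict
exploded['col_list'] = exploded['col_list'].apply(lambda v: v if isinstance(v, dict) else {})

normalized = pd.json_normalize(exploded['col_list'].tolist())
print(normalized)
```
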
# fillna
df.col_str = df.col_str.fillna('[]')
# literal_eval
df.col_str = df.col_str.apply(literal_eval)
# explode
df = df.explode('col_str', ignore_index=True)
# fillna again
df.col_str = df.col_str.fillna({i: {} for i in df.index})
# use json_normalize
df = df.join(pd.json_normalize(df.pop('col_str')))
# display(df)
     a    b    c
0   46    3   12
1  NaN    2    7
2  NaN    2    7
3  NaN  NaN   11
4  NaN  NaN  NaN

https://stackoverflow.com/questions/63876637