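I need to join DataFrames on columns that are similar but not identical. Fortunately, the lowercase letters are the same across the columns, so I am trying to extract the lowercase letters from each column to create new columns to join on.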

df1 = pd.DataFrame({'alpha': ['1', '2', '3'],
                    'beta': ['JRLeparoux', 'BJHernandez,Jr.','SXBridgmohan'],})
df2 = pd.DataFrame({'alpha': ['1', '2', '3'],
                    'gamma': ['Leparoux R', 'Hernandez,B, Jr.','Bridgmohan S X'],
                    'zeta': ['17', '23','116'],})

This is what I tried:

def joinnames(df):
    filelist = []
    for c in df:
        if c.islower():
            filelist.append(c)
    return filelist
df1['joinhere'] = df1['beta'].apply(joinnames)
df2['joinhere'] = df2['gamma'].apply(joinnames)
pd.merge(df1, df2, how='left', left_on='joinhere', right_on='joinhere')

This is the result I want to achieve:

final = pd.DataFrame({'alpha': ['1', '2', '3'],
                      'gamma': ['Leparoux R', 'Hernandez,B, Jr.','Bridgmohan S X'],
                      'beta': ['JRLeparoux', 'BJHernandez,Jr.','SXBridgmohan'],
                      'zeta': ['17', '23','116'],})

Posted on 2015-10-27 15:24:38
You can use Series.str.extract to find the lowercase letters:

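import pandas as pd

df1 = pd.DataFrame({'alpha': ['1', '2', '3'],
                    'beta': ['JRLeparoux', 'BJHernandez,Jr.','SXBridgmohan'],})
df2 = pd.DataFrame({'alpha': ['1', '2', '3'],
                    'gamma': ['Leparoux R', 'Hernandez,B, Jr.','Bridgmohan S X'],
                    'zeta': ['17', '23','116'],})

# Capture the first run of lowercase ASCII letters in each name.
# expand=False makes str.extract return a Series for a single capture group.
df1['lower'] = df1['beta'].str.extract(r'([a-z]+)', expand=False)
df2['lower'] = df2['gamma'].str.extract(r'([a-z]+)', expand=False)

# With no `on` argument, merge joins on all shared columns: alpha and lower.
final = pd.merge(df1, df2)
print(final)

yields
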
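  alpha             beta      lower             gamma zeta
0     1       JRLeparoux    eparoux        Leparoux R   17
1     2  BJHernandez,Jr.   ernandez  Hernandez,B, Jr.   23
2     3     SXBridgmohan  ridgmohan    Bridgmohan S X  116

Note that this assumes that collecting the ASCII characters a through z is enough to produce the values to join on. The pattern [a-z]+ captures only the first contiguous run of lowercase letters, which is why 'BJHernandez,Jr.' yields 'ernandez'. If the beta and gamma columns contain non-ASCII lowercase characters (for example, letters with accent marks), you may need to add those characters to the regex character class [a-z].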
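
If accented names are a concern, one alternative to widening the character class is to build the key with Python's Unicode-aware str.islower. The following is only a sketch under stated assumptions: the helper name lowercase_key and the accented sample names are hypothetical and do not come from the original data. Unlike the pattern ([a-z]+), which captures only the first run of lowercase letters, this version keeps every lowercase letter in the string, so the key for 'BJHernandez,Jr.' becomes 'ernandezr' rather than 'ernandez'.

```python
def lowercase_key(name: str) -> str:
    """Return every lowercase letter in `name`, in order, as one string."""
    # str.islower() is Unicode-aware, so accented letters such as 'é' are kept.
    return "".join(ch for ch in name if ch.islower())


# Hypothetical sample values: the first three pairs mirror the question's data,
# the last pair is an invented accented name added for illustration.
beta = ["JRLeparoux", "BJHernandez,Jr.", "SXBridgmohan", "JRLéparoux"]
gamma = ["Leparoux R", "Hernandez,B, Jr.", "Bridgmohan S X", "Léparoux R"]

beta_keys = [lowercase_key(s) for s in beta]
gamma_keys = [lowercase_key(s) for s in gamma]

# Each pair of spellings should reduce to the same join key.
for b, g in zip(beta_keys, gamma_keys):
    print(b, g, b == g)
```

With the DataFrames above, the same helper could be applied per value, for example df1['beta'].apply(lowercase_key) and df2['gamma'].apply(lowercase_key), and the resulting string columns used as the merge key. Because it returns a string rather than a list, the key is hashable and safe to merge on.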
https://stackoverflow.com/questions/33371752