I have two DataFrames:
import pandas as pd
test1 = pd.DataFrame({'Gene': ['WASH7P', 'WASH7P', 'VCZ'], 'TPM': [10.034, 0.234000, 2.345]})
test2 = pd.DataFrame({'Gene': ['WASH7P', 'WASH7P', 'btt'], 'TPM': [1.12345, 2.300, 0.00000]})
I want to merge them into a single DataFrame. I tried:
df = pd.merge(test1, test2, on=['Gene'], how='outer')
The result is:
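     Gene   TPM_x    TPM_y
0  WASH7P  10.034  1.12345
1  WASH7P  10.034  2.30000
2  WASH7P   0.234  1.12345
3  WASH7P   0.234  2.30000
4     VCZ   2.345      NaN
5     btt     NaN  0.00000
However, the result contains duplicated rows: each WASH7P row from test1 is paired with every WASH7P row from test2. I tried drop_duplicates(), but it did not work. The real DataFrames are much larger, with more than 30,000 rows.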
Desired output:
  Gene   TPM_x    TPM_y
WASH7P  10.034  1.12345
WASH7P   0.234  2.30000
   VCZ   2.345      NaN
   btt     NaN  0.00000
Any help would be appreciated.
Posted on 2021-02-20 00:47:45
If you are trying to drop duplicates based on the column "TPM_x", use this:
df = pd.merge(test1, test2, on=['Gene'], how='outer').drop_duplicates(subset='TPM_x', keep='first')
https://stackoverflow.com/questions/66281579
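Note that deduplicating on TPM_x keeps only the first match for each TPM_x value, so on the sample data 0.234 ends up paired with 1.12345 rather than 2.30000 as in the desired output; rows from different genes that happen to share a TPM_x value (or that both have NaN there) would also collapse into one. A possible alternative, sketched below under the assumption that the n-th occurrence of a gene in test1 should pair with the n-th occurrence of that gene in test2, is to number the repeats with groupby(...).cumcount() and merge on that counter as well. The helper column name occ is made up for illustration.

```python
import pandas as pd

test1 = pd.DataFrame({'Gene': ['WASH7P', 'WASH7P', 'VCZ'], 'TPM': [10.034, 0.234, 2.345]})
test2 = pd.DataFrame({'Gene': ['WASH7P', 'WASH7P', 'btt'], 'TPM': [1.12345, 2.300, 0.0]})

# Assumption: repeats of a gene are matched by position (1st with 1st, 2nd with 2nd).
# 'occ' is a hypothetical helper column holding the occurrence number per gene.
left = test1.assign(occ=test1.groupby('Gene').cumcount())
right = test2.assign(occ=test2.groupby('Gene').cumcount())

# Merging on both Gene and occ avoids the many-to-many pairing on Gene alone.
df = (
    pd.merge(left, right, on=['Gene', 'occ'], how='outer')
    .drop(columns='occ')
)
print(df)
```

This yields one row per (Gene, occurrence) pair, with NaN where a gene or occurrence exists in only one frame. Row order of an outer merge can differ between pandas versions, so sort explicitly if the order matters.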