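I have a solution, but it uses a for loop. I'm looking for a better or more elegant way to swap the col1 and col2 values whenever col1 > col2.
Current solution, using two for loops: find the rows where col1 > col2, append a new row with col2 and col1 exchanged, then delete every row where col1 > col2. The code is split across two functions. Is there a better way to swap the col1 and col2 values?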
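The append-then-drop idea described above can be done without explicit loops. This is a minimal sketch that assumes the three-row sample frame from the question. It works in three steps:

1. Select the out-of-order rows with a boolean mask.
2. Exchange their col1/col2 labels with `rename`.
3. Concatenate them with the rows that were already in order.

The function name `swap_by_append` is hypothetical and does not come from the question.

```python
import pandas as pd


def swap_by_append(df):
    """Hypothetical helper: loop-free version of the append-then-drop idea.

    Rows with col1 > col2 are re-added with col1/col2 exchanged, and the
    original out-of-order rows are left out of the result.
    """
    mask = df['col1'] > df['col2']
    # Renaming exchanges the column labels; pd.concat aligns on labels,
    # so the values of these rows end up swapped in the combined frame.
    swapped = df[mask].rename(columns={'col1': 'col2', 'col2': 'col1'})
    result = pd.concat([df[~mask], swapped], ignore_index=True)
    # Restore the original column order and sort the way the question does
    return (result[list(df.columns)]
            .sort_values(['col1', 'col2'])
            .reset_index(drop=True))


dataShort = [["Andy", "Claude", 15],
             ["Vincent", "Frida", 12],
             ["Vincent", "Pablo", 11]]
df = pd.DataFrame(dataShort, columns=['col1', 'col2', 'score'])
print(swap_by_append(df))
```

The swapped rows are appended at the end, so the original row order is lost. The final `sort_values` call mirrors the sort the question applies to its own result.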
import pandas as pd

def drop_all_revd_in_df(df):
    # Drop every row whose col1 sorts after its col2
    indexNames = df[df['col1'] > df['col2']].index
    df.drop(indexNames, inplace=True)
    return df

# for loops to check if col1 > col2 and reverse order
def col1GTcol2CleanUp(df):
    col1A_prev = ''
    for col1A in df['col1']:
        if col1A != col1A_prev:
            col1A_prev = col1A
            for col1B in df[df['col1'] == col1A]['col2']:
                if col1A > col1B:
                    score = df.loc[(df['col1'] == col1A) & (df['col2'] == col1B), 'score'].iloc[0]
                    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job here
                    new_row = pd.DataFrame([{'col1': col1B, 'col2': col1A, 'score': score}])
                    df = pd.concat([df, new_row], ignore_index=True)
    df = drop_all_revd_in_df(df)
    return df
# initialize list of lists
dataShort = [["Andy", "Claude", 15],
             ["Vincent", "Frida", 12],   # NOT OK: col1 > col2
             ["Vincent", "Pablo", 11]]   # NOT OK: col1 > col2

# Create the pandas DataFrame
df = pd.DataFrame(dataShort, columns=['col1', 'col2', 'score'])
print(df)
col1GTcol2CleanUp(df).sort_values(['col1', 'col2']).reset_index(drop=True)

Output:
      col1    col2  score
0     Andy  Claude     15
1  Vincent   Frida     12
2  Vincent   Pablo     11
Out[1]:
    col1     col2  score
0   Andy   Claude     15
1  Frida  Vincent     12
2  Pablo  Vincent     11

Posted 2020-02-19 17:16:55
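Following up on instinct246's answer: the pandas .apply() method tends to be slow. Here is another approach (method1 in the code below) that is much faster, but it needs temporary storage for one extra column. The same idea can also be written as a one-liner (method2 in the code below), which is faster still but needs temporary storage for two columns.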
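Here is another vectorized variant. It is not part of the original answer and has not been benchmarked here. The idea is to build a boolean mask of the rows where col1 > col2 and write the swapped values back with `.loc`. Only the rows that need swapping are copied, it works on string columns such as the question's names, and it leaves `score` untouched. The helper name `swap_where_greater` is hypothetical.

```python
import pandas as pd


def swap_where_greater(df):
    """Hypothetical helper: swap col1/col2 in place only where col1 > col2."""
    mask = df['col1'] > df['col2']
    # .to_numpy() drops the column labels, so pandas assigns by position
    # instead of re-aligning 'col2' back onto 'col2'.
    df.loc[mask, ['col1', 'col2']] = df.loc[mask, ['col2', 'col1']].to_numpy()


df = pd.DataFrame([["Andy", "Claude", 15],
                   ["Vincent", "Frida", 12],
                   ["Vincent", "Pablo", 11]],
                  columns=['col1', 'col2', 'score'])
swap_where_greater(df)
print(df)
```

The `.to_numpy()` call matters. If you assign a DataFrame instead, pandas aligns on column labels and writes col1 back into col1, so nothing gets swapped.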
import numpy as np
import pandas as pd

np.random.seed(113)
df = pd.DataFrame({'col1': np.random.randint(low=0, high=9, size=10000, dtype='int32'),
                   'col2': np.random.randint(low=0, high=9, size=10000, dtype='int32')})

def method1(df):
    """ Modifies df in place, no return value """
    df['col_tmp'] = df[['col1', 'col2']].max(axis=1)
    df['col1'] = df[['col1', 'col2']].min(axis=1)
    df['col2'] = df['col_tmp']
    del df['col_tmp']

def method2(df):
    """ Modifies df in place, no return value """
    df['col1'], df['col2'] = df[['col1', 'col2']].min(axis=1), df[['col1', 'col2']].max(axis=1)

# instinct246's answer
def check_row(row):
    if row['col1'] > row['col2']:
        row['col1'], row['col2'] = row['col2'], row['col1']
    return row

def method3(df):
    """ Returns modified df """
    return df.apply(check_row, axis=1)

Running %%timeit on each method returns:
method1: 1.92 ms ± 62.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
method2: 1.31 ms ± 11.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
method3: 558 ms ± 8.42 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

https://stackoverflow.com/questions/60274195
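`%%timeit` is an IPython/Jupyter cell magic, so the numbers above come from a notebook. Outside a notebook, you can sketch a rough equivalent with the standard-library `timeit` module. The sketch below does three things:

1. Re-creates the answer's test frame.
2. Checks that method2 and the row-wise `apply` version (copies of the answer's functions) agree.
3. Times each one on a fresh copy.

The helper `best_ms` and the `repeat`/`number` settings are arbitrary choices for illustration. The `df.copy()` inside each timed call is included in the measurement, so absolute numbers will differ from the `%%timeit` output.

```python
import timeit

import numpy as np
import pandas as pd

np.random.seed(113)
df = pd.DataFrame({'col1': np.random.randint(low=0, high=9, size=10000, dtype='int32'),
                   'col2': np.random.randint(low=0, high=9, size=10000, dtype='int32')})


def method2(df):
    """Modifies df in place: row-wise min goes to col1, max to col2."""
    df['col1'], df['col2'] = df[['col1', 'col2']].min(axis=1), df[['col1', 'col2']].max(axis=1)


def check_row(row):
    if row['col1'] > row['col2']:
        row['col1'], row['col2'] = row['col2'], row['col1']
    return row


def method3(df):
    """Returns a new frame built with a row-wise apply."""
    return df.apply(check_row, axis=1)


# Sanity check: both approaches must produce the same values before comparing speed
fast = df.copy()
method2(fast)
slow = method3(df.copy())
assert np.array_equal(fast.to_numpy(), slow.to_numpy())


def best_ms(func, number):
    """Hypothetical helper: best per-call time in ms over 3 repeats."""
    times = timeit.repeat(lambda: func(df.copy()), number=number, repeat=3)
    return min(times) / number * 1000


print(f"method2: {best_ms(method2, number=100):.2f} ms per call")
print(f"method3: {best_ms(method3, number=1):.2f} ms per call")
```

The row-wise version gets a smaller `number` only because each call takes hundreds of milliseconds. Taking the minimum over repeats is a common way to reduce noise from other processes.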