I have a large dataset in a CSV file:
the output in.

Posted on 2021-05-08 07:16:06

First, open the CSV file to get the first column.

import pandas as pd
df = pd.read_csv("filename.csv")

Here I will just use io to simulate the data in the file.

text = """first
1
2
3
4
5
6
7
8
9
10"""
import pandas as pd
import io
df = pd.read_csv(io.StringIO(text))

Result:

   first
0      1
1      2
2      3
3      4
4      5
5      6
6      7
7      8
8      9
9     10

Next, you can use shift to create the other columns:

df['second'] = df['first'].shift(-1)
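df['third'] = df['first'].shift(-2)

Result:

   first  second  third
0      1     2.0    3.0
1      2     3.0    4.0
2      3     4.0    5.0
3      4     5.0    6.0
4      5     6.0    7.0
5      6     7.0    8.0
6      7     8.0    9.0
7      8     9.0   10.0
8      9    10.0    NaN
9     10     NaN    NaN

Finally, you can remove the last two rows, which contain NaN, and convert all values back to integers (the shifted columns became floats because shift fills the vacated rows with NaN):

df = df[:-2].astype(int)

or, if there are no NaN values anywhere else:

df = df.dropna().astype(int)

Result:
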
   first  second  third
0      1       2      3
1      2       3      4
2      3       4      5
3      4       5      6
4      5       6      7
5      6       7      8
6      7       8      9
7      8       9     10

Minimal working code:

text = """first
1
2
3
4
5
6
7
8
9
10"""
import pandas as pd
import io
df = pd.read_csv(io.StringIO(text))
#df = pd.DataFrame(range(1,11), columns=['first'])
print(df)
df['second'] = df['first'].shift(-1) #, fill_value=0)
df['third'] = df['first'].shift(-2)
print(df)
#df = df.dropna().astype(int)
df = df[:-2].astype(int)
print(df)

EDIT:

Use a for-loop to create any number of columns.

text = """col 1
1
2
3
4
5
6
7
8
9
10"""
import pandas as pd
import io
df = pd.read_csv(io.StringIO(text))
#df = pd.DataFrame(range(1,11), columns=['col 1'])
print(df)
number = 5
for x in range(1, number+1):
    df[f'col {x+1}'] = df['col 1'].shift(-x)
print(df)
#df = df.dropna().astype(int)
df = df[:-number].astype(int)
print(df)

Result:

   col 1
0      1
1      2
2      3
3      4
4      5
5      6
6      7
7      8
8      9
9     10
   col 1  col 2  col 3  col 4  col 5  col 6
0      1    2.0    3.0    4.0    5.0    6.0
1      2    3.0    4.0    5.0    6.0    7.0
2      3    4.0    5.0    6.0    7.0    8.0
3      4    5.0    6.0    7.0    8.0    9.0
4      5    6.0    7.0    8.0    9.0   10.0
5      6    7.0    8.0    9.0   10.0    NaN
6      7    8.0    9.0   10.0    NaN    NaN
7      8    9.0   10.0    NaN    NaN    NaN
8      9   10.0    NaN    NaN    NaN    NaN
9     10    NaN    NaN    NaN    NaN    NaN
   col 1  col 2  col 3  col 4  col 5  col 6
0      1      2      3      4      5      6
1      2      3      4      5      6      7
2      3      4      5      6      7      8
3      4      5      6      7      8      9
4      5      6      7      8      9     10

https://stackoverflow.com/questions/67444424
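
The for-loop version can also be wrapped in a small reusable function. The following is only a sketch: the name add_shifted_columns and its extra parameter are hypothetical and not part of the original answer, and it builds all columns at once with pd.concat instead of assigning them one by one.

```python
import pandas as pd


def add_shifted_columns(series: pd.Series, extra: int) -> pd.DataFrame:
    # Hypothetical helper, not from the original answer.
    # Column k holds the series shifted up by k rows, for k = 0..extra.
    columns = {f"col {k + 1}": series.shift(-k) for k in range(extra + 1)}
    df = pd.concat(columns, axis=1)
    # The last `extra` rows contain NaN produced by shifting; drop them by position.
    # The guard is needed because df.iloc[:-0] would return an empty frame.
    if extra > 0:
        df = df.iloc[:-extra]
    # Restore the original dtype (the shifted columns became floats).
    return df.astype(series.dtype)


result = add_shifted_columns(pd.Series(range(1, 11)), extra=5)
print(result)
```

Like df[:-number] in the loop version, this assumes the input column itself contains no NaN; otherwise the final astype call would fail.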
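
The choice between df[:-2] and df.dropna() matters when the data has a NaN somewhere other than the trailing rows. The sketch below uses made-up data (a six-row column with one missing value, assumed purely for illustration) to show how the two approaches differ.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration: one missing value in the middle of the column.
df = pd.DataFrame({"first": [1, 2, np.nan, 4, 5, 6]})
df["second"] = df["first"].shift(-1)
df["third"] = df["first"].shift(-2)

# Positional slicing removes only the two trailing rows created by shift,
# so rows affected by the original NaN are still present.
by_position = df[:-2]

# dropna() removes every row that has a NaN in any column.
by_dropna = df.dropna()

print(len(by_position), by_position.isna().any().any())
print(len(by_dropna), by_dropna.astype(int).values.tolist())
```

Here by_position still contains NaN, so calling astype(int) on it would raise an error, while by_dropna keeps only the rows that are complete in every column.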