我有一个脚本,用于从文件夹中将一些csv文件写入excel:
from pandas.io.excel import ExcelWriter
import pandas
import os
path = 'data/'
ordered_list = sorted(os.listdir(path), key = lambda x: int(x.split(".")[0]))
with ExcelWriter('my_excel.xlsx') as ew:
for csv_file in ordered_list:
pandas.read_csv(path + csv_file).to_excel(ew, index = False, sheet_name=csv_file[:-4], encoding='utf-8')现在我的问题是,所有的列(比方说G:H)都是字符串格式(ex '400或'10)和a‘以前,我认为它们是以字符串的形式出现的,因为csv将它们转换成字符串,我需要它们是int,我如何使G: think?!我用python 3,谢谢!
PS(这是csv样本):
ANPIS,,,,,,,
AGENTIA JUDETEANA PENTRU PLATI SI INSPECTIE SOCIALA TIMIS,,,,,,,
,,,,,,,
Macheta Comparativa CREDITORI - numai pentru Beneficiile a caror Evidenta se tine si in Contabilitate si in aplicatia SAFIR,,,,,,,
Situatie ANALITICA - NOMINAL la 30.06.2017,,,,,,,
1. ALOCATIA DE STAT PENTRU COPII,,,,,,,
Nr. Benef,Nume Prenume,CNP,Data Constituirii,Suma Contabilitate,Suma SAFIR,Differenta Suma,Explicatii daca exista diferente
1,2,3,4,5,6,7=5-6,8
1,CAZACU MIHAI,133121140,Aug 2016,84,84
2,NICOARA PETRU,143152638,"Aug 2014, Sept 2014",126,84
3,CERNEA NICOLAE DAN,143354723,Dec 2015,84,84
4,LUDWIG PETRU,144091376,Nov 2014,42,42
5,POPA REMUS,1440915363,Iun 2015,84,84
6,BOGDAN MARCEL,144154726,"Feb 2015, Apr 2015, Sept 2015, Oct 2015, Feb 2016",336,336
7,HENDRE AUGUSTIN,145054704,Feb 2015,42,42
8,COJOC VASILE,147050307,"Sept 2014, Oct 2014",84,84
9,RADULESCU VICTOR,147352628,"Sept 2014, Oct 2014, Nov 2014, Dec 2014",168,168
10,RADAU DUMITRU,148054764,"Feb 2017, Mar 2017",168,168
11,COVACIU PETRU,148054802,Iun 2016,84,84
12,BOT IOAN,14808634,"Aug 2014, Sept 2014, Oct 2014, Nov 2014",168,168这个头是这个:
ANPIS,,,,,,,
AGENTIA JUDETEANA PENTRU PLATI SI INSPECTIE SOCIALA TIMIS,,,,,,,
,,,,,,,
Macheta Comparativa CREDITORI - numai pentru Beneficiile a caror Evidenta se tine si in Contabilitate si in aplicatia SAFIR,,,,,,,
Situatie ANALITICA - NOMINAL la 30.06.2017,,,,,,,
1. ALOCATIA DE STAT PENTRU COPII,,,,,,,
Nr. Benef,Nume Prenume,CNP,Data Constituirii,Suma Contabilitate,Suma SAFIR,Differenta Suma,Explicatii daca exista diferente
1,2,3,4,5,6,7=5-6,8发布于 2017-09-12 13:25:03
您可以读取每个文件两次--首先只使用参数nrows读取头,然后用skiprows读取正文。
那就需要写两遍了。
解决方案有点复杂,因为熊猫错误地解析数据--不支持MulttiIndex的8级别。如果没有设置标头,则来自标头的数据将与主体连接,而输出则会变得一团糟。
with ExcelWriter('my_excel.xlsx') as ew:
for csv_file in ordered_list:
df1 = pandas.read_csv(path + csv_file, nrows=8, header=None)
df2 = pandas.read_csv(path + csv_file, skiprows=8, header=None)
df1.to_excel(ew, index = False, sheet_name=csv_file[:-4], encoding='utf-8', header=False)
row = len(df1.index)
df2.to_excel(ew, index = False, sheet_name=csv_file[:-4], encoding='utf-8', startrow=row , startcol=0, header=False)使用apply通过strip删除',通过astype转换为int
cols = ['G','H']
with ExcelWriter('my_excel.xlsx') as ew:
for csv_file in ordered_list:
df = pandas.read_csv(path + csv_file)
df[cols] = df[cols].astype(str).apply(lambda x: x.str.strip("'")).astype(int)
print (df.head())
df.to_excel(ew, index = False, sheet_name=csv_file[:-4], encoding='utf-8')另一种解决方案是使用带有自定义函数的参数converters:
cols = ['G','H']
def converter(x):
return int(x.strip("'"))
#define each column
converters={x:converter for x in cols}
with ExcelWriter('my_excel.xlsx') as ew:
for csv_file in ordered_list:
df = pandas.read_csv(path + csv_file, converters=converters)
print (df.head())
df.to_excel(ew, index = False, sheet_name=csv_file[:-4], encoding='utf-8')https://stackoverflow.com/questions/46177648
复制相似问题