首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >如何将特定列设置为带有熊猫的int类型

如何将特定列设置为带有熊猫的int类型
EN

Stack Overflow用户
提问于 2017-09-12 13:23:07
回答 1查看 270关注 0票数 2

我有一个脚本,用于从文件夹中将一些csv文件写入excel:

代码语言:javascript
复制
from pandas.io.excel import ExcelWriter
import pandas
import os

path = 'data/'
ordered_list = sorted(os.listdir(path), key = lambda x: int(x.split(".")[0]))


with ExcelWriter('my_excel.xlsx') as ew:
    for csv_file in ordered_list:
        pandas.read_csv(path + csv_file).to_excel(ew, index = False, sheet_name=csv_file[:-4], encoding='utf-8')

现在我的问题是,所有的列(比方说G:H)都是字符串格式(ex '400或'10)和a‘以前,我认为它们是以字符串的形式出现的,因为csv将它们转换成字符串,我需要它们是int,我如何使G: think?!我用python 3,谢谢!

PS(这是csv样本):

代码语言:javascript
复制
ANPIS,,,,,,,
AGENTIA JUDETEANA PENTRU PLATI SI INSPECTIE SOCIALA TIMIS,,,,,,,
,,,,,,,
Macheta Comparativa CREDITORI - numai pentru Beneficiile a caror Evidenta se tine si in Contabilitate si in aplicatia SAFIR,,,,,,,
Situatie ANALITICA - NOMINAL la 30.06.2017,,,,,,,
1. ALOCATIA DE STAT PENTRU COPII,,,,,,,
Nr. Benef,Nume Prenume,CNP,Data Constituirii,Suma Contabilitate,Suma SAFIR,Differenta Suma,Explicatii daca exista diferente
1,2,3,4,5,6,7=5-6,8
1,CAZACU MIHAI,133121140,Aug 2016,84,84
2,NICOARA PETRU,143152638,"Aug 2014, Sept 2014",126,84
3,CERNEA NICOLAE DAN,143354723,Dec 2015,84,84
4,LUDWIG PETRU,144091376,Nov 2014,42,42
5,POPA REMUS,1440915363,Iun 2015,84,84
6,BOGDAN MARCEL,144154726,"Feb 2015, Apr 2015, Sept 2015, Oct 2015, Feb 2016",336,336
7,HENDRE AUGUSTIN,145054704,Feb 2015,42,42
8,COJOC VASILE,147050307,"Sept 2014, Oct 2014",84,84
9,RADULESCU VICTOR,147352628,"Sept 2014, Oct 2014, Nov 2014, Dec 2014",168,168
10,RADAU DUMITRU,148054764,"Feb 2017, Mar 2017",168,168
11,COVACIU PETRU,148054802,Iun 2016,84,84
12,BOT IOAN,14808634,"Aug 2014, Sept 2014, Oct 2014, Nov 2014",168,168

这个头是这个:

代码语言:javascript
复制
ANPIS,,,,,,,
AGENTIA JUDETEANA PENTRU PLATI SI INSPECTIE SOCIALA TIMIS,,,,,,,
,,,,,,,
Macheta Comparativa CREDITORI - numai pentru Beneficiile a caror Evidenta se tine si in Contabilitate si in aplicatia SAFIR,,,,,,,
Situatie ANALITICA - NOMINAL la 30.06.2017,,,,,,,
1. ALOCATIA DE STAT PENTRU COPII,,,,,,,
Nr. Benef,Nume Prenume,CNP,Data Constituirii,Suma Contabilitate,Suma SAFIR,Differenta Suma,Explicatii daca exista diferente
1,2,3,4,5,6,7=5-6,8
EN

回答 1

Stack Overflow用户

回答已采纳

发布于 2017-09-12 13:25:03

您可以读取每个文件两次--首先只使用参数nrows读取头,然后用skiprows读取正文。

那就需要写两遍了。

解决方案有点复杂,因为熊猫错误地解析数据--不支持MulttiIndex的8级别。如果没有设置标头,则来自标头的数据将与主体连接,而输出则会变得一团糟。

代码语言:javascript
复制
with ExcelWriter('my_excel.xlsx') as ew:
    for csv_file in ordered_list:
        df1 = pandas.read_csv(path + csv_file, nrows=8, header=None)
        df2 = pandas.read_csv(path + csv_file, skiprows=8, header=None)
        df1.to_excel(ew, index = False, sheet_name=csv_file[:-4], encoding='utf-8', header=False)
        row = len(df1.index)
        df2.to_excel(ew, index = False, sheet_name=csv_file[:-4], encoding='utf-8', startrow=row , startcol=0, header=False)

使用apply通过strip删除',通过astype转换为int

代码语言:javascript
复制
cols = ['G','H']

with ExcelWriter('my_excel.xlsx') as ew:
    for csv_file in ordered_list:
        df = pandas.read_csv(path + csv_file)
        df[cols] = df[cols].astype(str).apply(lambda x: x.str.strip("'")).astype(int)
        print (df.head())
        df.to_excel(ew, index = False, sheet_name=csv_file[:-4], encoding='utf-8')

另一种解决方案是使用带有自定义函数的参数converters

代码语言:javascript
复制
cols = ['G','H']

def converter(x):
    return int(x.strip("'"))
#define each column
converters={x:converter for x in cols}

with ExcelWriter('my_excel.xlsx') as ew:
    for csv_file in ordered_list:
        df = pandas.read_csv(path + csv_file, converters=converters)
        print (df.head())
        df.to_excel(ew, index = False, sheet_name=csv_file[:-4], encoding='utf-8')
票数 4
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/46177648

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档