文章/答案/技术大牛

发布

社区首页 >问答首页 >如何将特定列设置为带有熊猫的int类型

问如何将特定列设置为带有熊猫的int类型
EN

Stack Overflow用户

提问于 2017-09-12 13:23:07

回答 1查看 270关注 0票数 2

我有一个脚本，用于从文件夹中将一些csv文件写入excel：

from pandas.io.excel import ExcelWriter
import pandas
import os

path = 'data/'
ordered_list = sorted(os.listdir(path), key = lambda x: int(x.split(".")[0]))


with ExcelWriter('my_excel.xlsx') as ew:
    for csv_file in ordered_list:
        pandas.read_csv(path + csv_file).to_excel(ew, index = False, sheet_name=csv_file[:-4], encoding='utf-8')

现在我的问题是，所有的列(比方说G:H)都是字符串格式(ex '400或'10)和a‘以前，我认为它们是以字符串的形式出现的，因为csv将它们转换成字符串，我需要它们是int，我如何使G: think？！我用python 3，谢谢！

PS(这是csv样本)：

ANPIS,,,,,,,
AGENTIA JUDETEANA PENTRU PLATI SI INSPECTIE SOCIALA TIMIS,,,,,,,
,,,,,,,
Macheta Comparativa CREDITORI - numai pentru Beneficiile a caror Evidenta se tine si in Contabilitate si in aplicatia SAFIR,,,,,,,
Situatie ANALITICA - NOMINAL la 30.06.2017,,,,,,,
1. ALOCATIA DE STAT PENTRU COPII,,,,,,,
Nr. Benef,Nume Prenume,CNP,Data Constituirii,Suma Contabilitate,Suma SAFIR,Differenta Suma,Explicatii daca exista diferente
1,2,3,4,5,6,7=5-6,8
1,CAZACU MIHAI,133121140,Aug 2016,84,84
2,NICOARA PETRU,143152638,"Aug 2014, Sept 2014",126,84
3,CERNEA NICOLAE DAN,143354723,Dec 2015,84,84
4,LUDWIG PETRU,144091376,Nov 2014,42,42
5,POPA REMUS,1440915363,Iun 2015,84,84
6,BOGDAN MARCEL,144154726,"Feb 2015, Apr 2015, Sept 2015, Oct 2015, Feb 2016",336,336
7,HENDRE AUGUSTIN,145054704,Feb 2015,42,42
8,COJOC VASILE,147050307,"Sept 2014, Oct 2014",84,84
9,RADULESCU VICTOR,147352628,"Sept 2014, Oct 2014, Nov 2014, Dec 2014",168,168
10,RADAU DUMITRU,148054764,"Feb 2017, Mar 2017",168,168
11,COVACIU PETRU,148054802,Iun 2016,84,84
12,BOT IOAN,14808634,"Aug 2014, Sept 2014, Oct 2014, Nov 2014",168,168

这个头是这个：

ANPIS,,,,,,,
AGENTIA JUDETEANA PENTRU PLATI SI INSPECTIE SOCIALA TIMIS,,,,,,,
,,,,,,,
Macheta Comparativa CREDITORI - numai pentru Beneficiile a caror Evidenta se tine si in Contabilitate si in aplicatia SAFIR,,,,,,,
Situatie ANALITICA - NOMINAL la 30.06.2017,,,,,,,
1. ALOCATIA DE STAT PENTRU COPII,,,,,,,
Nr. Benef,Nume Prenume,CNP,Data Constituirii,Suma Contabilitate,Suma SAFIR,Differenta Suma,Explicatii daca exista diferente
1,2,3,4,5,6,7=5-6,8

excel

python-3.x

pandas

csv

python

回答 1

Stack Overflow用户

回答已采纳

发布于 2017-09-12 13:25:03

您可以读取每个文件两次--首先只使用参数nrows读取头，然后用skiprows读取正文。

那就需要写两遍了。

解决方案有点复杂，因为熊猫错误地解析数据--不支持MulttiIndex的8级别。如果没有设置标头，则来自标头的数据将与主体连接，而输出则会变得一团糟。

with ExcelWriter('my_excel.xlsx') as ew:
    for csv_file in ordered_list:
        df1 = pandas.read_csv(path + csv_file, nrows=8, header=None)
        df2 = pandas.read_csv(path + csv_file, skiprows=8, header=None)
        df1.to_excel(ew, index = False, sheet_name=csv_file[:-4], encoding='utf-8', header=False)
        row = len(df1.index)
        df2.to_excel(ew, index = False, sheet_name=csv_file[:-4], encoding='utf-8', startrow=row , startcol=0, header=False)

使用apply通过strip删除'，通过astype转换为int

cols = ['G','H']

with ExcelWriter('my_excel.xlsx') as ew:
    for csv_file in ordered_list:
        df = pandas.read_csv(path + csv_file)
        df[cols] = df[cols].astype(str).apply(lambda x: x.str.strip("'")).astype(int)
        print (df.head())
        df.to_excel(ew, index = False, sheet_name=csv_file[:-4], encoding='utf-8')

另一种解决方案是使用带有自定义函数的参数converters：

cols = ['G','H']

def converter(x):
    return int(x.strip("'"))
#define each column
converters={x:converter for x in cols}

with ExcelWriter('my_excel.xlsx') as ew:
    for csv_file in ordered_list:
        df = pandas.read_csv(path + csv_file, converters=converters)
        print (df.head())
        df.to_excel(ew, index = False, sheet_name=csv_file[:-4], encoding='utf-8')

票数 4

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/46177648

复制

相似问题

问如何将特定列设置为带有熊猫的int类型
EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问如何将特定列设置为带有熊猫的int类型EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问如何将特定列设置为带有熊猫的int类型
EN