Group and merge files that share a common prefix

2023-03-19 16:12 | Source: compiled from the web

I wrote a few functions that download the CSVs for every election, split by district. The downloaded files are named like this:

Hzwpukgh_2008Parliamentary-Majoritarian
Hzwpukgh_2008Parliamentary-PartyList
Hzwpukgh_2008Presidential
...
Truc_2008Presidential
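The naming scheme above can be split into its parts with a small regular expression. This is a minimal sketch that assumes every file name follows the pattern <District>_<Year><ElectionType>.csv; the function parse_result_filename and the pattern itself are illustrative, not part of the original code.

```python
import re
from pathlib import Path

# Assumed naming scheme: <District>_<Year><ElectionType>.csv,
# e.g. "Hzwpukgh_2008Parliamentary-Majoritarian.csv".
FILENAME_RE = re.compile(r"^(?P<district>[^_]+)_(?P<year>\d{4})(?P<election>.+)$")


def parse_result_filename(path):
    """Split a result file name into (district, year, election type).

    Returns None if the file name does not match the assumed pattern.
    """
    stem = Path(path).stem  # drop the directory and the ".csv" extension
    match = FILENAME_RE.match(stem)
    if match is None:
        return None
    return match["district"], int(match["year"]), match["election"]


print(parse_result_filename("Results/Hzwpukgh_2008Parliamentary-Majoritarian.csv"))
```

The district returned here is the same key as the text before the first underscore, so it can be used for grouping, while the year and election type are handy later when the elections become columns.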

For a given election and a given district, a file gives me the following information.

I want to collect the CSVs from different years for one district, say Hzwpukgh, into a single CSV that looks like this:

```
                       2010 Presidential  2017 Presidential  ...
Tprolps Zhhrhzocpsp                67.68                NaN
Levan Gachechiladze                20.96                NaN
...
Npvynp Thynclshzocpsp                NaN              64.15
Davit Bakradze                       NaN              13.86
...
```

As a first step, though, I want to merge the CSVs into one. So how do I merge files that have the same name before the underscore?

It would look like this:

```
"Election"," Map Level"," Precinct ID"," Precinct Name","Overall Results","#1 - Mikheil Saakashvili","#2 - Levan Gachechiladze","#3 - Shalva Natelashvili","#4 - Arkadi (Badri) Patarkatsishvili","#5 - Davit Gamkrelidze","#6 - Giorgi (Gia) Maisashvili","#7 - Irina Sarishvili-Chanturia","Total Voter Turnout (#)","Total Voter Turnout (%)","Average votes per minute (08:00-12:00)","Average votes per minute (12:00-17:00)","Average votes per minute (17:00-20:00)"
"2008 Presidential","Precinct","1","39-1","Mikheil Saakashvili","74.48","18.45","1.74","5.92","3.71","0.58","0.12","862","58.24","1.19","1.45","1.05"
"2008 Presidential","Precinct","10","39-10","Mikheil Saakashvili","61.62","24.75","3.03","5.56","5.05","0","0","198","75","0.25","0.34","0.2"
...
"2008 Parliamentary-Majoritarian","Precinct","1","39-1","Mikheil Saakashvili","74.48","18.45","1.74","5.92","3.71","0.58","0.12","862","58.24","1.19","1.45","1.05"
"2008 Parliamentary-Majoritarian","Precinct","10","39-10","Mikheil Saakashvili","61.62","24.75","3.03","5.56","5.05","0","0","198","75","0.25","0.34","0.2"
```
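One way to do this merge step is to group the paths by the text before the first underscore and stack each group with pandas.concat, instead of appending rows to an open file. This is a sketch assuming every file is a comma-separated CSV with a header row; group_by_prefix, merge_group and merge_all (including its out_dir parameter) are hypothetical helper names.

```python
import glob
import os
from collections import defaultdict

import pandas as pd


def group_by_prefix(paths):
    """Map each prefix (the file name text before the first '_') to its files."""
    groups = defaultdict(list)
    for path in paths:
        prefix = os.path.basename(path).split("_", 1)[0]
        groups[prefix].append(path)
    return dict(groups)


def merge_group(paths):
    """Read every CSV in one group and stack their rows into a single DataFrame.

    Paths are sorted so the row order is deterministic. Columns missing from a
    file (e.g. a candidate who did not run in that election) are filled with NaN.
    """
    frames = [pd.read_csv(path) for path in sorted(paths)]
    return pd.concat(frames, ignore_index=True, sort=False)


def merge_all(folder=".", out_dir="."):
    """Write one '<prefix>-aggregate.csv' per prefix found in folder."""
    paths = glob.glob(os.path.join(folder, "*_*.csv"))
    for prefix, group in group_by_prefix(paths).items():
        merged = merge_group(group)
        merged.to_csv(os.path.join(out_dir, f"{prefix}-aggregate.csv"), index=False)
```

Because concat aligns frames on column names, elections with different candidate lists still end up in one frame, and the output names contain no underscore, so a rerun on the same folder will not pick them up again.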

Then I could build the DataFrame shown above. If you have any other approach, I'd be very happy to hear it :)
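Once the files for a district are stacked into one frame, the candidates-by-elections table could be built by melting the candidate columns into long form and pivoting on Election. This sketch assumes the candidate columns are exactly those whose names start with '#' and that a plain mean of the precinct percentages is acceptable; the post does not say how the district-level figures are aggregated, and a turnout-weighted mean would be closer to official results. candidate_table is a hypothetical name.

```python
import pandas as pd


def candidate_table(merged):
    """Reshape a merged per-precinct frame into candidates x elections.

    Assumptions (not from the original post):
    - candidate columns are the ones whose name starts with '#',
      e.g. '#2 - Levan Gachechiladze';
    - the district figure is the unweighted mean of the precinct percentages.
    """
    candidate_cols = [c for c in merged.columns if c.startswith("#")]
    long = merged.melt(
        id_vars=["Election"],
        value_vars=candidate_cols,
        var_name="Candidate",
        value_name="Percent",
    )
    # '#2 - Levan Gachechiladze' -> 'Levan Gachechiladze' (only the first ' - ' is split)
    long["Candidate"] = long["Candidate"].str.split(" - ", n=1).str[1]
    long["Percent"] = pd.to_numeric(long["Percent"], errors="coerce")
    # Candidate/election pairs with no data stay NaN in the wide table.
    return long.pivot_table(
        index="Candidate", columns="Election", values="Percent", aggfunc="mean"
    )
```

Splitting only on the first " - " keeps names such as "Ours - People's Party" intact.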

My attempt

I tried the following:

```python
import glob
import random
import os
import pandas

def find_filesets(path="."):
    csv_files = {}
    for name in glob.glob("{}/*_*.csv".format(path)):
        # there's almost certainly a better way to do this
        key = os.path.splitext(os.path.basename(name))[0].split('_')[0]
        csv_files.setdefault(key, []).append(name)

    for key, filelist in csv_files.items():
        print(key, filelist)
        # do something with filelist
        create_merged_csv(key, filelist)

def create_merged_csv(key, filelist):
    with open('{}-aggregate.csv'.format(key), 'w+b') as outfile:
        for filename in filelist:
            df = pandas.read_csv(filename)
            print(df)
            df.to_csv(outfile, index=False)

find_filesets('./Results')
```

But it returned:

```
01 ['./Results\\01_2016Parliamentary-Majoritarian.csv', './Results\\01_2016Parliamentary-MajoritarianRunoff.csv', './Results\\01_2016Parliamentary-PartyList.csv']
  "Election"," Map Level"," Precinct ID"," Precinct Name","Overall Results","#1 - Initiative Group","#2 - United National Movement","#3 - Free Democrats","#4 - Alliance of Patriots","#5 - Democratic Movement","#6 - Republican party","#7 - Georgia for Peace","#8 - State for the People","#9 - Georgian Idea","#10 - National Forum","#11 - For United Georgia","#12 - Georgia","#13 - Ours - People's Party","#14 - Progressive Democratic Movement","#14 - Georgian Group","#14 - Labour","#14 - Communist Party - Stalin","#14 - Socialist Workers Party","#14 - United Communist Party","#14 - Industrialists - Our Homeland","#14 - Merab Kostava Society","#14 - Leftist Alliance","#14 - In the Name of the Lord","#14 - Georgian Dream","Invalid Ballots (%)","More Ballots Than Votes (#)","More Votes Than Ballots (#)","Total Voter Turnout (#)","Total Voter Turnout (%)","Average votes per minute (08:00-12:00)","Average votes per minute (12:00-17:00)","Average votes per minute (17:00-20:00)"
0  "2016 Parliamentary - Majoritarian","Precinct"...
1  "2016 Parliamentary - Majoritarian","Precinct"...
2  "2016 Parliamentary - Majoritarian","Precinct"...
3  "2016 Parliamentary - Majoritarian","Precinct"...
...
C:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:22: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.

------------------------
TypeError                                 Traceback (most recent call last)
in
      4 import pandas
      5
----> 6 find_filesets('./Results')

in find_filesets(path)
      9         print(key, filelist)
     10         # do something with filelist
---> 11         create_merged_csv(key, filelist)

in create_merged_csv(key, filelist)
     22             df = pandas.read_csv(filename, sep='delimiter')
     23             print(df)
---> 24             df.to_csv(outfile, index=False, header=None)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in to_csv(self, path_or_buf, sep, na_rep, float_format, columns, header, index, index_label, mode, encoding, compression, quoting, quotechar, line_terminator, chunksize, tupleize_cols, date_format, doublequote, escapechar, decimal)
   3018                 doublequote=doublequote,
   3019                 escapechar=escapechar, decimal=decimal)
-> 3020         formatter.save()

C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\formats\csvs.py in save(self)
    170                 self.writer = UnicodeWriter(f, **writer_kwargs)
    171
--> 172             self._save()

C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\formats\csvs.py in _save(self)
    286                 break
    287
--> 288             self._save_chunk(start_i, end_i)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\formats\csvs.py in _save_chunk(self, start_i, end_i)
    313
    314         libwriters.write_csv_rows(self.data, ix, self.nlevels,
--> 315                                   self.cols, self.writer)

pandas/_libs/writers.pyx in pandas._libs.writers.write_csv_rows()

TypeError: a bytes-like object is required, not 'str'
```
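The TypeError most likely comes from opening the output with 'w+b': that makes outfile a binary file, while DataFrame.to_csv in the pandas version shown in the traceback writes str. Opening it in text mode with newline='', as the pandas to_csv documentation advises for text file objects, avoids the error. The traceback also shows read_csv called with sep='delimiter'; pandas treats a multi-character sep as a regular expression (hence the ParserWarning), and since that pattern never matches, each line probably lands in a single column, which fits the printed frame. Below is a sketch of a corrected create_merged_csv, assuming all files in a group have identical columns in the same order; the out_dir parameter is a hypothetical addition.

```python
import os

import pandas as pd


def create_merged_csv(key, filelist, out_dir="."):
    """Append every CSV in filelist to '<key>-aggregate.csv', writing the header once.

    Assumes all files in the group share the same columns in the same order;
    otherwise the rows will not line up under a single header.
    """
    out_path = os.path.join(out_dir, "{}-aggregate.csv".format(key))
    # Text mode ('w'), not 'w+b': to_csv writes str, which caused the TypeError.
    # newline='' stops Python from translating the line endings pandas writes.
    with open(out_path, "w", newline="") as outfile:
        for i, filename in enumerate(sorted(filelist)):
            df = pd.read_csv(filename)  # default sep=',' matches these files
            df.to_csv(outfile, index=False, header=(i == 0))
    return out_path
```

Writing header=(i == 0) keeps exactly one header row, whereas header=None, as in the traceback, drops it entirely. If the column sets differ between files, aligning them by name (for example with pandas.concat) is safer than appending rows.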
