1 Preface
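I'm currently working on my graduation project. The dataset comes from a Tianchi competition and is in CSV format, so below I summarize several ways to read CSV files.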
Main reference: "Several Ways to Read CSV Files in Python" (Python读取csv文件的几种方法)
Below is the content of the test file test.csv (placed in the same directory as the Python code that follows):
```
file_id,label,api,tid,index
1,5,LdrLoadDll,2488,0
1,5,LdrGetProcedureAddress,2488,1
1,5,LdrGetProcedureAddress,2488,2
1,5,LdrGetProcedureAddress,2488,3
1,5,LdrGetProcedureAddress,2488,4
```
2 Reading Operations
2.1 Reading the File with Python I/O
2.1.1 csv.reader()
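```python
import csv


def read_csv(csv_path):
    data = []
    # newline="" is what the csv module docs recommend for files passed to csv.reader
    with open(csv_path, newline="") as f:
        csv_reader = csv.reader(f)
        header = next(csv_reader)  # first row: the column names
        print(header)
        for row in csv_reader:
            data.append(row[1])  # each row is a list of strings; keep column 1 (label)
    # no f.close() needed: the with block closes the file
    print(data)


if __name__ == "__main__":
    csv_path = "test.csv"
    read_csv(csv_path=csv_path)
```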
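csv.reader hands back every field as a string, so numeric columns need an explicit conversion, and columns are addressed by position. The sketch below is a minimal illustration, assuming the sample data is kept in a hypothetical in-memory string `SAMPLE` read through `io.StringIO` instead of the file on disk; the helper `read_int_column` is also a hypothetical name that looks up a column by its header name and converts it to int.

```python
import csv
import io

# Hypothetical in-memory copy of test.csv so the sketch runs without the file
SAMPLE = """file_id,label,api,tid,index
1,5,LdrLoadDll,2488,0
1,5,LdrGetProcedureAddress,2488,1
1,5,LdrGetProcedureAddress,2488,2
1,5,LdrGetProcedureAddress,2488,3
1,5,LdrGetProcedureAddress,2488,4
"""


def read_int_column(f, name):
    """Return the column called `name` as a list of ints (hypothetical helper)."""
    reader = csv.reader(f)
    header = next(reader)     # the first row holds the column names
    col = header.index(name)  # map the column name to its position
    return [int(row[col]) for row in reader]  # fields arrive as str, so convert


index_values = read_int_column(io.StringIO(SAMPLE), "index")
print(index_values)  # [0, 1, 2, 3, 4]
```

Looking the position up from the header keeps the code working if the column order in the file changes.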
2.1.2 csv.DictReader()
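DictReader lets you get the field names, and each row comes back as a dict keyed by those names. See "Reading field names (headers) with csv.DictReader" (csv.DictReader 读取字段名(headers)).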
```python
import csv


def read_csv(csv_path):
    with open(csv_path, "r", newline="") as f:
        reader = csv.DictReader(f)
        headers = reader.fieldnames  # field names taken from the header row
        print(headers)
        # each row is a dict, so a column can be selected by name
        column = [row["index"] for row in reader]
        print(column)
    # the with block closes the file; no f.close() needed


if __name__ == "__main__":
    csv_path = "test.csv"
    read_csv(csv_path=csv_path)
```
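Because each DictReader row is a dict, code can refer to columns by name rather than by index. As a small illustration, the sketch below counts how often each API name appears, assuming the same hypothetical in-memory `SAMPLE` string in place of test.csv and using `collections.Counter` from the standard library.

```python
import csv
import io
from collections import Counter

# Hypothetical in-memory copy of test.csv
SAMPLE = """file_id,label,api,tid,index
1,5,LdrLoadDll,2488,0
1,5,LdrGetProcedureAddress,2488,1
1,5,LdrGetProcedureAddress,2488,2
1,5,LdrGetProcedureAddress,2488,3
1,5,LdrGetProcedureAddress,2488,4
"""

reader = csv.DictReader(io.StringIO(SAMPLE))
fieldnames = reader.fieldnames  # read from the header row on first access
print(fieldnames)

# each row is a dict keyed by field name, so row["api"] reads the api column
api_counts = Counter(row["api"] for row in reader)
print(api_counts)
```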
2.1.3 open()
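Treat the CSV file as a plain text file and process it with open() and split(). This has an obvious drawback: if a field itself contains the delimiter `,` (for example a quoted field such as "a,b"), splitting on commas cuts that field apart incorrectly.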
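```python
def read_csv(csv_path):
    column = []
    column_id = 0  # index of the column to extract (0 = file_id)
    with open(csv_path) as f:
        for line in f:  # iterate over lines directly; readlines() is unnecessary
            # strip the newline, split on commas and keep one field
            # (the header row is included as the first element)
            column.append(line.rstrip("\n").split(",")[column_id])
    print(column)


if __name__ == "__main__":
    csv_path = "test.csv"
    read_csv(csv_path=csv_path)
```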
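To make the drawback concrete, the sketch below parses one hypothetical row whose second field is quoted because it contains a comma (the row is made up for illustration and is not in test.csv). Plain split(",") breaks the field in two and keeps the quote characters, while csv.reader honours the quoting.

```python
import csv
import io

# Hypothetical data: the api field is quoted because it contains a comma
LINES = 'id,api\n7,"LdrLoadDll,extra"\n'

# naive approach: split every line on commas
naive = [line.rstrip("\n").split(",") for line in io.StringIO(LINES)]
# csv.reader understands quoted fields
parsed = list(csv.reader(io.StringIO(LINES)))

print(naive[1])   # ['7', '"LdrLoadDll', 'extra"']
print(parsed[1])  # ['7', 'LdrLoadDll,extra']
```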
2.2 Reading with numpy
Drawback: the values read must be numeric. A string value raises an error, and by default the values are stored as float.
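Explanation: delimiter is the field separator, skiprows skips the first n rows (here the header), and usecols selects which columns to read by zero-based index. Because api (index 2) is a string column, the example reads the numeric tid and index columns, i.e. the 4th and 5th columns (usecols=[3, 4]).
```python
import numpy as np


def read_csv(csv_path):
    n = 1  # number of leading rows to skip (the header)
    # usecols=[3, 4] selects tid and index; including the string column api would raise ValueError
    data = np.loadtxt(csv_path, delimiter=",", skiprows=n, usecols=[3, 4])
    print(data)  # 5 x 2 array of floats


if __name__ == "__main__":
    csv_path = "test.csv"
    read_csv(csv_path=csv_path)
```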
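Both parts of the drawback can be checked directly. The sketch below, which assumes the same hypothetical in-memory `SAMPLE` string in place of test.csv, passes `dtype=int` to np.loadtxt to get integers instead of the default float, and then shows that selecting the string column api raises a ValueError.

```python
import io

import numpy as np

# Hypothetical in-memory copy of test.csv
SAMPLE = """file_id,label,api,tid,index
1,5,LdrLoadDll,2488,0
1,5,LdrGetProcedureAddress,2488,1
1,5,LdrGetProcedureAddress,2488,2
1,5,LdrGetProcedureAddress,2488,3
1,5,LdrGetProcedureAddress,2488,4
"""

# numeric columns only (tid, index); dtype=int overrides the float default
ints = np.loadtxt(io.StringIO(SAMPLE), delimiter=",", skiprows=1,
                  usecols=[3, 4], dtype=int)
print(ints.dtype, ints.shape)

# including the string column api (index 2) makes the conversion fail
failed = False
try:
    np.loadtxt(io.StringIO(SAMPLE), delimiter=",", skiprows=1, usecols=[2, 3])
except ValueError as exc:
    failed = True
    print("ValueError:", exc)
```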
2.3 Reading with pandas
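```python
import pandas as pd


def read_csv(csv_path):
    # header="infer" takes the column names from the first row; usecols=[2] keeps only api
    data = pd.read_csv(csv_path, sep=",", header="infer", usecols=[2])
    print(data)
    array = data.values  # the underlying NumPy array (same as data.values[0::, 0::])
    print(array)
    print(array[0][0])  # first row, first selected column: LdrLoadDll


if __name__ == "__main__":
    csv_path = "test.csv"
    read_csv(csv_path=csv_path)
```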
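Unlike np.loadtxt, pd.read_csv infers a type for each column, so string and numeric columns can be loaded together. The sketch below assumes the same hypothetical in-memory `SAMPLE` string in place of test.csv; it selects columns by name through `usecols` and reads a cell by row label and column name with `.loc` instead of positional indexing.

```python
import io

import pandas as pd

# Hypothetical in-memory copy of test.csv
SAMPLE = """file_id,label,api,tid,index
1,5,LdrLoadDll,2488,0
1,5,LdrGetProcedureAddress,2488,1
1,5,LdrGetProcedureAddress,2488,2
1,5,LdrGetProcedureAddress,2488,3
1,5,LdrGetProcedureAddress,2488,4
"""

# usecols accepts column names as well as positions; columns keep file order
df = pd.read_csv(io.StringIO(SAMPLE), usecols=["api", "tid"])
print(df.dtypes)          # api holds strings, tid is an integer column
print(df.loc[0, "api"])   # LdrLoadDll, addressed by row label and column name
print(df["tid"].to_numpy())
```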