Multiple files can be read and processed in parallel with Python's multiprocessing module and the pandas library, as follows:
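```python
import multiprocessing as mp
import pandas as pd

def process_file(filename, process_type):
    df = pd.read_csv(filename)
    # Extract the data according to the requested processing type
    if process_type == 'type1':
        processed_data = df['column1']
    elif process_type == 'type2':
        processed_data = df['column2']
    else:
        raise ValueError(f'unknown process_type: {process_type!r}')
    return processed_data

def parallel_process_files(file_list, process_type):
    # One worker process per CPU core; leaving the with-block shuts the pool down
    with mp.Pool(mp.cpu_count()) as pool:
        # starmap unpacks each (filename, process_type) tuple into two arguments;
        # pool.map would pass the whole tuple as one argument and raise a TypeError
        results = pool.starmap(process_file, [(filename, process_type) for filename in file_list])
    return results

if __name__ == '__main__':
    # The guard is required when workers are started with "spawn" (e.g. on Windows and macOS)
    file_list = ['file1.csv', 'file2.csv', 'file3.csv']
    processed_data_list = parallel_process_files(file_list, 'type1')
```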
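`parallel_process_files` is meant to return a list with one pandas Series per input file, in the same order as `file_list` (the pool's map functions preserve input order). When a single combined result is needed, `pd.concat` can stack that list, and its `keys` argument records which file each row came from. A minimal sketch, using small hypothetical in-memory Series in place of real per-file results:

```python
import pandas as pd

# Hypothetical per-file results, standing in for the list returned by parallel_process_files
file_list = ['file1.csv', 'file2.csv', 'file3.csv']
processed_data_list = [
    pd.Series([1, 2], name='column1'),
    pd.Series([3], name='column1'),
    pd.Series([4, 5, 6], name='column1'),
]

# keys= adds an outer index level holding the source file name of each row
combined = pd.concat(processed_data_list, keys=file_list, names=['file', 'row'])
print(combined.loc['file3.csv'].tolist())  # [4, 5, 6]
```

Keeping the file name in the index avoids losing track of where each value came from once the per-file results are merged.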
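Using a process pool this way, several files are read and their data extracted in parallel, one file per worker process. This can shorten the total run time when parsing the CSV files is the bottleneck; if the job is limited by disk speed instead, the gain may be small.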
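Since each worker needs only one column, it can also skip parsing the others: `pd.read_csv` accepts a `usecols` argument for this. The sketch below combines that with `concurrent.futures.ProcessPoolExecutor`, a standard-library alternative to `multiprocessing.Pool`. The `COLUMN_FOR_TYPE` mapping, the function names `extract_column` and `parallel_extract`, and the sample files are hypothetical, chosen to mirror the example above; this is one possible variant, not a definitive implementation.

```python
import os
import tempfile
from concurrent.futures import ProcessPoolExecutor
from itertools import repeat

import pandas as pd

# Hypothetical mapping from process_type to the column it extracts
COLUMN_FOR_TYPE = {'type1': 'column1', 'type2': 'column2'}


def extract_column(filename, process_type):
    """Parse only the column needed for process_type and return it as a Series."""
    try:
        column = COLUMN_FOR_TYPE[process_type]
    except KeyError:
        raise ValueError(f'unknown process_type: {process_type!r}') from None
    # usecols makes read_csv skip the other columns instead of parsing them
    return pd.read_csv(filename, usecols=[column])[column]


def parallel_extract(file_list, process_type, max_workers=None):
    # executor.map zips its iterables, so repeat() supplies process_type for every file;
    # results come back in the same order as file_list
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(extract_column, file_list, repeat(process_type)))


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        # Write three small hypothetical CSV files to try the sketch
        file_list = []
        for i in range(3):
            path = os.path.join(tmp, f'file{i + 1}.csv')
            pd.DataFrame({'column1': [i, i + 10], 'column2': [f'a{i}', f'b{i}']}).to_csv(path, index=False)
            file_list.append(path)
        results = parallel_extract(file_list, 'type1')
    print([s.tolist() for s in results])  # [[0, 10], [1, 11], [2, 12]]
```

A dictionary lookup replaces the if/elif chain, so an unknown `process_type` raises a clear error instead of leaving the result undefined.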