I have a huge number of combinations (86 choose 10, about 3.5 trillion results) and an algorithm that can process roughly 500,000 combinations per second. I don't want to wait 81 days for the final result, so my natural inclination is to split the work into many processes spread across my many cores.
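For reference, the scale can be checked directly with `math.comb`:

```python
import math

total = math.comb(86, 10)      # number of 10-element combinations of 86 items (~3.5 trillion)
seconds = total / 500_000      # at 500,000 combinations per second, single-threaded
print(f"{total:,} combinations -> {seconds / 86_400:.0f} days single-threaded")
```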
Consider this naive approach:
```python
import itertools
from concurrent.futures import ProcessPoolExecutor

def algorithm(combination):
    # returns a boolean in roughly 1/500000th of a second on average
    ...

def process(combinations):
    for combination in combinations:
        if algorithm(combination):
            # will be very rare (a few hundred times out of trillions) if that matters
            print("Found matching combination!", combination)

combination_generator = itertools.combinations(eighty_six_elements, 10)

# My system will have 64 cores and 128 GiB of memory
with ProcessPoolExecutor(max_workers=63) as executor:
    # assign 1,000,000 combinations to each process
    group = []
    for combination in combination_generator:
        group.append(combination)
        if len(group) >= 1_000_000:
            executor.submit(process, group)
            group = []
```
This code "works", but it gives almost no speedup over the single-threaded approach, because it is bottlenecked on generating the combinations in the parent process.
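To see the producer-side cost on its own, independent of `algorithm`, a quick measurement along these lines can be run (`eighty_six_elements` is stubbed with `range(86)` here as a placeholder for my real elements):

```python
import itertools
import pickle
import time

eighty_six_elements = list(range(86))  # placeholder pool

gen = itertools.combinations(eighty_six_elements, 10)
t0 = time.perf_counter()
group = list(itertools.islice(gen, 1_000_000))  # what the parent builds per submit()
t1 = time.perf_counter()
payload = pickle.dumps(group)                   # roughly what submit() must ship to a worker
t2 = time.perf_counter()
print(f"build 1M tuples: {t1 - t0:.2f}s, "
      f"pickle them ({len(payload) / 1e6:.0f} MB): {t2 - t1:.2f}s")
```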
How can I hand this computation off to the subprocesses so that it can actually be parallelized? How can each process generate its own specific subset of itertools.combinations?
I found this answer, but it only covers generating a single specified element, whereas I need to efficiently generate millions of them.
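What I am imagining, roughly, is that each worker receives a (start, count) range of the lexicographic ordering, unranks its starting combination once, and then steps through its slice locally. Below is a sketch of that idea: `nth_combination` is the recipe from the itertools documentation (also available as `more_itertools.nth_combination`), and `combinations_slice` is a hypothetical helper name of my own, not a stdlib function.

```python
import math

def nth_combination(iterable, r, index):
    """Equivalent to list(itertools.combinations(iterable, r))[index].
    This is the nth_combination recipe from the itertools docs."""
    pool = tuple(iterable)
    n = len(pool)
    c = math.comb(n, r)
    if index < 0:
        index += c
    if not 0 <= index < c:
        raise IndexError("index out of range")
    result = []
    while r:
        c, n, r = c * r // n, n - 1, r - 1
        while index >= c:
            index -= c
            c, n = c * (n - r) // n, n - 1
        result.append(pool[-1 - n])
    return tuple(result)

def combinations_slice(pool, r, start, count):
    """Yield `count` consecutive r-combinations of `pool`, beginning at
    lexicographic rank `start` (hypothetical helper)."""
    pool = tuple(pool)
    n = len(pool)
    # Unrank once to get the index tuple of the starting combination.
    indices = list(nth_combination(range(n), r, start))
    for _ in range(count):
        yield tuple(pool[i] for i in indices)
        # Step to the lexicographically next index tuple, using the same rule
        # as the pure-Python equivalent of itertools.combinations in the docs.
        for i in reversed(range(r)):
            if indices[i] != i + n - r:
                break
        else:
            return  # ran off the end of the whole sequence
        indices[i] += 1
        for j in range(i + 1, r):
            indices[j] = indices[j - 1] + 1

if __name__ == "__main__":
    # Small self-check: the slice matches itertools.combinations.
    from itertools import combinations, islice
    expected = list(islice(combinations(range(8), 3), 10, 25))
    assert list(combinations_slice(range(8), 3, 10, 15)) == expected
```

With a helper like this, the parent would only need to submit small (start, count) pairs to the executor and each worker would regenerate its own slice, instead of the parent materializing and pickling a million tuples per task. I am not sure whether this is the most efficient way to do it, though.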