I have a huge number of combinations to evaluate (86 choose 10, which is about 3.5 trillion), and I have written an algorithm that can process about 500,000 combinations per second. I don't want to wait roughly 81 days for the final result, so the natural move is to split the work into many processes spread across my many cores.
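For reference, the scale can be checked directly with the standard library (the 500,000 per second figure is what I measured for my algorithm):

import math

total = math.comb(86, 10)          # combinations of 86 elements taken 10 at a time
rate = 500_000                     # combinations processed per second (measured)
print(f"{total:,} combinations")   # about 3.5 trillion
print(f"about {total / rate / 86_400:.0f} days at {rate:,} per second")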
Consider this naive approach:
import itertools
from concurrent.futures import ProcessPoolExecutor

def algorithm(combination):
    # returns a boolean in roughly 1/500000th of a second on average
    ...

def process(combinations):
    for combination in combinations:
        if algorithm(combination):
            # will be very rare (a few hundred times out of trillions) if that matters
            print("Found matching combination!", combination)

combination_generator = itertools.combinations(eighty_six_elements, 10)

# My system will have 64 cores and 128 GiB of memory
with ProcessPoolExecutor(max_workers=63) as executor:
    # assign 1,000,000 combinations to each process
    group = []
    for combination in combination_generator:
        group.append(combination)
        if len(group) >= 1_000_000:
            executor.submit(process, group)
            group = []
    if group:  # submit the final partial batch
        executor.submit(process, group)
This code "works", but it is barely faster than the single-threaded approach, because it is bottlenecked on generating the combinations in the parent process: to keep 63 workers busy, the parent would have to produce and hand out roughly 63 × 500,000 ≈ 31.5 million combinations per second.
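To see the bottleneck concretely, here is a rough way to measure how fast the parent process can even produce combinations on its own, without calling algorithm at all (exact numbers will obviously depend on the machine):

import itertools
import time

# time how fast combinations can be generated, with no processing at all
limit = 10_000_000
start = time.perf_counter()
count = sum(1 for _ in itertools.islice(itertools.combinations(range(86), 10), limit))
elapsed = time.perf_counter() - start
print(f"generated {count:,} combinations in {elapsed:.2f}s ({count / elapsed:,.0f}/s)")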
How can I hand this computation off to the child processes so that it can actually be parallelized? How can each process generate its own specific subset of itertools.combinations?
I found this answer, but it only covers generating a single specified element, whereas I need to generate millions of specified elements efficiently.
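For concreteness, generating a single specified element looks roughly like the standard nth_combination recipe (also available as more_itertools.nth_combination), which maps a lexicographic index straight to the combination at that position:

import math

def nth_combination(iterable, r, index):
    # Equivalent to list(itertools.combinations(iterable, r))[index],
    # but computed directly from the index without enumerating.
    pool = tuple(iterable)
    n = len(pool)
    c = math.comb(n, r)
    if index < 0:
        index += c
    if not 0 <= index < c:
        raise IndexError("index out of range")
    result = []
    while r:
        c, n, r = c * r // n, n - 1, r - 1
        while index >= c:
            index -= c
            c, n = c * (n - r) // n, n - 1
        result.append(pool[-1 - n])
    return tuple(result)

# e.g. the first and the 1,000,000th combination of 86 elements taken 10 at a time
print(nth_combination(range(86), 10, 0))
print(nth_combination(range(86), 10, 1_000_000))

Calling something like this once per combination would be far too slow; what I am after is a way for each worker to receive only a start index and a count, unrank the start once, and then enumerate the next count combinations cheaply on its own.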