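I've been learning Python as my first programming language for a few weeks. I decided to write a lotto simulation using Numba. The code runs fine on my CPU at roughly 250k iterations per second. I'd really like to see how it runs on my NVIDIA GPU, but I'm a bit out of my depth. I'd be very grateful if someone could give me a hand.
My code: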
"""Importing the time module.""" import time from numba import jit # Importing the jit function from the numba module. import numpy as np # Importing the numpy library and giving it the alias np. start = time.time() @jit(nopython=True) def lotto_numbers(): """ The lotto_numbers function generates a list of six random numbers between 1 and 59. The function will not return any number that is zero, but it will return any other number. :return: A sorted numpy array of 6 unique random numbers between 1 and 59 """ generated = False while not generated: balls = np.random.choice(59, 6, replace=False) balls.sort() temp = np.where(balls == 0) (check_for_zero,) = temp[0].shape if check_for_zero == 0: generated = True return balls your_numbers = lotto_numbers() # optionally choose 6 numbers (not yet implemented). print(f"Your numbers are: {your_numbers}") MATCH = False GAME_COUNT = 0 while not MATCH: tmp = lotto_numbers() cmp = tmp == your_numbers GAME_COUNT += 1 if cmp.all(): MATCH = True TOTAL_COST = GAME_COUNT * 2 TOTAL_WEEKS = GAME_COUNT / 2 TOTAL_YEARS = round(TOTAL_WEEKS / 52, 1) end = time.time() TOTAL_TIME = end - start GPS = GAME_COUNT / TOTAL_TIME print(f"\n\nWINNER!!! after {GAME_COUNT:,} games") print(f"\ttotal cost: £{TOTAL_COST:,}") print(f"\tweeks until win: {TOTAL_WEEKS:,} weeks") print(f"\tyears until win: {TOTAL_YEARS:,} years") print( f"\texecution time: {TOTAL_TIME:,.2f} seconds at {GPS:,.2f} games per second") if GAME_COUNT % 10000 == 0: print(f"Game : {GAME_COUNT:,}", end='\r')
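I know I need to write a CUDA kernel to target the GPU. I think I should be able to run in float16, since the numbers aren't complicated. Also, @vectorize seems to be important. But honestly, I'm just treading water.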
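Before any CUDA specifics, it may help to see the data-parallel shape that GPU code generally needs: many independent games evaluated in one call, rather than one game per loop iteration. The following is only a CPU-side NumPy sketch of that shape, not a CUDA kernel. `draw_batch` and `first_winner` are hypothetical helper names, and the sketch assumes the intended range is 1 to 59. Since the ball values fit in `uint8`, it uses a small integer type rather than float16.

```python
import numpy as np


def draw_batch(batch_size, rng):
    """Return a (batch_size, 6) uint8 array of sorted draws from 1..59."""
    # One random key per ball per game; the 6 balls with the smallest keys
    # form a uniform draw without replacement.
    keys = rng.random((batch_size, 59))
    picks = np.argsort(keys, axis=1)[:, :6] + 1
    return np.sort(picks, axis=1).astype(np.uint8)


def first_winner(draws, target):
    """Return the index of the first row equal to target, or -1 if none."""
    hits = np.flatnonzero((draws == target).all(axis=1))
    return int(hits[0]) if hits.size else -1


rng = np.random.default_rng(0)      # fixed seed, for a repeatable demo only
draws = draw_batch(10_000, rng)
ticket = draws[123].copy()          # pretend game 123 was our ticket
print(first_winner(draws, ticket))  # some index <= 123
```

A full run would call `draw_batch` repeatedly and add the running batch offset to the returned index. Whether this structure maps well onto Numba's CUDA target is left open here.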