Hi, I cannot understand why this code is not fast enough.
–[CODE]-------------------------------------------------------
# Remember that we can't use numpy math functions on the GPU…
from numba import cuda, vectorize
import math
import numpy as np
# Consider modifying the 3 values in this cell to optimize host <-> device memory movement
# Transfer inputs to the GPU
greyscales_gpu = cuda.to_device(greyscales)
weights_gpu = cuda.to_device(weights)
normalized_gpu = cuda.device_array(shape=(n,),
                                   dtype=np.float32)
weighted_gpu = cuda.device_array(shape=(n,),
                                 dtype=np.float32)
activated_gpu = cuda.device_array(shape=(n,),
                                  dtype=np.float32)
# Modify these 3 function calls to run on the GPU
@vectorize(['float32(float32)'], target='cuda')
def normalize_gpu(grayscales):
    return grayscales / 255

@vectorize(['float32(float32, float32)'], target='cuda')
def weigh_gpu(values, weights):
    return values * weights

@vectorize(['float32(float32)'], target='cuda')
def activate_gpu(values):
    return ( math.exp(values) - math.exp(-values) ) / ( math.exp(values) + math.exp(-values) )
# Feel free to modify the 3 function calls in this cell
normalize_gpu(greyscales_gpu, out=normalized_gpu)
weigh_gpu(normalized_gpu, weights_gpu, out=weighted_gpu)
activate_gpu(weighted_gpu, out=activated_gpu)
SOLUTION = activated_gpu
When I run this code in Jupyter (with %%timeit), I get:
613 µs ± 520 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
But the assessment result said:
Your code produced the correct output. +50 pts
Your code is not fast enough. +0 pts
You did not pass, please try again.
Score: 50/100
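
A side note on the activation step: the expression in activate_gpu is the hyperbolic tangent written out with four exp() calls. The sketch below is CPU-only and uses only the standard library. It checks that the explicit formula and math.tanh agree on some hypothetical sample inputs (the names activate_explicit, activate_tanh and samples are made up for illustration and are not part of the assessment). Whether swapping in math.tanh on the GPU would change the assessment's timing result is not something this sketch can show.

```python
import math


def activate_explicit(x):
    # Same expression as activate_gpu in the post: four exp() calls.
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))


def activate_tanh(x):
    # Hyperbolic tangent from the standard library: a single call.
    return math.tanh(x)


# Hypothetical sample inputs; the real weighted values are not shown in the post.
samples = [i / 100.0 for i in range(-300, 301)]

# Largest absolute disagreement between the two formulations.
max_diff = max(abs(activate_explicit(x) - activate_tanh(x)) for x in samples)
print(max_diff)
```

If the two agree to within floating-point rounding, the single-call form would be a drop-in candidate for the body of activate_gpu, assuming math.tanh is available for the CUDA target in the installed Numba version.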