Introduction to CUDA with Numba - first assessment

Hi, I cannot understand why this code is not fast enough.

–[CODE]-------------------------------------------------------

# Remember that we can't use NumPy math functions on the GPU...

from numba import cuda, vectorize
import numpy as np
import math

# Consider modifying the 3 values in this cell to optimize host <-> device memory movement

# Transfer inputs to the GPU

greyscales_gpu = cuda.to_device(greyscales)
weights_gpu = cuda.to_device(weights)

# Allocate output buffers on the device so intermediate results never return to the host
normalized_gpu = cuda.device_array(shape=(n,), dtype=np.float32)
weighted_gpu = cuda.device_array(shape=(n,), dtype=np.float32)
activated_gpu = cuda.device_array(shape=(n,), dtype=np.float32)

# Modify these 3 function calls to run on the GPU

@vectorize(['float32(float32)'], target='cuda')
def normalize_gpu(grayscales):
    return grayscales / 255

@vectorize(['float32(float32, float32)'], target='cuda')
def weigh_gpu(values, weights):
    return values * weights

@vectorize(['float32(float32)'], target='cuda')
def activate_gpu(values):
    return (math.exp(values) - math.exp(-values)) / (math.exp(values) + math.exp(-values))

# Feel free to modify the 3 function calls in this cell

normalize_gpu(greyscales_gpu, out=normalized_gpu)
weigh_gpu(normalized_gpu, weights_gpu, out=weighted_gpu)
activate_gpu(weighted_gpu, out=activated_gpu)
SOLUTION = activated_gpu
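
For reference, the three functions above are all elementwise, so in principle they could be combined into one function of (greyscale, weight), which on the GPU would mean one kernel launch instead of three. The sketch below is plain Python on the CPU (no Numba, no GPU) and only checks that a combined version gives the same values as the three separate steps. The name `fused` and the sample inputs are made up for illustration, and I do not know whether this is what the assessment's speed check is looking for.

```python
import math

# The three separate steps, written as plain scalar functions (CPU only).
def normalize(greyscale):
    return greyscale / 255

def weigh(value, weight):
    return value * weight

def activate(value):
    return (math.exp(value) - math.exp(-value)) / (math.exp(value) + math.exp(-value))

# Hypothetical combined version: one function instead of three.
# The activation formula above is the hyperbolic tangent, so math.tanh is used here.
def fused(greyscale, weight):
    return math.tanh(greyscale / 255 * weight)

# Made-up sample inputs (not the assessment data).
sample_greyscales = [0.0, 1.0, 64.0, 128.0, 200.0, 255.0]
sample_weights = [0.0, 0.25, 0.5, 0.75, 0.9, 1.0]

# Largest absolute difference between the three-step and the combined result.
max_diff = max(
    abs(activate(weigh(normalize(g), w)) - fused(g, w))
    for g, w in zip(sample_greyscales, sample_weights)
)
print(max_diff)
```

If this were moved to the GPU, the same body could presumably go under `@vectorize(['float32(float32, float32)'], target='cuda')` like the functions above, but I have not timed that.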

When I run this code in Jupyter (with %%timeit), I get:
613 µs ± 520 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

But the assessment result said:

Your code produced the correct output. +50 pts
Your code is not fast enough. +0 pts
You did not pass, please try again.
Score: 50/100