Hello all.

I’m starting to look into parallel processing with CUDA. Up to this point I’ve done most of my programming inside 3D modelling software, but I really want access to the CUDA library to speed things up. I’ve installed it, and I’m getting the unexpected result that the example file runs faster on the CPU than on the GPU. For example, the code below prints the CPU timing first and the GPU timing second:

VectorAdd took for 0.7805259227752686 seconds

VectorAdd took for 1.8527252674102783 seconds

I know this is probably a fairly dumb question, but any help would be appreciated. The GPU is a Quadro P4000.

```
import time

import numpy as np
from numba import vectorize, cuda


def VectorAddCPU(a, b):
    # Plain NumPy element-wise addition on the CPU.
    return a + b


# Compile an element-wise ufunc for the GPU via Numba's CUDA target.
@vectorize(['float32(float32, float32)'], target='cuda')
def VectorAddGPU(a, b):
    return a + b


def main():
    N = 320000000
    A = np.ones(N, dtype=np.float32)
    B = np.ones(N, dtype=np.float32)

    # Time the CPU version.
    start = time.time()
    C = VectorAddCPU(A, B)
    vector_add_time = time.time() - start
    print("VectorAdd took for %s seconds" % vector_add_time)

    # Time the GPU version (this measures the whole call as seen from Python).
    start = time.time()
    C = VectorAddGPU(A, B)
    vector_add_time = time.time() - start
    print("VectorAdd took for %s seconds" % vector_add_time)


if __name__ == '__main__':
    main()
```
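
In case it helps anyone reproduce this, here is a minimal sketch of a more careful timing harness I could swap in. `time_call` and `vector_add_cpu` are hypothetical names of my own, not part of NumPy or Numba. The sketch assumes the first call to a function might carry one-time costs, so it makes a warm-up call and then reports the best of a few repeats. It only exercises the CPU version here so it runs without a GPU; I assume `VectorAddGPU` could be passed in the same way.

```python
import time

import numpy as np


def time_call(fn, *args, warmup=1, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` timed calls."""
    # Warm-up calls are not timed, so any one-time cost is excluded.
    for _ in range(warmup):
        fn(*args)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


def vector_add_cpu(a, b):
    # Same element-wise addition as VectorAddCPU in the script above.
    return a + b


if __name__ == "__main__":
    n = 1_000_000  # much smaller than the original N, just to keep the sketch quick
    a = np.ones(n, dtype=np.float32)
    b = np.ones(n, dtype=np.float32)
    print("CPU best of 3: %s seconds" % time_call(vector_add_cpu, a, b))
```

If the GPU number stays higher even with the warm-up excluded, that would at least tell me the gap is not only a first-call effect.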