Setting target='cuda' vs target='cpu'

Hello Everyone,

I have recently started using CUDA to accelerate machine learning applications. My machine runs Windows Server 2012 with an NVIDIA Quadro P4000 and CUDA Toolkit 8.0.61 (with the 8.0.61.2 patch applied), and it has two Intel E5-2640 v4 CPUs (20 cores in total). To test the performance of the CUDA environment with Python, I ran the sample program from https://developer.nvidia.com/how-to-cuda-python, setting the target to 'cpu' on one run and 'cuda' on the other to compare computing times. I was surprised to see that the computing time for target='cpu' was lower than for target='cuda'.

target='cpu': 0.074184 seconds
target='cuda': 2.506713 seconds

I also ran the program without the @vectorize decorator and measured a computing time of 11.145553 seconds, so invoking CUDA from Python clearly gives a significant improvement over plain Python. But I cannot wrap my head around the fact that using only the CPU yields better performance than using the CPU together with the GPU. Could anyone please explain this to me?

Thanks for the time and response.

“Your first CUDA python program” isn’t designed to demonstrate something that is clearly faster; it’s a demonstration of basic principles. The 2.5 seconds you are measuring for CUDA is mostly startup overhead anyway. If you ran another vector add right after the first one, timed separately, it would be a lot quicker than the first one. (Not saying it would be faster than the CPU - I don’t know.)

Do a large matrix multiply using cuBLAS from Python. It will be noticeably quicker than doing it on the CPU.
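One way to try that from Python (an illustrative sketch, not from this thread) is CuPy, whose matmul routine is backed by cuBLAS; the version below falls back to NumPy when CuPy or a GPU is unavailable, so the same script runs either way:

```python
# Large matrix multiply: cuBLAS via CuPy if available, NumPy otherwise.
import time
import numpy as np

try:
    import cupy as xp          # GPU path: matmul dispatches to cuBLAS
    on_gpu = True
except ImportError:
    xp = np                    # CPU fallback when CuPy is not installed
    on_gpu = False

n = 1024
a = xp.random.rand(n, n).astype(xp.float32)
b = xp.random.rand(n, n).astype(xp.float32)

c = xp.matmul(a, b)            # warm-up run (context setup, caching)
if on_gpu:
    xp.cuda.Stream.null.synchronize()

t0 = time.perf_counter()
c = xp.matmul(a, b)
if on_gpu:
    xp.cuda.Stream.null.synchronize()  # wait for the GPU to finish
elapsed = time.perf_counter() - t0

print(f"{'GPU (cuBLAS)' if on_gpu else 'CPU (NumPy)'}: {elapsed:.6f} s")
```

Note the warm-up call and the explicit synchronization: without them you would be timing the same startup overhead that dominated the 2.5-second vector-add measurement above.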