I have recently started using CUDA to accelerate machine learning applications. I am running Windows Server 2012 with an NVIDIA Quadro P4000 and CUDA Toolkit 8.0.61 (22.214.171.124 patch applied). The system also has two Intel E5-2640 v4 CPUs (20 cores in total). To test the performance of the CUDA environment with Python, I wrote the sample program given on https://developer.nvidia.com/how-to-cuda-python and set the target to ‘cpu’ once and to ‘cuda’ once to compare the computing times. I was surprised to see that the computing time for target=‘cpu’ was lower than the computing time for target=‘cuda’:
target ‘cpu’ = 0.074184 seconds
target ‘cuda’ = 2.506713 seconds
I also ran the program without the @vectorize decorator and obtained a computing time of 11.145553 seconds. So there is a significant improvement in computing time when using CUDA from Python compared with the undecorated version. But I cannot wrap my head around the fact that using only the CPU yields better performance than using the CPU together with the GPU. Could anyone please explain this to me?
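For reference, here is a minimal sketch of how a comparison like this might be timed. It is not the program from the NVIDIA page: `vector_add` is a plain-Python stand-in for the function that would carry the @vectorize decorator, and `time_calls` is a hypothetical helper name. It reports the first call separately from later calls, on the assumption that a first call may include one-time setup costs that repeated calls do not.

```python
import time


def vector_add(a, b):
    # Plain-Python stand-in (assumption) for the decorated function
    # from the sample program; adds two sequences element by element.
    return [x + y for x, y in zip(a, b)]


def time_calls(fn, *args, repeats=3):
    """Call fn(*args) `repeats` times.

    Returns (first_call_seconds, best_later_call_seconds) so that any
    one-time cost in the first call is visible separately.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)
    first = timings[0]
    best_later = min(timings[1:]) if len(timings) > 1 else first
    return first, best_later


n = 100_000  # illustrative size, not the size used in the sample program
a = [1.0] * n
b = [2.0] * n

first, best_later = time_calls(vector_add, a, b)
print(f"first call: {first:.6f} s, best later call: {best_later:.6f} s")
```

With the real sample, the decorated function and its NumPy arrays would be passed to `time_calls` in place of `vector_add`, `a`, and `b`, once for each target.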
Thanks for your time and response.