Edit: I found some answers eventually, but I can’t delete this topic.
I noticed that what should be pure-GPU code, such as matrixMul from the CUDA samples, appears to use 100% of a CPU core.
To be more specific: I’m using CUDA 6.5 on Ubuntu 14.04 (amd64). I changed nIter in matrixMul to a large number so that the process loops indefinitely in GPU code, and set the matrix sizes to 4096. Running “top” shows 99–100% CPU usage for the process. I see this on both Tesla and GTX cards, and with both matrixMul and matrixMulCUBLAS.
Is this just a CPU-usage measurement artifact, or is a CPU core really being used up by pure-GPU code — and if the latter, why?
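For anyone who wants to reproduce this without patching the sample, here is a minimal sketch (a hypothetical kernel, not the actual matrixMul code) that shows the same pattern: launch a long-running kernel and block on it with cudaDeviceSynchronize(), which is where the host-side CPU usage can be observed in top.

```
// Minimal sketch (hypothetical, not the matrixMul sample itself):
// keep the GPU busy for a while, then block the host waiting for it.
#include <cuda_runtime.h>

__global__ void busyKernel(float *data, int iters) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    float v = data[idx];
    for (int i = 0; i < iters; ++i)   // long-running GPU loop
        v = v * 1.0000001f + 0.5f;
    data[idx] = v;
}

int main() {
    const int n = 1 << 20;
    float *d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    busyKernel<<<n / 256, 256>>>(d, 1 << 24);
    // While this call waits for the kernel to finish, watch the
    // host process in "top" to see the CPU usage in question.
    cudaDeviceSynchronize();
    cudaFree(d);
    return 0;
}
```

Note that where the runtime waits (cudaDeviceSynchronize() here, or an implicit sync such as cudaMemcpy) is the relevant place to look, since that is when the host thread has nothing to do but wait on the GPU.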