Why is CPU usage at 100% in pure-GPU code?

Edit: I found some answers eventually, but I can’t delete this topic.

I noticed that what should be pure-GPU code, such as matrixMul from CUDA samples, appears to have 100% CPU usage.

To be more specific: I’m using CUDA 6.5 on Ubuntu 14.04 for AMD64. I changed nIter in matrixMul to a large number so that the process loops indefinitely in the GPU code, and set the matrix dimensions to 4096. Running “top” shows 99% to 100% CPU usage by the process. I see this on both Tesla and GTX cards, and with both matrixMul and matrixMulCUBLAS.

Is this just a CPU usage measurement artifact or is a CPU core really being used up by pure-GPU code, and if the latter, then why?

Things like cudaDeviceSynchronize() generally peg the CPU at 100%, because by default the host thread spin-waits (busy-polls the GPU) while waiting for the device to finish, which minimizes latency at the cost of burning a CPU core.

Use cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync) if you want to change the default behaviour.
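
A minimal sketch of what that looks like, assuming a trivial stand-in kernel in place of matrixMul and with error checking trimmed; the kernel name and loop count are just placeholders:

```cpp
// blocking_sync.cu — compile with: nvcc blocking_sync.cu
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel that keeps the GPU busy for a while.
__global__ void busyKernel(float *out, int iters)
{
    float v = threadIdx.x;
    for (int i = 0; i < iters; ++i)
        v = v * 1.000001f + 0.5f;
    out[threadIdx.x] = v;
}

int main()
{
    // Must be called before the CUDA context is created, i.e. before the
    // first runtime call that touches the device, or the flag has no effect.
    cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync);

    float *d_out;
    cudaMalloc(&d_out, 256 * sizeof(float));

    busyKernel<<<1, 256>>>(d_out, 1 << 24);

    // With the blocking-sync flag the host thread sleeps until the GPU
    // finishes instead of spin-polling, so "top" should show the process
    // near 0% CPU while the kernel runs.
    cudaDeviceSynchronize();

    cudaFree(d_out);
    return 0;
}
```

The trade-off is that blocking sync adds a little latency to each synchronization compared with the default spin-wait, which is why spinning is the default; for long-running kernels like the modified matrixMul loop, the freed-up CPU core is usually worth it.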