CPU is 100% busy during kernel execution


I have an application where I initially load all the data to the GPU and then execute a kernel in a tight loop, with cudaThreadSynchronize() after each launch. During kernel execution, the CPU is shown as 100% busy, both in the Linux System Monitor and in top. If kernel execution completion is being polled, shouldn't the CPU show close to 0% usage? No other applications are running on the machine at the time.

How can I free up the CPU so that I can run other applications in parallel?

Thanks in advance for any advice.


“Polled” means “the CPU is actively checking a memory location to determine whether something else has completed,” which means 100% CPU usage. If you are willing to trade some latency for CPU time, call cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync) before the CUDA context is created to enable blocking synchronization. The driver will then put the CPU thread to sleep at synchronization points and wake it up when the GPU has finished, instead of spinning.
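A minimal sketch of the loop from the question with blocking synchronization enabled. The kernel `work`, the helpers `configure_blocking_sync` and `run_loop`, and the buffer size are hypothetical placeholders for the real application. The sketch uses cudaDeviceSynchronize(), which replaces the deprecated cudaThreadSynchronize() in current toolkits. Older toolkits spell the flag cudaDeviceBlockingSync. The flag is set before any other runtime call so that it applies when the context is created.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical placeholder kernel standing in for the real workload.
__global__ void work(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] = data[i] * 0.5f + 1.0f;
    }
}

// Request blocking sync: the host thread sleeps in synchronization calls
// instead of spin-polling. Call this before any other CUDA runtime call,
// so the flag is in effect when the context is created.
cudaError_t configure_blocking_sync()
{
    return cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync);
}

// Launch the kernel `iterations` times, synchronizing after each launch
// (the "tight loop" from the question). Returns the first error, if any.
cudaError_t run_loop(int iterations)
{
    const int n = 1 << 20;
    float *d_data = nullptr;

    cudaError_t err = cudaMalloc(&d_data, n * sizeof(float));
    if (err != cudaSuccess) return err;
    err = cudaMemset(d_data, 0, n * sizeof(float));

    for (int it = 0; it < iterations && err == cudaSuccess; ++it) {
        work<<<(n + 255) / 256, 256>>>(d_data, n);
        err = cudaGetLastError();          // catches launch errors
        if (err == cudaSuccess) {
            // With blocking sync, the host thread sleeps here until the
            // kernel completes rather than busy-waiting.
            err = cudaDeviceSynchronize();
        }
    }

    cudaFree(d_data);
    return err;
}

int main()
{
    cudaError_t err = configure_blocking_sync();
    if (err == cudaSuccess) err = run_loop(1000);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("done\n");
    return 0;
}
```

With the flag set, top should show this process using far less CPU while the kernels run. The exact figure depends on the driver and on how long each kernel takes. The cost is some extra wake-up latency after each kernel finishes, which matters most when the kernels are very short.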