After I call a kernel, I want the CPU to wait until it’s finished, then copy the results back to the host.
If I do something like:
myKernel<<<grid, block, block.x*4*sizeof(int)>>>(dCurrentN);
CUT_CHECK_ERROR("Kernel execution failed");
CUDA_SAFE_CALL( cudaMemcpy( hCurrentN, dCurrentN, MemSize, cudaMemcpyDeviceToHost) );
then the cudaMemcpy waits for the kernel to finish, as expected.
Everything runs fine, but the CPU spins at 100% while waiting.
I read somewhere that this busy-waiting is done to reduce latency, but I can't find the reference anymore; it may have been on the forum rather than in the docs.
I want the CPU to sleep while waiting, not poll.
I thought the solution was to call cudaThreadSynchronize() after the kernel, but adding it makes no difference: the program still runs fine, and it still uses 100% CPU.
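For reference, the sequence I'm describing looks roughly like this (same shared-memory size and cutil error-check macros as in the snippet above):

```cpp
// Launch the kernel with dynamic shared memory for block.x*4 ints.
myKernel<<<grid, block, block.x * 4 * sizeof(int)>>>(dCurrentN);
CUT_CHECK_ERROR("Kernel execution failed");

// Explicit synchronization -- the CPU still polls at 100% here.
CUDA_SAFE_CALL(cudaThreadSynchronize());

// Copy results back; this would also have waited for the kernel anyway.
CUDA_SAFE_CALL(cudaMemcpy(hCurrentN, dCurrentN, MemSize,
                          cudaMemcpyDeviceToHost));
```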
As an ugly short-term workaround for the pegged CPU, I can manually insert a Windows Sleep() call after the kernel launch, but that's not what I want: it's a hard-coded time delay, which is bad for many reasons.
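The hack looks something like this (the 50 ms delay is an arbitrary guess on my part, which is exactly the problem with this approach):

```cpp
#include <windows.h>  // for Sleep()

myKernel<<<grid, block, block.x * 4 * sizeof(int)>>>(dCurrentN);
CUT_CHECK_ERROR("Kernel execution failed");

// Hard-coded sleep so the CPU idles instead of spinning. The delay
// has to be guessed, and re-tuned whenever the kernel or data changes;
// too short and the later sync still busy-waits, too long and time is wasted.
Sleep(50);  // milliseconds -- arbitrary, board- and kernel-dependent

CUDA_SAFE_CALL(cudaMemcpy(hCurrentN, dCurrentN, MemSize,
                          cudaMemcpyDeviceToHost));
```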
This happens on an old G80 board as well as a new GTX 280, both using the latest CUDA 2.0 beta SDK on Windows XP.
How can I avoid the 100% CPU while waiting for a kernel to finish?