After I call a kernel, I want the CPU to wait until it’s finished, then copy the results back to the host.
If I do something like:
myKernel<<<grid, block, block.x*4*sizeof(int)>>>(dCurrentN);
CUT_CHECK_ERROR("Kernel execution failed");
CUDA_SAFE_CALL( cudaMemcpy( hCurrentN, dCurrentN, MemSize, cudaMemcpyDeviceToHost) );
then the cudaMemcpy waits for the kernel to finish, as expected.
Everything runs fine, but the CPU spins at 100% while waiting.
I read somewhere that this busy-waiting is done to reduce latency, but I can't find the reference anymore; it may have been on the forum rather than in the docs.
I want the CPU to sleep while waiting, not poll.
I thought the solution was to call cudaThreadSynchronize() after the kernel, but adding it makes no difference: the program still runs fine, and it still uses 100% CPU.
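For reference, the sequence I'm describing looks roughly like this (same shared-memory size and cutil error-check macros as in the snippet above):

```cpp
// Launch the kernel with dynamic shared memory for block.x*4 ints.
myKernel<<<grid, block, block.x * 4 * sizeof(int)>>>(dCurrentN);
CUT_CHECK_ERROR("Kernel execution failed");

// Explicit synchronization -- the CPU still polls at 100% here.
CUDA_SAFE_CALL(cudaThreadSynchronize());

// Copy results back; this would also have waited for the kernel anyway.
CUDA_SAFE_CALL(cudaMemcpy(hCurrentN, dCurrentN, MemSize,
                          cudaMemcpyDeviceToHost));
```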
As an ugly short-term workaround for the pegged CPU, I can manually insert a Windows Sleep() call after the kernel launch, but that's not what I want: it's a hard-coded time delay, which is bad for many reasons.
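The hack looks something like this (the 50 ms delay is an arbitrary guess on my part, which is exactly the problem with this approach):

```cpp
#include <windows.h>  // for Sleep()

myKernel<<<grid, block, block.x * 4 * sizeof(int)>>>(dCurrentN);
CUT_CHECK_ERROR("Kernel execution failed");

// Hard-coded sleep so the CPU idles instead of spinning. The delay
// has to be guessed, and re-tuned whenever the kernel or data changes;
// too short and the later sync still busy-waits, too long and time is wasted.
Sleep(50);  // milliseconds -- arbitrary, board- and kernel-dependent

CUDA_SAFE_CALL(cudaMemcpy(hCurrentN, dCurrentN, MemSize,
                          cudaMemcpyDeviceToHost));
```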
This happens on an old G80 board as well as a new GTX 280, both using the latest CUDA 2.0 beta SDK on Windows XP.
How can I avoid the 100% CPU while waiting for a kernel to finish?