A quick question. I have noticed that after launching a kernel and while waiting for it to finish,
the host CPU appears to be running at 100% (while, I assume, the only thing that does is polling
the GPU). Inserting a sleep() in the host code does the trick, but it does not seem
very elegant (mainly because you have to pre-time the kernel so that you know how
long you can safely sleep()).
What have I missed?
Apologies if it turns out that I haven't RTFM'd for long enough.
Basically you have two choices - either let the host thread do something constructive while you wait for the kernel to finish, or sit in a spinlock at a cudaThreadSynchronize() (either your own or one contained within a copy request). Either way the thread will eat CPU cycles. That is how it works. Artificially adding sleep()s or whatever isn't a good idea.
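The first option works because kernel launches are asynchronous: the launch returns immediately and the host only spins once it hits the synchronize call. A rough sketch (myKernel and prepareNextBatch are hypothetical placeholders, not anything from your code):

```cuda
// Overlap host work with a running kernel: the launch below returns
// immediately, so the host can do useful work before synchronizing.
__global__ void myKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= 2.0f;
}

void prepareNextBatch(void) { /* host-side work for the next iteration */ }

int main(void)
{
    const int n = 1 << 20;
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));

    myKernel<<<(n + 255) / 256, 256>>>(d_data, n);  // asynchronous, returns at once

    prepareNextBatch();        // overlaps with the kernel still running on the GPU

    cudaThreadSynchronize();   // host only spins for whatever time is left

    cudaFree(d_data);
    return 0;
}
```

The more host work you can push between the launch and the synchronize, the less time the thread spends spinning.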
CUDA 2.2 adds a context creation flag that will prevent the process from spin-waiting; it will actually sleep (presumably woken on an interrupt). Using it will increase latency, though.
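If I understand the 2.2 runtime correctly, this is done with cudaSetDeviceFlags(), which has to be called before the context is created (i.e. before the first runtime call that touches the device). A sketch of what I believe the usage looks like (myKernel is a placeholder):

```cuda
// Ask the runtime to block on an interrupt instead of spin-waiting.
__global__ void myKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] += 1.0f;
}

int main(void)
{
    // Must come before any call that implicitly creates the context.
    cudaSetDeviceFlags(cudaDeviceBlockingSync);

    const int n = 1 << 20;
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));  // context is created here, with the flag

    myKernel<<<(n + 255) / 256, 256>>>(d_data, n);

    // The host thread now sleeps until the kernel finishes, freeing the core,
    // at the cost of some extra wake-up latency.
    cudaThreadSynchronize();

    cudaFree(d_data);
    return 0;
}
```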
Thank you both for your replies. The CPU is doing something useful after launching the kernel
(which is to prepare or analyze data), but that takes only, say, 5 sec of CPU time, whereas each
kernel runs for tens of seconds. It seems rather wasteful to run a core at 100% while
waiting for the GPU to finish the actual calculation.