I seem to be experiencing some instability when running my CUDA application. I've hunted the bug down to a set of trig functions (sinf(), cosf(), etc.) being executed in a kernel. While running, the screen suddenly goes blank and I'm informed that the driver has stopped working. If I replace the functions with their intrinsic equivalents, the system runs correctly. The software executes the kernel a number of times as part of an iterative process; the first few iterations run correctly, but it crashes soon after.
Is this a common problem? If so, is there any way I can stop it from happening?
This sounds like a problem with the watchdog timer more than a problem with the math functions themselves. If you are running a CUDA program on your display device, then the display driver watchdog will terminate any kernel that runs for more than a few seconds in order to allow the screen to update. (GPUs cannot do preemptive multitasking yet.) The difference between the standard trig functions and the intrinsic trig functions is how long they take. So, it is possible that the slower and more accurate functions are pushing your kernel runtime over the limit.
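To illustrate the difference (a minimal sketch; the kernel names and signatures here are made up for the example): the library sinf() is accurate but comparatively slow, while the __sinf() intrinsic maps to the fast hardware special-function unit at reduced accuracy.

```cuda
// Hypothetical kernels for illustration only.
__global__ void wave_accurate(float *out, const float *in, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = sinf(in[i]);    // accurate library function: slower
}

__global__ void wave_fast(float *out, const float *in, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = __sinf(in[i]);  // hardware intrinsic: faster, less accurate
}
```

Compiling with nvcc's -use_fast_math flag makes sinf() and friends map to their intrinsic equivalents automatically, which is a quick way to test whether the speed difference is really what pushes you over the watchdog limit.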
If the watchdog is killing your kernel, your two best options are either to split your kernel workload into multiple calls, or to run your program on a second, non-display CUDA device.
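A sketch of the first option, splitting one long-running launch into several short ones (the kernel name, signature, and chunk size here are hypothetical; tune the chunk so a single launch finishes well under the watchdog limit):

```cuda
// Instead of one launch covering all N elements...
// myKernel<<<(N + 255) / 256, 256>>>(d_data, N);

// ...launch smaller chunks so each kernel returns quickly.
const int chunk = N / 10;  // hypothetical; pick so one launch stays short
for (int offset = 0; offset < N; offset += chunk) {
    int count = min(chunk, N - offset);
    myKernel<<<(count + 255) / 256, 256>>>(d_data + offset, count);
}
```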
Thank you for your reply; what you're saying does make sense. A single execution of the kernel in question takes very little time, but when it's run 50 times (the number of iterations it's set to run for), the driver tends to crash. Would the watchdog timer kill my kernel under these circumstances?
I've also found that if I halt the host thread with the Sleep() function for 100 milliseconds or so, this seems to stop the system crashing. Obviously this defeats the purpose of using CUDA to speed up the execution.
This seems a little weird, as I would normally expect that the driver can update the screen between kernel calls, even if you have queued up many of them. I’m assuming you are running on Windows, in which case I don’t have as much experience with the typical watchdog behavior. Does anyone know if the batching of kernel calls on Windows can lead to this?
One idea: Instead of adding a Sleep() function, can you call cudaDeviceSynchronize() between kernel calls? This will also slow down your code, but not as much as Sleep().
cudaDeviceSynchronize() seemed to work! Interestingly the console updates a lot more smoothly now, I guess the display was struggling before?
Under Windows, kernel launches are batched. The screen only updates between batches, not between individual kernels, and accordingly the watchdog timeout applies to whole batches rather than single launches. Inserting cudaDeviceSynchronize(), however, flushes all currently queued kernels and starts a new batch, allowing screen updates in between.
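In loop form that looks like the following (a sketch; the kernel name and launch configuration are placeholders for whatever your iterative step actually is):

```cuda
// Without the synchronize, all 50 launches may be queued as one
// batch on Windows, and the watchdog judges the whole batch.
for (int iter = 0; iter < 50; ++iter) {
    iterateKernel<<<grid, block>>>(d_state);  // hypothetical kernel
    cudaDeviceSynchronize();  // flush the batch; display can update here
}
```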
I'm assuming that executing a kernel multiple times in a for-loop would count as a 'batch'. Is that right?
Yes, unless there are calls in between like cudaDeviceSynchronize() that require synchronization.