The 5-second kernel launch time limit makes CUDA development very inconvenient. :wacko:
Though some kinds of programs can be divided into small pieces, lots of programs can't be divided in any easy way!
Making CUDA development more difficult simply causes fewer developers to use CUDA.
I just can't understand why a kernel launch MUST be synchronous. Can't there be an API that starts the kernel and returns instantly, and another API to poll whether the kernel has finished? A user-mode poll/wait would be much better than the kernel-mode poll/wait used in current CUDA.
By the way, most other parts of cuda are very nice.
This is a limitation of Windows XP when the CUDA device is shared between computation and graphics. When your kernel is running, the graphics driver cannot update the GUI, and after a few seconds, the operating system decides something is wrong and aborts your kernel.
The solution to this is to have a CUDA device which is not running your main display. (Or to use Linux, where you can decide not to run a GUI at all.)
I don’t understand this part. From the perspective of the user code, this is exactly what happens. The kernel starts asynchronously, and the CPU continues executing your program. You can check on the status, or deliberately run a function to wait on the results.
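To illustrate the point above, here is a minimal sketch of the launch-and-poll pattern that the runtime already supports. The kernel name `longKernel` and its arguments are placeholders, not anything from the thread; the status check uses `cudaEventQuery`, which returns without blocking:

```cuda
#include <cuda_runtime.h>

// Placeholder kernel standing in for a long-running computation.
__global__ void longKernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20;
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));

    cudaEvent_t done;
    cudaEventCreate(&done);

    // The launch itself returns to the CPU immediately.
    longKernel<<<(n + 255) / 256, 256>>>(d_data, n);
    cudaEventRecord(done, 0);  // event completes when the kernel does

    // User-mode polling: the CPU is free to do other work in this loop.
    while (cudaEventQuery(done) == cudaErrorNotReady) {
        /* do other CPU work here */
    }

    cudaEventDestroy(done);
    cudaFree(d_data);
    return 0;
}
```

So from the host's point of view the launch is already asynchronous; the watchdog is a separate issue that fires on the device side regardless of how the host waits.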
Getting rid of the watchdog timer on the primary display would require a way to swap a running kernel’s register file and shared memory out to global memory mid-execution. Then the graphics driver could be given a time slice periodically to avoid the watchdog. Not impossible, though I have no idea if the current hardware is capable of this.
The explanation makes things clear. So this limitation comes from Microsoft instead of NVIDIA. :blink:
But is there any way to disable this time limit on Windows? Windows does this kind of check to prevent a bad graphics call from hanging the entire system, but a well-behaved CUDA kernel can still legitimately need a long time to run. Breaking the task into small pieces can require a redesign of the CUDA kernel, and that really makes CUDA development more difficult.
You can use CUDA even if there's only one NVIDIA card in the system, you just need to take extra care. Partition your workload into smaller chunks so that a single kernel invocation stays well within the 2s limitation.
And if you want the Windows UI to stay responsive while performing computations on the card, you should partition your task into even smaller chunks so that each runs for 50ms or so.
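The chunking advice above could look something like the following sketch. The function and kernel names are made up for illustration; the idea is just that each launch covers a slice of the data and the host synchronizes between slices so the display driver gets a time slice:

```cuda
#include <cuda_runtime.h>

// Placeholder kernel that processes one slice of the array.
__global__ void processChunk(float *data, int offset, int count) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) data[offset + i] += 1.0f;
}

// Launch the work in pieces sized to stay far under the watchdog limit.
void processAll(float *d_data, int n, int chunk) {
    for (int off = 0; off < n; off += chunk) {
        int count = (n - off < chunk) ? (n - off) : chunk;
        processChunk<<<(count + 255) / 256, 256>>>(d_data, off, count);
        cudaDeviceSynchronize();  // GUI can update between chunks
    }
}
```

Tuning `chunk` so each launch finishes in tens of milliseconds is what keeps the UI responsive, at the cost of some launch overhead per chunk.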
Of course you can use CUDA with only one card in the system. It wouldn't be much use otherwise… tmurray gave you all the details you need; I will just add my 2c of anecdotal information:
I've been developing CUDA applications for nearly 2 years now on single-GPU machines. Nearly all kernels I've ever written complete in milliseconds. In fact, I have never EVER seen the 5s launch timeout in 2 years of development unless I explicitly tried to trigger it (or, umm, accidentally wrote an infinite loop).