Hello, I have a question about the kernel “timeout”, i.e. the fact that a kernel is stopped after a few seconds.
I am using Ubuntu 20.04 and an NVIDIA GeForce RTX 3050 with driver 525.60.11.
When I run a fairly long CUDA kernel (more than 100 s) I don’t get any timeout, while on an old computer with a GTX 770 I do get a timeout and the kernel stops after 5 s.
My question is: why don’t I get the timeout on the recent computer? Is it related to
Ubuntu / Gnome,
the GPU,
or the GPU driver?
And is it possible to remove the timeout, or to set it, for example using nvidia-smi? I am working with students who have laptops running Linux/Debian with a Quadro M1200, and they have the timeout active.
This is related to a GUI using the GPU. A long-running compute kernel typically blocks graphical tasks such as updating the GUI, and a frozen GUI makes for a bad user experience. So operating systems running a GUI limit the runtime of compute kernels to a few seconds: about two by default on some systems, about five in the GTX 770 case described above. This limit is enforced by a watchdog timer. When it expires, the compute context is destroyed, which results in a CUDA error that is detectable with proper CUDA error checking.
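To make the error checking concrete, here is a minimal sketch of how a watchdog kill could be detected. The kernel `busy_kernel`, the helper `is_launch_timeout`, and the cycle count are made up for illustration; the actual runtime of the kernel depends on the GPU clock, so treat the number as an assumption to be tuned.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical long-running kernel: spins until roughly `cycles`
// GPU clock ticks have elapsed (illustration only).
__global__ void busy_kernel(long long cycles)
{
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

// True if `err` is the code the CUDA runtime uses for a kernel
// that exceeded the execution time limit.
bool is_launch_timeout(cudaError_t err)
{
    return err == cudaErrorLaunchTimeout;
}

int main()
{
    // Assumed cycle count: on the order of 10 s or more at a 1-2 GHz clock.
    busy_kernel<<<1, 1>>>(20000000000LL);

    // Launch errors (e.g. a bad configuration) show up here ...
    cudaError_t err = cudaGetLastError();
    // ... while execution errors, including a watchdog kill, show up
    // at the next synchronizing call.
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();
    }

    if (is_launch_timeout(err)) {
        fprintf(stderr, "kernel killed by watchdog: %s\n", cudaGetErrorString(err));
        return 1;
    }
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("kernel ran to completion, no timeout\n");
    return 0;
}
```

On a GPU where the watchdog is active this should take the first error branch; on a GPU without an active watchdog it should simply finish after the kernel completes.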
There are operating-system dependent ways of turning the GUI watchdog timer on or off and/or setting the time limit. This is controlled outside of CUDA, so look in the documentation of your operating system(s) for how to use these controls. Alternatively, do not extend the GUI to the GPU in question.
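Although the watchdog itself is configured outside of CUDA, the runtime can at least report whether a run time limit currently applies to each GPU. The following is a small sketch using `cudaDeviceGetAttribute` with `cudaDevAttrKernelExecTimeout`; the helper name `describe_limit` is made up for illustration. Running it on both the RTX 3050 and the GTX 770 machines should show which of them has the limit active.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: turns the attribute value into readable text.
const char* describe_limit(int timeout_enabled)
{
    return timeout_enabled ? "enabled" : "disabled";
}

int main()
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
            return 1;
        }

        // Nonzero if a run time limit applies to kernels on this device.
        int timeout_enabled = 0;
        err = cudaDeviceGetAttribute(&timeout_enabled,
                                     cudaDevAttrKernelExecTimeout, dev);
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaDeviceGetAttribute: %s\n", cudaGetErrorString(err));
            return 1;
        }

        printf("device %d (%s): kernel run time limit %s\n",
               dev, prop.name, describe_limit(timeout_enabled));
    }
    return 0;
}
```

This only reports the current state; it does not change the limit.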