Disabling gk20a timeouts_enabled

In an NVIDIA blog post (https://devblogs.nvidia.com/parallelforall/cuda-jetson-nvidia-nsight-eclipse-edition/), I noticed the following:
… note that by default Jetson doesn’t allow any application to solely occupy the GPU 100% of the time. In order to run the debugger on Jetson TK1, you need to fix this using the following command (note: this is not required on Jetson TX1/TX2).

sudo echo N > sys/kernel/debug/gk20a.0/timeouts_enabled

Does anyone know:

  • how this feature works?
  • whether it can prevent a CUDA application from getting full GPU usage?
  • whether disabling it can cause problems?

Thanks.

Hi Honey,

Thanks for your question.
We will check this internally and get back to you with more information later.

Thanks.

Hi,

Sorry for the late reply.

This is the sched timeout.
If one channel hogs the GPU for too long, the sched timeout kills that channel.
We do this because the whole UI is frozen for the duration of the job, so we cannot allow arbitrarily long CUDA kernels.

This is particularly a problem for the debugger, which can stall the GPU for an indeterminate length of time.

On TK1, the workaround is to disable the sched timeout with:

echo N | sudo tee /sys/kernel/debug/gk20a.0/timeouts_enabled
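
For anyone scripting this, here is a minimal sketch rather than an official tool. Note that in the `sudo echo N > ...` form quoted from the blog post at the top of the thread, the redirection is performed by the calling shell and not by sudo, and the path is missing its leading slash, so the sketch pipes through `sudo tee` with an absolute path instead. The function names `set_gk20a_timeouts` and `run_without_gk20a_timeouts` and the `GK20A_TIMEOUTS` variable are hypothetical. Writing `Y` to turn the timeout back on is an assumption based on the usual Y/N convention for boolean debugfs entries, and the wrapper also assumes the timeout was enabled before it ran.

```shell
#!/bin/sh
# Debugfs entry named in this thread, written as an absolute path.
# It can be overridden through the environment, e.g. for a dry run on a scratch file.
GK20A_TIMEOUTS="${GK20A_TIMEOUTS:-/sys/kernel/debug/gk20a.0/timeouts_enabled}"

# Write Y or N to the entry. "Y" as the re-enable value is an assumption.
set_gk20a_timeouts() {
    case "$1" in
        Y|N) ;;
        *) echo "usage: set_gk20a_timeouts Y|N" >&2; return 2 ;;
    esac
    if [ -w "$GK20A_TIMEOUTS" ]; then
        # Already writable (e.g. running as root): write directly.
        printf '%s\n' "$1" > "$GK20A_TIMEOUTS"
    else
        # tee runs under sudo, so the write itself happens with root privileges.
        printf '%s\n' "$1" | sudo tee "$GK20A_TIMEOUTS" > /dev/null
    fi
}

# Disable the timeout, run the given command, then switch the timeout back on.
# Assumes the timeout was enabled beforehand.
run_without_gk20a_timeouts() {
    set_gk20a_timeouts N || return
    "$@"
    status=$?
    set_gk20a_timeouts Y
    return "$status"
}
```

Usage would look like `run_without_gk20a_timeouts ./my_cuda_app`, where `./my_cuda_app` stands in for your own binary. Switching the timeout back on afterwards is a design choice, not a requirement stated in this thread.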

Hi AastaLLL,

Thanks for your reply. I have just one more question:
If I disable timeouts and then launch a CUDA app that could hog the GPU for 1 minute, is there any risk beyond the GUI being frozen? (My application has no GUI; it only uses TCP to and from the TK1, and only before and after the CUDA work.)

Thanks.

Hi,

From a security point of view, we are mostly concerned about WebGL and HTML rendering contexts.
Those run third-party content, so we need to keep the timeout very low (~200ms) for those contexts.

We are not concerned about CUDA and other userspace applications “authorized” (installed) by the user.
You can disable the timeout if needed.

Thanks.