Hello,
I’m running a Jetson Nano without a monitor attached (I access it over the network), and my application keeps throwing a CUDA_ERROR_LAUNCH_TIMEOUT exception. As far as I can tell (from forums, so I’m not 100% sure), this is caused by Timeout Detection and Recovery, which kills any process that makes the GPU unresponsive for more than 5 seconds. As far as I understand, it’s possible to disable it on Linux.
I’ve already tried
but it didn’t help. Maybe I’m doing something wrong.
We do limit GPU kernel execution time to 5 seconds on Jetson Nano.
Any GPU task that runs longer than 5 seconds is killed by the watchdog automatically.
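If the workload can be split, one common way to stay under this limit is to break it into several shorter launches instead of one long one. Here is a minimal Python sketch of the planning step only. Note that `plan_chunks`, the per-item cost estimate, and the 50% safety margin are illustrative assumptions and not part of any NVIDIA API; you would need to measure the per-item time on your own device.

```python
WATCHDOG_LIMIT_S = 5.0  # kernel time limit on Jetson Nano, as stated in this thread


def plan_chunks(total_items, seconds_per_item, budget_s=WATCHDOG_LIMIT_S * 0.5):
    """Split [0, total_items) into (start, stop) ranges.

    Each range is sized so that its estimated run time
    (items * seconds_per_item) does not exceed budget_s.
    seconds_per_item is a hypothetical, user-measured estimate.
    """
    if total_items < 0 or seconds_per_item <= 0 or budget_s <= 0:
        raise ValueError("total_items must be >= 0; times must be > 0")

    # At least one item per chunk, even if a single item exceeds the budget.
    items_per_chunk = max(1, int(budget_s // seconds_per_item))

    return [
        (start, min(start + items_per_chunk, total_items))
        for start in range(0, total_items, items_per_chunk)
    ]


if __name__ == "__main__":
    # 12 items at an estimated 0.5 s each, with a 2.5 s budget per launch.
    for start, stop in plan_chunks(12, 0.5):
        # A real application would launch one kernel per range here.
        print(f"launch items {start}..{stop - 1}")
```

Each range would then go into its own kernel launch, so no single launch is expected to run past the budget. The estimate is only as good as the measurement behind it, so leaving a margin below the 5 second limit seems safer.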
It’s not recommended to disable this mechanism since Nano doesn’t have slice support to share GPU resources.
But if this is good for your use case, please check the following comment for the detailed steps: