Could not get cuda device count

• Hardware Platform (Jetson / GPU): T4 GPU
• DeepStream Version: DeepStream 5.0
• NVIDIA GPU Driver Version (valid for GPU only): 460.32.03

We are using the Milestone VPS plugin with a DeepStream pipeline that includes several detection/classification models.
We are using Milestone 2020 R2 and the VPS from MIPSDK 2020 R1. Milestone runs on Windows, while the VPS/DeepStream pipeline runs on Ubuntu 18.04.

The DeepStream pipeline crashes at runtime with the following error:

Error: could not get cuda device count (cudaErrorNoDevice)
Could not get cuda device count

When we run the pipeline, all the models instantiate and start running. At some random point, however, it crashes and starts reporting this error (screenshot attached below).

The pipeline ran normally for several days, including throughout the testing phase, and passed all the stress tests. However, this error has occurred several times this week. Restarting the pipeline or the Docker container makes it work normally again, but the error comes back after some time.

Also note that running watch nvidia-smi shows complete information about the GPU. When the error happens, however, all the running processes and their metrics (allocated memory, GPU utilization) disappear or drop to 0, while the GPU information itself is still displayed with no running processes.

Can you please help with this?

Capture2.PNG

There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks

Could you capture a log with the command below and share it once the crash is reproduced?

$ sudo nvidia-bug-report.sh
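Because the crash happens at random, it can be hard to run the report by hand right after it occurs. One option is to watch for the symptom described in the original post (the compute processes vanishing from nvidia-smi) and trigger nvidia-bug-report.sh automatically. The sketch below is a hypothetical helper, not an NVIDIA tool. It assumes it runs on the Ubuntu host (not inside the DeepStream container), that nvidia-smi and nvidia-bug-report.sh are on PATH, and that Python 3.6 (the Ubuntu 18.04 default) is used. It also cannot tell a crash apart from a deliberate stop of the pipeline.

```python
import subprocess
import time

# nvidia-smi query: one CSV line per running compute process, no header, no units.
QUERY_CMD = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Parse the nvidia-smi CSV output into a list of (pid, name, used_mib) tuples."""
    apps = []
    for line in csv_text.splitlines():
        if line.count(",") < 2:
            continue  # skip blank lines and anything that is not a CSV record
        pid_field, rest = line.split(",", 1)
        name_field, used_field = rest.rsplit(",", 1)
        try:
            used = int(used_field.strip())
        except ValueError:
            used = None  # e.g. "[N/A]" when per-process memory is not reported
        apps.append((int(pid_field.strip()), name_field.strip(), used))
    return apps


def processes_vanished(previous, current):
    """True when compute processes were running before but none are running now."""
    return len(previous) > 0 and len(current) == 0


def monitor(interval_s=5.0, report_cmd=("sudo", "nvidia-bug-report.sh")):
    """Poll nvidia-smi and capture a bug report once the GPU processes disappear."""
    previous = []
    while True:
        # stdout/universal_newlines instead of capture_output/text for Python 3.6.
        result = subprocess.run(
            QUERY_CMD,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
        # A failing nvidia-smi is treated as "no processes", which also triggers a report.
        current = parse_compute_apps(result.stdout) if result.returncode == 0 else []
        if processes_vanished(previous, current):
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(stamp, "GPU compute processes disappeared, capturing bug report")
            subprocess.run(list(report_cmd))  # writes nvidia-bug-report.log.gz to the cwd
            return
        previous = current
        time.sleep(interval_s)
```

If the block is saved as a file with the hypothetical name gpu_watch.py, it could be started before the pipeline with python3 -c "from gpu_watch import monitor; monitor()". When left unattended, either run the watcher itself as root or allow passwordless sudo for nvidia-bug-report.sh, so that the report step does not stall at a password prompt.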