• Hardware Platform (Jetson / GPU) T4 GPU
• DeepStream Version Deepstream-5.0
• NVIDIA GPU Driver Version (valid for GPU only) 460.32.03
We are using the Milestone VPS plugin with a DeepStream pipeline that includes several detection/classification models.
We are using Milestone 2020 R2 and the VPS from MIPSDK 2020 R1. Milestone runs on Windows, while the VPS/DeepStream pipeline runs on Ubuntu 18.04.
The DeepStream pipeline crashes at runtime with the following error:
Error: could not get cuda device count (cudaErrorNoDevice)
Could not get cuda device count
When we run the pipeline, all the models are instantiated and start running. Then, at some random point, the pipeline crashes and starts giving this error (attached below).
The pipeline ran normally for several days during the testing phase and passed all the stress tests. However, this error has occurred several times this week. Restarting the pipeline or Docker makes it work normally again, but the error reappears after some time. Also note that the command watch nvidia-smi normally shows complete information about the GPU, but when the error happens, all the running processes and their metrics (memory allocated, GPU utilization, info) disappear (drop to 0), while the GPU info remains displayed without any running process.
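To pin down the exact moment the processes vanish from nvidia-smi, something like the following could be left running next to the pipeline. This is only a rough sketch, not something we have in production: the function names, the 5-second interval, and the log message are our own assumptions, and it has to run wherever nvidia-smi actually lists the pipeline processes. It relies only on the --query-compute-apps and --format options of nvidia-smi.

```python
import csv
import io
import subprocess
import time
from datetime import datetime

# nvidia-smi query for the compute processes currently using the GPU.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Parse nvidia-smi CSV output into (pid, name, used_memory) tuples."""
    apps = []
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(row) != 3:
            continue  # skip blank or unexpected lines
        pid, name, used = (field.strip() for field in row)
        if not pid.isdigit():
            continue  # skip anything that is not a process row
        # used_memory is kept as text because nvidia-smi may report
        # a non-numeric placeholder instead of a number.
        apps.append((int(pid), name, used))
    return apps


def poll(interval_s=5.0):
    """Log a timestamp whenever the compute process list becomes empty."""
    had_apps = False
    while True:
        result = subprocess.run(
            QUERY,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
        apps = parse_compute_apps(result.stdout)
        if had_apps and not apps:
            stamp = datetime.now().isoformat(timespec="seconds")
            print(
                f"{stamp} compute process list became empty "
                f"(nvidia-smi exit code {result.returncode})",
                flush=True,
            )
        had_apps = bool(apps)
        time.sleep(interval_s)
```

Calling poll() starts the loop. The logged timestamp could then be matched against the pipeline log and the kernel log (dmesg) from the same moment.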
Can you please help with this?