Subject: Inconsistent Errors with DeepStream 6.2 on dGPU: cuInit failed: 999
Hardware Platform: dGPU
DeepStream Version: 6.2
TensorRT Version: 8.5.2-1
NVIDIA GPU Driver Version: 535.54.03
Issue Description:
I am running a DeepStream-based camera analytics application on a dGPU and encountering intermittent failures. Initially the application runs smoothly, but after a prolonged period it fails with the following error:
nvbufsurftransform:cuInit failed : 999
When this happens, I check the CUDA status with torch.cuda.is_available(), and it returns False.
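For reference, here is the minimal probe I use to reproduce the check without pulling in PyTorch. It calls cuInit directly through the driver library, mirroring the call that fails inside nvbufsurftransform (a sketch; it assumes libcuda.so.1 is on the loader path):

```python
import ctypes

def cuda_init_status() -> str:
    """Call cuInit(0) straight from the CUDA driver library.

    This is the same initialization call that nvbufsurftransform makes,
    so on a broken server it returns the same raw error code
    (999 = CUDA_ERROR_UNKNOWN).
    """
    try:
        libcuda = ctypes.CDLL("libcuda.so.1")
    except OSError:
        return "libcuda.so.1 not found"
    result = libcuda.cuInit(0)  # CUresult; 0 means CUDA_SUCCESS
    return "ok" if result == 0 else f"cuInit failed: {result}"

print(cuda_init_status())
```

On a healthy server this prints `ok`; once the failure hits, it reports the same 999 code that appears in the DeepStream log.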
I attempted to resolve the issue by running the following commands, as suggested in other discussions:
sudo modprobe --remove nvidia-uvm # same as `rmmod`
sudo modprobe nvidia-uvm
While this temporarily allows the application to run again, the issue recurs shortly thereafter.
Additionally, at times, the application fails with a completely different set of errors, without any changes to the source code:
WARNING: [TRT]: Unable to determine GPU memory usage
WARNING: [TRT]: Unable to determine GPU memory usage
WARNING: [TRT]: CUDA initialization failure with error: 214. Please check your CUDA installation: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
python3: ../nvdsinfer/nvdsinfer_model_builder.cpp:618: nvdsinfer::TrtModelBuilder::TrtModelBuilder(int, nvinfer1::ILogger&, const std::shared_ptr<nvdsinfer::DlLibHandle>&): Assertion `m_Builder' failed.
What confuses me is that other servers with identical configurations (hardware, software versions, and application setup) do not encounter this problem and continue to run without issues.
Could you give me some advice? Thank you for your help!