Cuda Error Illegal Address

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU)
dGPU
• DeepStream Version
6.1.1
• TensorRT Version
8.4.1-1+cuda11.6
• NVIDIA GPU Driver Version (valid for GPU only)
515.65.01
• Issue Type( questions, new requirements, bugs)
Bug
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

I am running a DeepStream application in the Docker image nvcr.io/nvidia/deepstream:6.1.1-devel,
with RTSP sources added dynamically and an RTSP output (full pipeline below).

The application runs reliably without crashing on a Quadro P1000 with the same Docker image and driver version 515.65.01.
However, on a Quadro RTX 5000 I keep getting the error below. The application never stays up for more than 10 minutes and crashes at random intervals.

ERROR: nvdsinfer_context_impl.cpp:339 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: nvdsinfer_context_impl.cpp:1625 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
0:23:01.478337623     1 0x7f144401b180 WARN                 nvinfer gstnvinfer.cpp:1338:gst_nvinfer_input_queue_loop:<primary-inference> error: Failed to queue input batch for inferencing
GPUassert: an illegal memory access was encountered src/modules/NvMultiObjectTracker/context.cpp 197
Error: gst-stream-error-quark: Failed to queue input batch for inferencing (1): gstnvinfer.cpp(1338): gst_nvinfer_input_queue_loop (): /GstPipeline:pipeline0/GstNvInfer:primary-inference
# Gst.MessageType.ERROR, debug is gstnvinfer.cpp(1338): gst_nvinfer_input_queue_loop (): /GstPipeline:pipeline0/GstNvInfer:primary-inference, msg.src.name primary-inference#
 2023-05-28 09:13:21 INFO     ds_utils   404
NoneType: None
 2023-05-28 09:13:21 ERROR    ds_utils   405
[WARN ] 2023-05-28 09:13:21 (cudaErrorIllegalAddress)
[ERROR] 2023-05-28 09:13:21 Error destroying cuda device: 0
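
CUDA reports kernel faults asynchronously, so the component that logs error 700 is not necessarily the one that caused it. As a first triage step, a small sketch like the following pulls the CUDA error codes and the reporting source files out of a captured log. The regular expression is an assumption based only on the nvdsinfer lines quoted above, not on a documented DeepStream log format, and `find_cuda_errors` is a hypothetical helper name.

```python
import re

# Pattern assumed from the nvdsinfer lines in the log above;
# it is not a documented DeepStream log format.
CUDA_ERR = re.compile(
    r"(?P<file>[\w./]+\.cpp):(?P<line>\d+)\s+(?P<msg>.*?),\s*"
    r"cuda err_no:(?P<code>\d+),\s*err_str:(?P<name>\w+)"
)


def find_cuda_errors(log_text):
    """Return one dict per log line that carries a CUDA error code."""
    hits = []
    for raw in log_text.splitlines():
        match = CUDA_ERR.search(raw)
        if match:
            hits.append({
                "file": match.group("file"),
                "line": int(match.group("line")),
                "code": int(match.group("code")),
                "name": match.group("name"),
                "message": match.group("msg").strip(),
            })
    return hits


# Example usage with a log captured to a file (the path is hypothetical):
# with open("deepstream.log", encoding="utf-8", errors="replace") as fh:
#     for hit in find_cuda_errors(fh.read()):
#         print(hit)
```

Re-running the application with the environment variable `CUDA_LAUNCH_BLOCKING=1` makes kernel launches synchronous, which usually moves the reported error closer to the call that actually faulted, at the cost of throughput.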

I thought it was a model error, so I tried different models from here
(the same models run well on the Quadro P1000 machine),
but I am encountering the same error.

Are you running DeepStream app in the host machine or in docker?

This seems to be a CUDA failure. Can you check whether CUDA works with the CUDA samples? NVIDIA/cuda-samples (github.com)

Or you can use a TensorRT sample to test whether TensorRT works correctly. NVIDIA/TensorRT (github.com)

I am running all code in the nvcr.io/nvidia/deepstream:6.1-devel container in a Docker Compose environment and passing the GPU using

    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

in my .yml file.
It is worth mentioning that this is a multi-GPU machine and the first card is used for this DeepStream app.
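
On a multi-GPU host it can help to confirm which card the container actually received. The sketch below parses the output of `nvidia-smi --query-gpu=index,name,uuid --format=csv,noheader`. The helper names are illustrative, and it assumes `nvidia-smi` is on the PATH inside the container.

```python
import subprocess

# Standard nvidia-smi query; the fields index, name and uuid are per-GPU properties.
QUERY = ["nvidia-smi", "--query-gpu=index,name,uuid", "--format=csv,noheader"]


def parse_gpu_list(csv_text):
    """Turn 'index, name, uuid' CSV rows into a list of dicts."""
    gpus = []
    for row in csv_text.strip().splitlines():
        parts = [part.strip() for part in row.split(",")]
        if len(parts) != 3:
            continue  # skip anything that is not a plain three-column row
        index, name, uuid = parts
        gpus.append({"index": int(index), "name": name, "uuid": uuid})
    return gpus


def visible_gpus():
    """Run nvidia-smi and return the GPUs visible to this process."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_list(result.stdout)


# Example usage inside the container:
# for gpu in visible_gpus():
#     print(gpu["index"], gpu["name"], gpu["uuid"])
```

With `count: 1` Docker chooses which GPU to expose. Comparing the UUID printed inside the container with the host's `nvidia-smi -L` output shows which physical card that is; Compose also accepts `device_ids` in place of `count` if a specific card should be pinned.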

I ran the transpose and eigenvalues samples; here is the output:


Can the c/c++ deepstream-test1 sample run correctly?


It can run after changing the sink type to fakesink because there is no display.

This is not a model error but a CUDA error. How many RTX 5000 GPUs do you have? Can you reproduce the same problem with a different RTX 5000?

There are 6 other cards. I tried running the app on another one and it crashed with the same error.

Have you checked the GPU memory usage and other resource usage while your pipeline is running?

Yes, GPU memory usage was always under 3 GB, with plenty of free RAM and low CPU utilization.
As I mentioned before, my other, lower-spec machine (Quadro P1000) runs the same app without this error. That machine is running Ubuntu Desktop 20.04,
unlike the crashing one, which is running Ubuntu Server 22.04.
Could the OS be related?
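
For reference, a memory figure like the one above can be recorded over time rather than read off by eye. This is a minimal sketch, assuming `nvidia-smi` is available and that GPU index 0 is the card in use; `parse_used_mib`, `read_used_mib` and `sample_peak` are hypothetical helper names.

```python
import subprocess
import time

# Query used memory in MiB for GPU 0 (the GPU index is an assumption).
QUERY = [
    "nvidia-smi", "-i", "0",
    "--query-gpu=memory.used",
    "--format=csv,noheader,nounits",
]


def parse_used_mib(text):
    """Parse a single 'memory.used' value (MiB) from nvidia-smi output."""
    return int(text.strip().splitlines()[0])


def read_used_mib():
    """Ask nvidia-smi for the memory currently used on the GPU."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_used_mib(result.stdout)


def sample_peak(read_fn, samples, interval_s):
    """Call read_fn `samples` times and return the largest value seen."""
    peak = 0
    for i in range(samples):
        peak = max(peak, read_fn())
        if i < samples - 1:
            time.sleep(interval_s)
    return peak


# Example usage while the pipeline is running:
# print(sample_peak(read_used_mib, samples=60, interval_s=1.0), "MiB peak")
```

A low peak makes an out-of-memory condition unlikely, but it says nothing about an illegal address, which is an out-of-bounds or stale-pointer access rather than an allocation failure.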

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

Currently the DeepStream SDK only supports Ubuntu 20.04.

Please follow the compatibility requirements in the Quickstart Guide (DeepStream 6.2 Release documentation).
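
A quick way to compare a machine against that requirement is to read `/etc/os-release`. The sketch below is illustrative only: the function names are hypothetical, and the default of Ubuntu 20.04 is taken from the reply above rather than from the compatibility table itself.

```python
def parse_os_release(text):
    """Parse KEY=value lines from an os-release file into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"')
    return info


def is_supported_os(info, required_id="ubuntu", required_version="20.04"):
    """True if the parsed os-release matches the required distribution and version."""
    return info.get("ID") == required_id and info.get("VERSION_ID") == required_version


def read_os_release(path="/etc/os-release"):
    """Read and parse the os-release file at `path`."""
    with open(path, encoding="utf-8") as fh:
        return parse_os_release(fh.read())


# Example usage:
# info = read_os_release()
# print(info.get("PRETTY_NAME"), "supported:", is_supported_os(info))
```

Run inside a container, this reports the container's base image, not the host. To check the host OS (Ubuntu Server 22.04 in this thread), it has to be run on the host itself.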
