CUDA error when running multiple pipelines one after another

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU)
• DeepStream Version
• JetPack Version (valid for Jetson only)
• TensorRT Version
• NVIDIA GPU Driver Version (valid for GPU only)
• Issue Type( questions, new requirements, bugs)
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

Good morning,

I have been facing an issue for a few months that I have not been able to fix. We have internal software that goes through a list of models and runs a pipeline for each one, one after another. After running 3-7 pipelines in a row, I sometimes get the error below. The config associated with the pipeline is not the problem: when I re-run the pipeline that hit the error, it completes without issue. The error only occurs when I queue multiple pipelines one after another.

My guess is that between pipelines there is a memory allocation that is sometimes not released. Can you advise on how to fix this issue? It occurs either while running tlt-converter or right before a pipeline starts.
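For reference, here is a minimal sketch of what one run/teardown cycle looks like, assuming the GStreamer Python bindings; build_pipeline is a placeholder for our internal pipeline construction, not actual code from it.

import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst

Gst.init(None)

def run_deepstream(vid):
    pipeline = build_pipeline(vid)  # placeholder for internal setup
    pipeline.set_state(Gst.State.PLAYING)

    # Block until the pipeline posts EOS or an error on the bus
    bus = pipeline.get_bus()
    bus.timed_pop_filtered(Gst.CLOCK_TIME_NONE,
                           Gst.MessageType.EOS | Gst.MessageType.ERROR)

    # Setting the state to NULL should release element resources,
    # including nvinfer's GPU allocations, before the next run starts
    pipeline.set_state(Gst.State.NULL)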

We are using DeepStream 5.1 with TLT/TAO models.

ERROR: nvdsinfer_context_impl.cpp:1573 Failed to synchronize on cuda copy-coplete-event, cuda err_no:700, err_str:cudaErrorIllegalAddress
0:00:01.229666287   976      0x24a3ed0 WARN                 nvinfer gstnvinfer.cpp:2021:gst_nvinfer_output_loop:<primary-inference> error: Failed to dequeue output from inferencing. NvDsInferContext error: NVDSINFER_CUDA_ERROR
0:00:01.229755945   976      0x24a3ed0 WARN                 nvinfer gstnvinfer.cpp:616:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::releaseBatchOutput() <nvdsinfer_context_impl.cpp:1599> [UID = 1]: Tried to release an unknown outputBatchID
Error: gst-stream-error-quark: Failed to dequeue output from inferencing. NvDsInferContext error: NVDSINFER_CUDA_ERROR (1): gstnvinfer.cpp(2021): gst_nvinfer_output_loop (): /GstPipeline:pipeline0/GstNvInfer:primary-inference

• Hardware Platform (Jetson / GPU)
T4 & 3090
• DeepStream Version
5.1
• JetPack Version (valid for Jetson only)
• TensorRT Version
7.2
• NVIDIA GPU Driver Version (valid for GPU only)
470
• Issue Type( questions, new requirements, bugs)
Bug
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
We have a script that iterates over 2 videos and runs DeepStream on one video at a time. It starts the next pipeline as soon as the previous one has finished.
for vid in vid_folder:
    run_deepstream(vid)
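For illustration, a minimal, self-contained version of that loop. It assumes run_deepstream shells out to the deepstream-app reference binary; the config file name and video folder here are placeholders, not our actual setup.

import subprocess
from pathlib import Path

def run_deepstream(vid):
    # Launch deepstream-app on one video and block until it exits,
    # so pipelines never overlap. -c sets the config file, -i the input.
    subprocess.run(
        ['deepstream-app', '-c', 'deepstream_app_config.txt', '-i', str(vid)],
        check=True)

vid_folder = sorted(Path('videos').glob('*.mp4'))
for vid in vid_folder:
    run_deepstream(vid)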

• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)
CUDA illegal memory access

Have you tried the same case on the latest DeepStream 6.1 version?

How can we reproduce the issue on our side?

There has been no update from you for a while, so we are assuming this is not an issue anymore.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.