Deepstream returns "Failed to synchronize on cuda copy-coplete-event, cuda err_no:700, err_str:cudaErrorIllegalAddress"

Hi,

I converted this onnx model to TensorRT using

/usr/src/tensorrt/bin/trtexec --onnx=darknet-53_b1_sim.onnx --workspace=64 --fp16 --saveEngine=darknet-53_b1.engine

The conversion worked with ScatterND plugin being created during the process.

[06/09/2022-03:34:48] [W] [TRT] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[06/09/2022-03:34:48] [I] [TRT] No importer registered for op: ScatterND. Attempting to import as plugin.
[06/09/2022-03:34:48] [I] [TRT] Searching for plugin: ScatterND, plugin_version: 1, plugin_namespace: 
[06/09/2022-03:34:48] [I] [TRT] Successfully created plugin: ScatterND
(the three lines above repeat for each ScatterND node in the model)

I created a simple app (no DeepStream) that runs the TensorRT engine file to verify the engine, and inference worked fine once initLibNvInferPlugins(nullptr, ""); was added to the code. However, when I ran a simple DeepStream app based on deepstream_test1_app.c with the same engine, I got

root@64af4ac4e10d:/project# ./bin/test_reid /opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.h264
Now playing: /opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.h264
0:00:01.719414205   498 0x56386ca3b230 INFO                 nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1900> [UID = 1]: deserialized trt engine from :/project/models/people_reid/darknet-53_b1.engine
INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:610 [Implicit Engine Info]: layers num: 2
0   INPUT  kFLOAT input           3x608x1088      
1   OUTPUT kFLOAT output          54264x518       

0:00:01.719457301   498 0x56386ca3b230 INFO                 nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2004> [UID = 1]: Use deserialized engine model: /project/models/people_reid/darknet-53_b1.engine
0:00:01.856497390   498 0x56386ca3b230 INFO                 nvinfer gstnvinfer_impl.cpp:313:notifyLoadModelStatus:<primary-nvinference-engine> [UID 1]: Load new model:configs/people_reid/pgie.cfg sucessfully
Running...
Failed to establish dbus connectionERROR: nvdsinfer_context_impl.cpp:1763 Failed to synchronize on cuda copy-coplete-event, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: [TRT]: 1: [slice.cu::launchNaiveSliceImpl::148] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
ERROR: nvdsinfer_backend.cpp:506 Failed to enqueue trt inference batch
ERROR: nvdsinfer_context_impl.cpp:1643 Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
0:00:02.174864705   498 0x56386c1d3d40 WARN                 nvinfer gstnvinfer.cpp:2325:gst_nvinfer_output_loop:<primary-nvinference-engine> error: Failed to dequeue output from inferencing. NvDsInferContext error: NVDSINFER_CUDA_ERROR
0:00:02.175048840   498 0x56386c1d3d40 WARN                 nvinfer gstnvinfer.cpp:635:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::releaseBatchOutput() <nvdsinfer_context_impl.cpp:1789> [UID = 1]: Tried to release an unknown outputBatchID
ERROR from element primary-nvinference-engine: Failed to dequeue output from inferencing. NvDsInferContext error: NVDSINFER_CUDA_ERROR
Error details: gstnvinfer.cpp(2325): gst_nvinfer_output_loop (): /GstPipeline:dstest1-pipeline/GstNvInfer:primary-nvinference-engine
Cuda failure: status=700
Error(-1) in buffer allocation
0:00:02.175101258   498 0x56386c1d3d90 WARN                 nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop:<primary-nvinference-engine> error: Failed to queue input batch for inferencing
Returned, stopping playback

** (test_reid:498): CRITICAL **: 03:42:26.526: gst_nvds_buffer_pool_alloc_buffer: assertion 'mem' failed
0:00:02.175176211   498 0x56386c1d3d40 WARN                 nvinfer gstnvinfer.cpp:2288:gst_nvinfer_output_loop:<primary-nvinference-engine> error: Internal data stream error.
0:00:02.175182255   498 0x56386c1d3d40 WARN                 nvinfer gstnvinfer.cpp:2288:gst_nvinfer_output_loop:<primary-nvinference-engine> error: streaming stopped, reason error (-5)
ERROR: nvdsinfer_context_impl.cpp:1763 Failed to synchronize on cuda copy-coplete-event, cuda err_no:700, err_str:cudaErrorIllegalAddress
0:00:02.175212917   498 0x56386c1d3d40 WARN                 nvinfer gstnvinfer.cpp:2325:gst_nvinfer_output_loop:<primary-nvinference-engine> error: Failed to dequeue output from inferencing. NvDsInferContext error: NVDSINFER_CUDA_ERROR
Segmentation fault (core dumped)

When I ran the same app with a different TensorRT engine, one converted without any plugins being used during the conversion, the app worked fine. So the issue has to do with the ScatterND plugin being used in the engine. How can we fix this for a DeepStream app? I added initLibNvInferPlugins(nullptr, ""); to the app and linked the nvinfer_plugin library, but the error still occurred.

Steps to reproduce

  1. Run the docker image nvcr.io/nvidia/deepstream:6.0-devel.
  2. Convert this onnx model to TensorRT using
/usr/src/tensorrt/bin/trtexec --onnx=darknet-53_b1_sim.onnx --workspace=64 --fp16 --saveEngine=darknet-53_b1.engine
  3. Create a simple DeepStream app that uses the nvinfer plugin and run the engine file.

nvinfer plugin’s config file

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-engine-file=../../models/people_reid/darknet-53_b1.engine
# model-engine-file=../../models/people_tracking/bytetrack_s.engine
# custom-lib-path=/usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so
# scaling-filter=1
# batch-size=1
force-implicit-batch-dim=1
model-color-format=0
interval=0
gie-unique-id=1
output-blob-names=output

# 0=Detector, 1=Classifier, 2=Segmentation, 100=Other
network-type=100

# Enable tensor metadata output
output-tensor-meta=1
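
As a side note, the net-scale-factor in this config is simply 1/255 written out as a float32 literal, i.e. 8-bit pixel values are normalized to [0, 1]. A quick sanity check in plain Python (no DeepStream required):

```python
# net-scale-factor from the nvinfer config above: the float32
# representation of 1/255, which maps 8-bit pixel values into [0, 1].
scale = 0.0039215697906911373
print(abs(scale - 1 / 255) < 1e-8)  # True: only float32 rounding apart
```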

Environment
Architecture: x86_64
GPU: NVIDIA GeForce GTX 1650 Ti with Max-Q Design
NVIDIA GPU Driver: Driver Version: 495.29.05
DeepStream Version: 6.0 (running on docker image nvcr.io/nvidia/deepstream:6.0-devel)
TensorRT Version: v8001
Issue Type: Question

1 about "When I ran the same app but with a different TensorRT engine converted without any plugins being used during the conversion, the app worked fine." — did you use the same model?
2 why is there no more information about your model, like its shape?
3 you can try to use the onnx model directly in deepstream,

Thanks for the response. Please find the answers below.

  1. Mode? Did you mean model? No, it was a different model (onnx model here). That model was converted with
 trtexec --onnx=bytetrack_s_dynamic.onnx --workspace=64 --minShapes=images:1x3x608x1088 --optShapes=images:2x3x608x1088 --maxShapes=images:2x3x608x1088 --fp16 --saveEngine=bytetrack_s.engine

Both models are in FP16 format.

  2. I inspected the onnx model before conversion using Netron; the shape is shown below. Moreover, the DeepStream log above contains the model shape information.
INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:610 [Implicit Engine Info]: layers num: 2
0   INPUT  kFLOAT input           3x608x1088      
1   OUTPUT kFLOAT output          54264x518

  3. All models tested were converted from onnx format.

Using the same steps, I can reproduce the issue; will continue to check.


1 your model’s output dimension 54264x518 is very large; it will cost too much memory. Please execute “export CUDA_LAUNCH_BLOCKING=1” and try again.
2 if network-type=100, you need to implement postprocessing on NvDsInferTensorMeta; please refer to the sample deepstream-infer-tensor-meta-test or GitHub - NVIDIA-AI-IOT/deepstream_pose_estimation: This is a sample DeepStream application to demonstrate a human pose estimation pipeline.
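
To put a rough number on point 1: a single FP32 copy of the 54264x518 output tensor reported in the [Implicit Engine Info] log comes to over 100 MiB per batch. A quick back-of-the-envelope check:

```python
# Size of one host/device copy of the 54264x518 FP32 output tensor
# reported in the DeepStream [Implicit Engine Info] log above.
rows, cols = 54264, 518
size_bytes = rows * cols * 4  # 4 bytes per FP32 element
print(f"{size_bytes / 2**20:.1f} MiB")  # ~107.2 MiB per batch
```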


Thanks for your response. By enabling export CUDA_LAUNCH_BLOCKING=1, I don’t get any error, but the stream processing is very slow and laggy. I guess I’d have to use a smaller model.

By the way, how did you know it was due to the model’s large output dimension and not some other issue?

Error 700 is a memory-related issue. DeepStream native models do not have this kind of issue; the output of this model is comparatively very large. We will check whether we can cover this case.


Thanks for your response. I changed the model to a much smaller one, yolov5s.

Although this model is much smaller, I still got

root@64af4ac4e10d:/project# ./bin/test_reid /opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.h264
Now playing: /opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.h264
0:00:01.200104695   843 0x56290d5f1030 INFO                 nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1900> [UID = 1]: deserialized trt engine from :/project/yolov5s_b1.engine
INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:610 [Implicit Engine Info]: layers num: 2
0   INPUT  kFLOAT images          3x640x640       
1   OUTPUT kFLOAT output          25200x85        

0:00:01.200153457   843 0x56290d5f1030 INFO                 nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2004> [UID = 1]: Use deserialized engine model: /project/yolov5s_b1.engine
0:00:01.213798602   843 0x56290d5f1030 INFO                 nvinfer gstnvinfer_impl.cpp:313:notifyLoadModelStatus:<primary-nvinference-engine> [UID 1]: Load new model:configs/vehicle_reid/pgie.cfg sucessfully
Running...
Failed to establish dbus connectionERROR: [TRT]: 1: [convolutionRunner.cpp::checkCaskExecError<false>::440] Error Code 1: Cask (Cask Convolution execution)
ERROR: [TRT]: 1: [apiCheck.cpp::apiCatchCudaError::17] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
ERROR: nvdsinfer_backend.cpp:506 Failed to enqueue trt inference batch
ERROR: nvdsinfer_context_impl.cpp:1643 Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
0:00:01.484670163   843 0x56290d3f2990 WARN                 nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop:<primary-nvinference-engine> error: Failed to queue input batch for inferencing
ERROR: nvdsinfer_context_impl.cpp:341 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: nvdsinfer_context_impl.cpp:1619 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
0:00:01.484747686   843 0x56290d3f2990 WARN                 nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop:<primary-nvinference-engine> error: Failed to queue input batch for inferencing
ERROR: nvdsinfer_context_impl.cpp:341 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: nvdsinfer_context_impl.cpp:1619 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
0:00:01.484769807   843 0x56290d3f2990 WARN                 nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop:<primary-nvinference-engine> error: Failed to queue input batch for inferencing
ERROR: nvdsinfer_context_impl.cpp:341 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: nvdsinfer_context_impl.cpp:1619 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
0:00:01.484794925   843 0x56290d3f2990 WARN                 nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop:<primary-nvinference-engine> error: Failed to queue input batch for inferencing
ERROR from element primary-nvinference-engine: Failed to queue input batch for inferencing
Error details: gstnvinfer.cpp(1324): gst_nvinfer_input_queue_loop (): /GstPipeline:dstest1-pipeline/GstNvInfer:primary-nvinference-engine
Cuda failure: status=700
Error(-1) in buffer allocation
Returned, stopping playback

** (test_reid:843): CRITICAL **: 16:10:24.306: gst_nvds_buffer_pool_alloc_buffer: assertion 'mem' failed
0:00:01.484930521   843 0x56290d3f2940 WARN                 nvinfer gstnvinfer.cpp:2288:gst_nvinfer_output_loop:<primary-nvinference-engine> error: Internal data stream error.
0:00:01.484937886   843 0x56290d3f2940 WARN                 nvinfer gstnvinfer.cpp:2288:gst_nvinfer_output_loop:<primary-nvinference-engine> error: streaming stopped, reason error (-5)
Cuda failure: status=700
Error(-1) in buffer allocation
Segmentation fault (core dumped)

Again Cuda failure: status=700. However, when I set export CUDA_LAUNCH_BLOCKING=1, it worked fine and wasn’t laggy at all. So I guess model size is not the root cause, because this new model is roughly the same size as the working one I provided earlier. The only difference between this new model and the previous working one is that this one registered the ScatterND plugin during the conversion from onnx to tensorrt.
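
Comparing the output shapes from the two DeepStream logs above also argues against raw output size being the trigger: the yolov5s output tensor is an order of magnitude smaller than the darknet-53 one, yet it fails the same way.

```python
# FP32 copy sizes of the output tensors, taken from the two
# [Implicit Engine Info] log blocks in this thread.
darknet = 54264 * 518 * 4  # darknet-53_b1.engine output (54264x518)
yolo = 25200 * 85 * 4      # yolov5s_b1.engine output (25200x85)
print(f"darknet: {darknet / 2**20:.1f} MiB, yolov5s: {yolo / 2**20:.1f} MiB")
# darknet: 107.2 MiB, yolov5s: 8.2 MiB
```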

[06/21/2022-16:15:24] [I] [TRT] No importer registered for op: ScatterND. Attempting to import as plugin.
[06/21/2022-16:15:24] [I] [TRT] Searching for plugin: ScatterND, plugin_version: 1, plugin_namespace: 
[06/21/2022-16:15:24] [I] [TRT] Successfully created plugin: ScatterND
(the three lines above repeat for each ScatterND node in the model)

Given this, I suspect it is the registered ScatterND plugin that causes DeepStream to fail.

The TensorRT engine was converted in FP16 mode using

trtexec --onnx=yolov5s_b1_mod.onnx --workspace=1024 --fp16 --saveEngine=yolov5s_b1.engine

The environment and sample app are the same as before. Note: I marked this thread as unresolved for now.

Thanks

1 where did you add that initLibNvInferPlugins call?
2 here are some samples about using yolov5 in deepstream;
can you find any difference? GitHub - beyondli/Yolo_on_Jetson, Custom Yolov5 on Deepstream 6.0

  1. initLibNvInferPlugins(nullptr, ""); is the first call before everything else. I tried with and without it; the error occurred either way.
  2. Checked. In my case, my sample app is C code based on deepstream_test1_app.c; the repo you provided didn’t implement the app this way. I didn’t use the custom plugins. I enabled
network-type=100
output-tensor-meta=1

in the config because I want to parse the result manually, based on the sample app deepstream-infer-tensor-meta-test.

Please refer to this link; it describes an issue similar to yours: Unexpected exception an illegal memory access was encountered - #3 by mchi

Yes, that was the root cause of the issue. After re-exporting the model with inplace=False, the illegal memory access error no longer appears.