Deepstream returns "Failed to synchronize on cuda copy-coplete-event, cuda err_no:700, err_str:cudaErrorIllegalAddress"

Hi,

I converted this onnx model to TensorRT using

/usr/src/tensorrt/bin/trtexec --onnx=darknet-53_b1_sim.onnx --workspace=64 --fp16 --saveEngine=darknet-53_b1.engine

The conversion worked with ScatterND plugin being created during the process.

[06/09/2022-03:34:48] [W] [TRT] onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
[06/09/2022-03:34:48] [I] [TRT] No importer registered for op: ScatterND. Attempting to import as plugin.
[06/09/2022-03:34:48] [I] [TRT] Searching for plugin: ScatterND, plugin_version: 1, plugin_namespace: 
[06/09/2022-03:34:48] [I] [TRT] Successfully created plugin: ScatterND
(the three lines above repeat for each ScatterND node in the model)

I created a simple app (no DeepStream) that runs the TensorRT engine file to verify the engine, and inference worked fine once initLibNvInferPlugins(nullptr, ""); was added to the code. However, when I ran a simple DeepStream app based on deepstream_test1_app.c with the same engine, I got

root@64af4ac4e10d:/project# ./bin/test_reid /opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.h264
Now playing: /opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.h264
0:00:01.719414205   498 0x56386ca3b230 INFO                 nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1900> [UID = 1]: deserialized trt engine from :/project/models/people_reid/darknet-53_b1.engine
INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:610 [Implicit Engine Info]: layers num: 2
0   INPUT  kFLOAT input           3x608x1088      
1   OUTPUT kFLOAT output          54264x518       

0:00:01.719457301   498 0x56386ca3b230 INFO                 nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2004> [UID = 1]: Use deserialized engine model: /project/models/people_reid/darknet-53_b1.engine
0:00:01.856497390   498 0x56386ca3b230 INFO                 nvinfer gstnvinfer_impl.cpp:313:notifyLoadModelStatus:<primary-nvinference-engine> [UID 1]: Load new model:configs/people_reid/pgie.cfg sucessfully
Running...
Failed to establish dbus connectionERROR: nvdsinfer_context_impl.cpp:1763 Failed to synchronize on cuda copy-coplete-event, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: [TRT]: 1: [slice.cu::launchNaiveSliceImpl::148] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
ERROR: nvdsinfer_backend.cpp:506 Failed to enqueue trt inference batch
ERROR: nvdsinfer_context_impl.cpp:1643 Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
0:00:02.174864705   498 0x56386c1d3d40 WARN                 nvinfer gstnvinfer.cpp:2325:gst_nvinfer_output_loop:<primary-nvinference-engine> error: Failed to dequeue output from inferencing. NvDsInferContext error: NVDSINFER_CUDA_ERROR
0:00:02.175048840   498 0x56386c1d3d40 WARN                 nvinfer gstnvinfer.cpp:635:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::releaseBatchOutput() <nvdsinfer_context_impl.cpp:1789> [UID = 1]: Tried to release an unknown outputBatchID
ERROR from element primary-nvinference-engine: Failed to dequeue output from inferencing. NvDsInferContext error: NVDSINFER_CUDA_ERROR
Error details: gstnvinfer.cpp(2325): gst_nvinfer_output_loop (): /GstPipeline:dstest1-pipeline/GstNvInfer:primary-nvinference-engine
Cuda failure: status=700
Error(-1) in buffer allocation
0:00:02.175101258   498 0x56386c1d3d90 WARN                 nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop:<primary-nvinference-engine> error: Failed to queue input batch for inferencing
Returned, stopping playback

** (test_reid:498): CRITICAL **: 03:42:26.526: gst_nvds_buffer_pool_alloc_buffer: assertion 'mem' failed
0:00:02.175176211   498 0x56386c1d3d40 WARN                 nvinfer gstnvinfer.cpp:2288:gst_nvinfer_output_loop:<primary-nvinference-engine> error: Internal data stream error.
0:00:02.175182255   498 0x56386c1d3d40 WARN                 nvinfer gstnvinfer.cpp:2288:gst_nvinfer_output_loop:<primary-nvinference-engine> error: streaming stopped, reason error (-5)
ERROR: nvdsinfer_context_impl.cpp:1763 Failed to synchronize on cuda copy-coplete-event, cuda err_no:700, err_str:cudaErrorIllegalAddress
0:00:02.175212917   498 0x56386c1d3d40 WARN                 nvinfer gstnvinfer.cpp:2325:gst_nvinfer_output_loop:<primary-nvinference-engine> error: Failed to dequeue output from inferencing. NvDsInferContext error: NVDSINFER_CUDA_ERROR
Segmentation fault (core dumped)

When I ran the same app with a different TensorRT engine, one converted without any plugins being used during the conversion, the app worked fine. So the issue has to do with the ScatterND plugin being used in the engine. How can we fix this for a DeepStream app? I added initLibNvInferPlugins(nullptr, ""); to the app and linked the nvinfer_plugin library, but the error still occurred.

Steps to reproduce

  1. Run the docker image nvcr.io/nvidia/deepstream:6.0-devel.
  2. Convert this onnx model to TensorRT using
/usr/src/tensorrt/bin/trtexec --onnx=darknet-53_b1_sim.onnx --workspace=64 --fp16 --saveEngine=darknet-53_b1.engine
  3. Create a simple DeepStream app that uses the nvinfer plugin and run the engine file.

nvinfer plugin’s config file

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-engine-file=../../models/people_reid/darknet-53_b1.engine
# model-engine-file=../../models/people_tracking/bytetrack_s.engine
# custom-lib-path=/usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so
# scaling-filter=1
# batch-size=1
force-implicit-batch-dim=1
model-color-format=0
interval=0
gie-unique-id=1
output-blob-names=output

# 0=Detector, 1=Classifier, 2=Segmentation, 100=Other
network-type=100

# Enable tensor metadata output
output-tensor-meta=1
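
As a side note, the net-scale-factor in this config is simply 1/255 written out as a float32 literal, i.e. 8-bit pixel values are normalized to [0, 1]. A quick sanity check in plain Python (no DeepStream required):

```python
# net-scale-factor from the nvinfer config above: the float32
# representation of 1/255, which maps 8-bit pixel values into [0, 1].
scale = 0.0039215697906911373
print(abs(scale - 1 / 255) < 1e-8)  # True: only float32 rounding apart
```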

Environment
Architecture: x86_64
GPU: NVIDIA GeForce GTX 1650 Ti with Max-Q Design
NVIDIA GPU Driver: Driver Version: 495.29.05
DeepStream Version: 6.0 (running on docker image nvcr.io/nvidia/deepstream:6.0-devel)
TensorRT Version: v8001
Issue Type: Question

1 about "When I ran the same app but with a different TensorRT engine converted without any plugins being used during the conversion, the app worked fine." — did you use the same model?
2 why is there no more information about your model, like its shape?
3 you can try to use the onnx model directly in deepstream,

Thanks for the response. Please find the answers below.

  1. Mode? Did you mean model? No, it was a different model (onnx model here). That model was converted with
 trtexec --onnx=bytetrack_s_dynamic.onnx --workspace=64 --minShapes=images:1x3x608x1088 --optShapes=images:2x3x608x1088 --maxShapes=images:2x3x608x1088 --fp16 --saveEngine=bytetrack_s.engine

Both models are in FP16 format.

  2. I inspected the onnx model before conversion using Netron; the shape is shown below. Moreover, the DeepStream log above contains the model shape information.
INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:610 [Implicit Engine Info]: layers num: 2
0   INPUT  kFLOAT input           3x608x1088      
1   OUTPUT kFLOAT output          54264x518

  3. All models tested were converted from onnx format.

Using the same steps, I can reproduce the issue; will continue to check.


1 your model’s output dimension 54264x518 is very large; it will cost too much memory. Please execute “export CUDA_LAUNCH_BLOCKING=1” and try again.
2 if network-type=100, you need to implement postprocessing on NvDsInferTensorMeta; please refer to the sample deepstream-infer-tensor-meta-test or GitHub - NVIDIA-AI-IOT/deepstream_pose_estimation: This is a sample DeepStream application to demonstrate a human pose estimation pipeline.
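
To put a rough number on point 1: a single FP32 copy of the 54264x518 output tensor reported in the [Implicit Engine Info] log comes to over 100 MiB per batch. A quick back-of-the-envelope check:

```python
# Size of one host/device copy of the 54264x518 FP32 output tensor
# reported in the DeepStream [Implicit Engine Info] log above.
rows, cols = 54264, 518
size_bytes = rows * cols * 4  # 4 bytes per FP32 element
print(f"{size_bytes / 2**20:.1f} MiB")  # ~107.2 MiB per batch
```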


Thanks for your response. By enabling export CUDA_LAUNCH_BLOCKING=1, I don’t get any error, but the stream processing is very slow and laggy. I guess I’d have to use a smaller model.

By the way, how did you know it was due to the model’s large output dimension and not some other issue?

Error 700 is a memory-related issue. DeepStream native models do not have this kind of issue; the output of this model is comparatively very large. We will check whether we can cover this case.


Thanks for your response. I changed the model to a much smaller one, yolov5s.

Although this model is much smaller, I still got

root@64af4ac4e10d:/project# ./bin/test_reid /opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.h264
Now playing: /opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.h264
0:00:01.200104695   843 0x56290d5f1030 INFO                 nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1900> [UID = 1]: deserialized trt engine from :/project/yolov5s_b1.engine
INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:610 [Implicit Engine Info]: layers num: 2
0   INPUT  kFLOAT images          3x640x640       
1   OUTPUT kFLOAT output          25200x85        

0:00:01.200153457   843 0x56290d5f1030 INFO                 nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2004> [UID = 1]: Use deserialized engine model: /project/yolov5s_b1.engine
0:00:01.213798602   843 0x56290d5f1030 INFO                 nvinfer gstnvinfer_impl.cpp:313:notifyLoadModelStatus:<primary-nvinference-engine> [UID 1]: Load new model:configs/vehicle_reid/pgie.cfg sucessfully
Running...
Failed to establish dbus connectionERROR: [TRT]: 1: [convolutionRunner.cpp::checkCaskExecError<false>::440] Error Code 1: Cask (Cask Convolution execution)
ERROR: [TRT]: 1: [apiCheck.cpp::apiCatchCudaError::17] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
ERROR: nvdsinfer_backend.cpp:506 Failed to enqueue trt inference batch
ERROR: nvdsinfer_context_impl.cpp:1643 Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
0:00:01.484670163   843 0x56290d3f2990 WARN                 nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop:<primary-nvinference-engine> error: Failed to queue input batch for inferencing
ERROR: nvdsinfer_context_impl.cpp:341 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: nvdsinfer_context_impl.cpp:1619 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
0:00:01.484747686   843 0x56290d3f2990 WARN                 nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop:<primary-nvinference-engine> error: Failed to queue input batch for inferencing
ERROR: nvdsinfer_context_impl.cpp:341 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: nvdsinfer_context_impl.cpp:1619 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
0:00:01.484769807   843 0x56290d3f2990 WARN                 nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop:<primary-nvinference-engine> error: Failed to queue input batch for inferencing
ERROR: nvdsinfer_context_impl.cpp:341 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: nvdsinfer_context_impl.cpp:1619 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
0:00:01.484794925   843 0x56290d3f2990 WARN                 nvinfer gstnvinfer.cpp:1324:gst_nvinfer_input_queue_loop:<primary-nvinference-engine> error: Failed to queue input batch for inferencing
ERROR from element primary-nvinference-engine: Failed to queue input batch for inferencing
Error details: gstnvinfer.cpp(1324): gst_nvinfer_input_queue_loop (): /GstPipeline:dstest1-pipeline/GstNvInfer:primary-nvinference-engine
Cuda failure: status=700
Error(-1) in buffer allocation
Returned, stopping playback

** (test_reid:843): CRITICAL **: 16:10:24.306: gst_nvds_buffer_pool_alloc_buffer: assertion 'mem' failed
0:00:01.484930521   843 0x56290d3f2940 WARN                 nvinfer gstnvinfer.cpp:2288:gst_nvinfer_output_loop:<primary-nvinference-engine> error: Internal data stream error.
0:00:01.484937886   843 0x56290d3f2940 WARN                 nvinfer gstnvinfer.cpp:2288:gst_nvinfer_output_loop:<primary-nvinference-engine> error: streaming stopped, reason error (-5)
Cuda failure: status=700
Error(-1) in buffer allocation
Segmentation fault (core dumped)

Again Cuda failure: status=700. However, when I set export CUDA_LAUNCH_BLOCKING=1, it worked fine and wasn’t laggy at all. So I guess model size is not the root cause, because this new model is roughly the same size as the working one I provided earlier. The only difference between this new model and the previous working one is that this one registered the ScatterND plugin during the conversion from onnx to tensorrt.
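
Comparing the output shapes from the two DeepStream logs above also argues against raw output size being the trigger: the yolov5s output tensor is an order of magnitude smaller than the darknet-53 one, yet it fails the same way.

```python
# FP32 copy sizes of the output tensors, taken from the two
# [Implicit Engine Info] log blocks in this thread.
darknet = 54264 * 518 * 4  # darknet-53_b1.engine output (54264x518)
yolo = 25200 * 85 * 4      # yolov5s_b1.engine output (25200x85)
print(f"darknet: {darknet / 2**20:.1f} MiB, yolov5s: {yolo / 2**20:.1f} MiB")
# darknet: 107.2 MiB, yolov5s: 8.2 MiB
```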

[06/21/2022-16:15:24] [I] [TRT] No importer registered for op: ScatterND. Attempting to import as plugin.
[06/21/2022-16:15:24] [I] [TRT] Searching for plugin: ScatterND, plugin_version: 1, plugin_namespace: 
[06/21/2022-16:15:24] [I] [TRT] Successfully created plugin: ScatterND
(the three lines above repeat for each ScatterND node in the model)

Given this, I suspect it is the registered ScatterND plugin that causes DeepStream to fail.

The TensorRT engine was converted in FP16 mode using

trtexec --onnx=yolov5s_b1_mod.onnx --workspace=1024 --fp16 --saveEngine=yolov5s_b1.engine

The environment and sample app are the same as before. Note: I marked this thread as unresolved for now.

Thanks

1 where did you add that initLibNvInferPlugins call?
2 here are some samples about using yolov5 in deepstream;
can you find any difference? GitHub - beyondli/Yolo_on_Jetson, Custom Yolov5 on Deepstream 6.0

  1. initLibNvInferPlugins(nullptr, ""); is the first call before everything else. I tried with and without it; the error occurred either way.
  2. Checked. In my case, my sample app is C code based on deepstream_test1_app.c; the repo you provided didn’t implement the app this way. I didn’t use the custom plugins. I enabled
network-type=100
output-tensor-meta=1

in the config because I want to parse the result manually, based on the sample app deepstream-infer-tensor-meta-test.

Please refer to this link; it describes an issue similar to yours: Unexpected exception an illegal memory access was encountered - #3 by mchi

Yes, that was the root cause of the issue. After re-exporting the model with inplace=False, the illegal memory access error no longer appears.