Details of Setup
- Hardware Platform (Jetson / GPU): GPU
- DeepStream Version: 7.1.0
- TensorRT Version: 10.7.0.23-1+cuda12.6 amd64
- NVIDIA GPU Driver Version (valid for GPU only): 560.35.03 (CUDA Version: 12.6)
Environment:
- DeepStream 7.1
- YOLOv11 custom parser compiled for NVIDIA TensorRT
- Ubuntu 22.04 LTS, Dockerized deployment
Issue Type: Bug
Description of Issue:
We’re experiencing a recurring segmentation fault (signal 11) in a DeepStream 7.1 pipeline that uses a custom YOLOv11 parser (compiled for TensorRT). The crash occurs while MQTT messages are being published, and the backtrace indicates that it originates in the custom inference library.
Error Log Snippet:
2025-02-04T10:49:18.9161781Z ds-tracker-1 | Publish callback with reason code: Success.
2025-02-04T10:49:18.9561721Z ds-tracker-1 | [mosq_mqtt_log_callback] Client null sending PUBLISH (d0, q0, r0, m30638, 'APP_output', ... (1502 bytes))
2025-02-04T10:49:18.9562051Z ds-tracker-1 | Publish callback with reason code: Success.
2025-02-04T10:49:18.9786314Z ds-tracker-1 | [b9e1e42f68d5:21 :0:286] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x55f4434b8700)
2025-02-04T10:49:18.9961495Z ds-tracker-1 | [mosq_mqtt_log_callback] Client null sending PUBLISH (d0, q0, r0, m30639, 'APP_output', ... (1502 bytes))
2025-02-04T10:49:18.9961844Z ds-tracker-1 | Publish callback with reason code: Success.
2025-02-04T10:49:19.0189169Z ds-tracker-1 | ==== backtrace (tid: 286) ====
2025-02-04T10:49:19.0190037Z ds-tracker-1 | 0 /usr/lib/x86_64-linux-gnu/libucs.so.0(ucs_handle_error+0x2e4) [0x7f82877c6fc4]
2025-02-04T10:49:19.0190336Z ds-tracker-1 | 1 /usr/lib/x86_64-linux-gnu/libucs.so.0(+0x24fec) [0x7f82877cafec]
2025-02-04T10:49:19.0190725Z ds-tracker-1 | 2 /usr/lib/x86_64-linux-gnu/libucs.so.0(+0x251aa) [0x7f82877cb1aa]
2025-02-04T10:49:19.0191420Z ds-tracker-1 | 3 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f8310844520]
2025-02-04T10:49:19.0191838Z ds-tracker-1 | 4 /home/ds_tracker/custom_libs/nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so(+0x62265) [0x7f8249475265]
2025-02-04T10:49:19.0192413Z ds-tracker-1 | 5 /home/ds_tracker/custom_libs/nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so(+0x62469) [0x7f8249475469]
2025-02-04T10:49:19.0192951Z ds-tracker-1 | 6 /home/ds_tracker/custom_libs/nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so(NvDsInferParseYolo+0x34) [0x7f824947557b]
2025-02-04T10:49:19.0193740Z ds-tracker-1 | 7 /opt/nvidia/deepstream/deepstream/lib/libnvds_infer.so(_ZN9nvdsinfer19DetectPostprocessor19fillDetectionOutputERKSt6vectorI18NvDsInferLayerInfoSaIS2_EER24NvDsInferDetectionOutput+0xac) [0x7f828f987b6c]
2025-02-04T10:49:19.0194582Z ds-tracker-1 | 8 /opt/nvidia/deepstream/deepstream/lib/libnvds_infer.so(_ZN9nvdsinfer19DetectPostprocessor14parseEachBatchERKSt6vectorI18NvDsInferLayerInfoSaIS2_EER20NvDsInferFrameOutput+0x17) [0x7f828f964207]
2025-02-04T10:49:19.0195366Z ds-tracker-1 | 9 /opt/nvidia/deepstream/deepstream/lib/libnvds_infer.so(_ZN9nvdsinfer18InferPostprocessor15postProcessHostERNS_14NvDsInferBatchER27NvDsInferContextBatchOutput+0x76a) [0x7f828f96b72a]
2025-02-04T10:49:19.0196094Z ds-tracker-1 | 10 /opt/nvidia/deepstream/deepstream/lib/libnvds_infer.so(_ZN9nvdsinfer20NvDsInferContextImpl18dequeueOutputBatchER27NvDsInferContextBatchOutput+0x108) [0x7f828f9663a8]
2025-02-04T10:49:19.0196705Z ds-tracker-1 | 11 /opt/nvidia/deepstream/deepstream/lib/gst-plugins/libnvdsgst_infer.so(+0x1bd0d) [0x7f82904dfd0d]
2025-02-04T10:49:19.0197027Z ds-tracker-1 | 12 /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0(+0x89ac1) [0x7f830ffb0ac1]
2025-02-04T10:49:19.0197279Z ds-tracker-1 | 13 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f8310896ac3]
2025-02-04T10:49:19.0197560Z ds-tracker-1 | 14 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x126850) [0x7f8310928850]
2025-02-04T10:49:19.0197759Z ds-tracker-1 | =================================
2025-02-04T10:49:32.4728772Z mqtt-broker-1 | 1738666172: Client auto-4B5A4D3B-FD27-C46B-F6DF-4A29B535CB62 closed its connection.
2025-02-04T10:49:32.5227365Z ds-tracker-1 | /opt/nvidia/deepstream/deepstream-7.1/entrypoint.sh: line 15: 21 Segmentation fault (core dumped) /opt/nvidia/nvidia_entrypoint.sh $@
2025-02-04T10:49:33.7214370Z Aborting on container exit...
2025-02-04T10:49:33.7214740Z
2025-02-04T10:49:33.7215944Z ds-tracker-1 exited with code 139
2025-02-04T10:49:33.7435201Z Container vgtr3-ds-tracker-1 Stopping
2025-02-04T10:49:33.7528593Z Container vgtr3-ds-tracker-1 Stopped
2025-02-04T10:49:33.7528861Z Container vgtr3-redis-1 Stopping
2025-02-04T10:49:33.7529020Z Container vgtr3-mqtt-broker-1 Stopping
2025-02-04T10:49:35.5677508Z Container vgtr3-mqtt-broker-1 Stopped
2025-02-04T10:49:35.9499743Z Container vgtr3-redis-1 Stopped
2025-02-04T10:49:36.1906090Z ##[error]Process completed with exit code 139.
Observations:
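- The fault occurs only after many successful MQTT publishes (the message ID had reached m30638 in the log above), which suggests an intermittent problem such as memory corruption or an out-of-bounds access during bounding box parsing, rather than a failure on every frame.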
- The backtrace implicates NvDsInferParseYolo, specifically offset 0x62265 in the custom library.
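
To make the offsets in the backtrace easier to reason about, here is a minimal C++ sketch of the address arithmetic. It uses only the values printed in frames 4, 5 and 6 above; the function and constant names (loadBase, offsetInLib, kBase and so on) are made up for illustration and are not part of DeepStream.

```cpp
#include <cstdint>
#include <cstdio>

// Load base of a shared object: absolute frame address minus the "(+0x...)" offset.
constexpr std::uint64_t loadBase(std::uint64_t frameAddr, std::uint64_t offset) {
    return frameAddr - offset;
}

// Offset of an absolute address inside the library, given the load base.
constexpr std::uint64_t offsetInLib(std::uint64_t addr, std::uint64_t base) {
    return addr - base;
}

// Values copied from the backtrace in this post.
constexpr std::uint64_t kFrame4Addr = 0x7f8249475265ULL;  // (+0x62265), faulting frame
constexpr std::uint64_t kFrame5Addr = 0x7f8249475469ULL;  // (+0x62469)
constexpr std::uint64_t kFrame6Addr = 0x7f824947557bULL;  // NvDsInferParseYolo+0x34

constexpr std::uint64_t kBase = loadBase(kFrame4Addr, 0x62265ULL);

// Both anonymous frames must resolve to the same load base.
static_assert(kBase == loadBase(kFrame5Addr, 0x62469ULL),
              "frames 4 and 5 should share one load base");

// Entry point of NvDsInferParseYolo, expressed as an offset inside the library.
constexpr std::uint64_t kParseYoloOffset = offsetInLib(kFrame6Addr - 0x34ULL, kBase);

// Prints the derived values for a quick sanity check.
void printSummary() {
    std::printf("load base:            %#llx\n",
                static_cast<unsigned long long>(kBase));
    std::printf("NvDsInferParseYolo at: +%#llx\n",
                static_cast<unsigned long long>(kParseYoloOffset));
}
```

With these inputs the load base works out to 0x7f8249413000 and NvDsInferParseYolo starts at offset 0x62547. The faulting offset 0x62265 lies before that entry point, which suggests the fault is in a helper function reached from NvDsInferParseYolo (via frame 5) rather than in the body of NvDsInferParseYolo itself. If the library is rebuilt with debug symbols (-g), an offset from that same build can be passed to addr2line, for example `addr2line -f -C -e libnvdsinfer_custom_impl_Yolo.so 0x62265`, to obtain a function name and source line.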
Requested Guidance:
- Are there known issues with YOLOv11 output layer configurations in DeepStream 7.1?
- What are the best practices for debugging segmentation faults in custom parsers (e.g., heap overflow checks, CUDA-GDB integration)?
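
As an illustration of the kind of heap overflow check meant here, the following is a minimal, self-contained C++ sketch of a bounds-checked decode loop. It is not the code of the custom library: LayerView and Box are hypothetical stand-ins (not the DeepStream types), and the six-float layout [x1, y1, x2, y2, score, classId] per detection is an assumption chosen for the example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the layer descriptor a parser receives.
struct LayerView {
    const float* data;        // host buffer holding the output tensor
    std::size_t numElements;  // total number of floats in that buffer
};

// Hypothetical output record.
struct Box {
    float left, top, width, height, score;
    int classId;
};

// Assumed layout: [x1, y1, x2, y2, score, classId] per detection.
constexpr std::size_t kStride = 6;

// Returns false instead of reading out of bounds when the buffer is
// smaller than the number of detections the caller expects.
bool decodeBoxes(const LayerView& layer, std::size_t numDetections,
                 int numClasses, float threshold, std::vector<Box>& out) {
    if (layer.data == nullptr) return false;
    // Division avoids overflow in numDetections * kStride.
    if (numDetections > layer.numElements / kStride) return false;

    for (std::size_t i = 0; i < numDetections; ++i) {
        const float* d = layer.data + i * kStride;
        const float score = d[4];
        if (!(score >= threshold)) continue;  // also rejects NaN scores
        // Validate the class id as a float before casting (rejects NaN and
        // out-of-range values that would later index past a label table).
        if (!(d[5] >= 0.0f && d[5] < static_cast<float>(numClasses))) continue;
        const int classId = static_cast<int>(d[5]);
        out.push_back(Box{d[0], d[1], d[2] - d[0], d[3] - d[1], score, classId});
    }
    return true;
}
```

A check of this shape turns a mismatch between the engine's actual output dimensions and the parser's expectations into a reported error rather than an out-of-bounds read. For locating an existing overflow, rebuilding the custom library with -g and -fsanitize=address (GCC/Clang) is one option to consider, assuming the sanitizer runtime can be loaded in the container.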
Additional Note:
- We’ve validated the model’s ONNX conversion and TensorRT engine creation (no errors).