SIGSEGV thrown during object detection inference

Please provide complete information as applicable to your setup.

• Hardware Platform: GPU
• DeepStream Version: 6.3
• TensorRT Version: 8.5.3.1
• NVIDIA GPU Driver Version: 530.30.02
• Issue Type: questions
• How to reproduce the issue? I presume it relates specifically to the model I’m running

All commands are run in the nvcr.io/nvidia/deepstream:6.3-gc-triton-devel docker image. I can load and run my object detection inference engine with trtexec without any problem, but when I try to run DeepStream inference with it I get a segmentation fault.

Verbose trtexec output:

/usr/src/tensorrt/bin/trtexec --loadEngine=/data/zoo/retinanettf/trt-export/TEST.engine --verbose
&&&& RUNNING TensorRT.trtexec [TensorRT v8503] # /usr/src/tensorrt/bin/trtexec --loadEngine=/data/zoo/retinanettf/trt-export/TEST.engine --verbose
[11/08/2024-00:03:12] [I] === Model Options ===
[11/08/2024-00:03:12] [I] Format: *
[11/08/2024-00:03:12] [I] Model: 
[11/08/2024-00:03:12] [I] Output:
[11/08/2024-00:03:12] [I] === Build Options ===
[11/08/2024-00:03:12] [I] Max batch: 1
[11/08/2024-00:03:12] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[11/08/2024-00:03:12] [I] minTiming: 1
[11/08/2024-00:03:12] [I] avgTiming: 8
[11/08/2024-00:03:12] [I] Precision: FP32
[11/08/2024-00:03:12] [I] LayerPrecisions: 
[11/08/2024-00:03:12] [I] Calibration: 
[11/08/2024-00:03:12] [I] Refit: Disabled
[11/08/2024-00:03:12] [I] Sparsity: Disabled
[11/08/2024-00:03:12] [I] Safe mode: Disabled
[11/08/2024-00:03:12] [I] DirectIO mode: Disabled
[11/08/2024-00:03:12] [I] Restricted mode: Disabled
[11/08/2024-00:03:12] [I] Build only: Disabled
[11/08/2024-00:03:12] [I] Save engine: 
[11/08/2024-00:03:12] [I] Load engine: /data/zoo/retinanettf/trt-export/TEST.engine
[11/08/2024-00:03:12] [I] Profiling verbosity: 0
[11/08/2024-00:03:12] [I] Tactic sources: Using default tactic sources
[11/08/2024-00:03:12] [I] timingCacheMode: local
[11/08/2024-00:03:12] [I] timingCacheFile: 
[11/08/2024-00:03:12] [I] Heuristic: Disabled
[11/08/2024-00:03:12] [I] Preview Features: Use default preview flags.
[11/08/2024-00:03:12] [I] Input(s)s format: fp32:CHW
[11/08/2024-00:03:12] [I] Output(s)s format: fp32:CHW
[11/08/2024-00:03:12] [I] Input build shapes: model
[11/08/2024-00:03:12] [I] Input calibration shapes: model
[11/08/2024-00:03:12] [I] === System Options ===
[11/08/2024-00:03:12] [I] Device: 0
[11/08/2024-00:03:12] [I] DLACore: 
[11/08/2024-00:03:12] [I] Plugins:
[11/08/2024-00:03:12] [I] === Inference Options ===
[11/08/2024-00:03:12] [I] Batch: 1
[11/08/2024-00:03:12] [I] Input inference shapes: model
[11/08/2024-00:03:12] [I] Iterations: 10
[11/08/2024-00:03:12] [I] Duration: 3s (+ 200ms warm up)
[11/08/2024-00:03:12] [I] Sleep time: 0ms
[11/08/2024-00:03:12] [I] Idle time: 0ms
[11/08/2024-00:03:12] [I] Streams: 1
[11/08/2024-00:03:12] [I] ExposeDMA: Disabled
[11/08/2024-00:03:12] [I] Data transfers: Enabled
[11/08/2024-00:03:12] [I] Spin-wait: Disabled
[11/08/2024-00:03:12] [I] Multithreading: Disabled
[11/08/2024-00:03:12] [I] CUDA Graph: Disabled
[11/08/2024-00:03:12] [I] Separate profiling: Disabled
[11/08/2024-00:03:12] [I] Time Deserialize: Disabled
[11/08/2024-00:03:12] [I] Time Refit: Disabled
[11/08/2024-00:03:12] [I] NVTX verbosity: 0
[11/08/2024-00:03:12] [I] Persistent Cache Ratio: 0
[11/08/2024-00:03:12] [I] Inputs:
[11/08/2024-00:03:12] [I] === Reporting Options ===
[11/08/2024-00:03:12] [I] Verbose: Enabled
[11/08/2024-00:03:12] [I] Averages: 10 inferences
[11/08/2024-00:03:12] [I] Percentiles: 90,95,99
[11/08/2024-00:03:12] [I] Dump refittable layers:Disabled
[11/08/2024-00:03:12] [I] Dump output: Disabled
[11/08/2024-00:03:12] [I] Profile: Disabled
[11/08/2024-00:03:12] [I] Export timing to JSON file: 
[11/08/2024-00:03:12] [I] Export output to JSON file: 
[11/08/2024-00:03:12] [I] Export profile to JSON file: 
[11/08/2024-00:03:12] [I] 
[11/08/2024-00:03:12] [I] === Device Information ===
[11/08/2024-00:03:12] [I] Selected Device: NVIDIA TITAN V
[11/08/2024-00:03:12] [I] Compute Capability: 7.0
[11/08/2024-00:03:12] [I] SMs: 80
[11/08/2024-00:03:12] [I] Compute Clock Rate: 1.455 GHz
[11/08/2024-00:03:12] [I] Device Global Memory: 12054 MiB
[11/08/2024-00:03:12] [I] Shared Memory per SM: 96 KiB
[11/08/2024-00:03:12] [I] Memory Bus Width: 3072 bits (ECC disabled)
[11/08/2024-00:03:12] [I] Memory Clock Rate: 0.85 GHz
[11/08/2024-00:03:12] [I] 
[11/08/2024-00:03:12] [I] TensorRT version: 8.5.3
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::BatchedNMSDynamic_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::BatchedNMS_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::BatchTilePlugin_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::Clip_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::CoordConvAC version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::CropAndResizeDynamic version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::CropAndResize version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::DecodeBbox3DPlugin version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::DetectionLayer_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::EfficientNMS_Explicit_TF_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::EfficientNMS_Implicit_TF_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::EfficientNMS_ONNX_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::EfficientNMS_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::FlattenConcat_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::fMHA_V2 version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::fMHCA version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::GenerateDetection_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::GridAnchor_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::GridAnchorRect_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::GroupNorm version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::InstanceNormalization_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::InstanceNormalization_TRT version 2
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::LayerNorm version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::LReLU_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::MultilevelCropAndResize_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::MultilevelProposeROI_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::MultiscaleDeformableAttnPlugin_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::NMSDynamic_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::NMS_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::Normalize_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::PillarScatterPlugin version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::PriorBox_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::ProposalDynamic version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::ProposalLayer_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::Proposal version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::PyramidROIAlign_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::Region_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::Reorg_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::ResizeNearest_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::ROIAlign_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::RPROI_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::ScatterND version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::SeqLen2Spatial version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::SpecialSlice_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::SplitGeLU version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::Split version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::VoxelGeneratorPlugin version 1
[11/08/2024-00:03:12] [I] Engine loaded in 0.133536 sec.
[11/08/2024-00:03:12] [I] [TRT] Loaded engine size: 100 MiB
[11/08/2024-00:03:12] [V] [TRT] Trying to load shared library libcublas.so.11
[11/08/2024-00:03:12] [V] [TRT] Loaded shared library libcublas.so.11
[11/08/2024-00:03:13] [V] [TRT] Using cublas as plugin tactic source
[11/08/2024-00:03:13] [V] [TRT] Trying to load shared library libcublasLt.so.11
[11/08/2024-00:03:13] [V] [TRT] Loaded shared library libcublasLt.so.11
[11/08/2024-00:03:13] [V] [TRT] Using cublasLt as core library tactic source
[11/08/2024-00:03:13] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +12, now: CPU 193, GPU 3294 (MiB)
[11/08/2024-00:03:13] [V] [TRT] Trying to load shared library libcudnn.so.8
[11/08/2024-00:03:13] [V] [TRT] Loaded shared library libcudnn.so.8
[11/08/2024-00:03:13] [V] [TRT] Using cuDNN as plugin tactic source
[11/08/2024-00:03:13] [V] [TRT] Using cuDNN as core library tactic source
[11/08/2024-00:03:13] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +12, GPU +10, now: CPU 205, GPU 3304 (MiB)
[11/08/2024-00:03:13] [V] [TRT] Deserialization required 84860 microseconds.
[11/08/2024-00:03:13] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +100, now: CPU 0, GPU 100 (MiB)
[11/08/2024-00:03:13] [I] Engine deserialized in 0.231975 sec.
[11/08/2024-00:03:13] [V] [TRT] Trying to load shared library libcublas.so.11
[11/08/2024-00:03:13] [V] [TRT] Loaded shared library libcublas.so.11
[11/08/2024-00:03:13] [V] [TRT] Using cublas as plugin tactic source
[11/08/2024-00:03:13] [V] [TRT] Trying to load shared library libcublasLt.so.11
[11/08/2024-00:03:13] [V] [TRT] Loaded shared library libcublasLt.so.11
[11/08/2024-00:03:13] [V] [TRT] Using cublasLt as core library tactic source
[11/08/2024-00:03:13] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 205, GPU 3296 (MiB)
[11/08/2024-00:03:13] [V] [TRT] Trying to load shared library libcudnn.so.8
[11/08/2024-00:03:13] [V] [TRT] Loaded shared library libcudnn.so.8
[11/08/2024-00:03:13] [V] [TRT] Using cuDNN as plugin tactic source
[11/08/2024-00:03:13] [V] [TRT] Using cuDNN as core library tactic source
[11/08/2024-00:03:13] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +8, now: CPU 206, GPU 3304 (MiB)
[11/08/2024-00:03:13] [V] [TRT] Total per-runner device persistent memory is 1375744
[11/08/2024-00:03:13] [V] [TRT] Total per-runner host persistent memory is 327376
[11/08/2024-00:03:13] [V] [TRT] Allocated activation device memory of size 123219968
[11/08/2024-00:03:13] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +118, now: CPU 0, GPU 218 (MiB)
[11/08/2024-00:03:13] [V] [TRT] CUDA lazy loading is enabled.
[11/08/2024-00:03:13] [I] Setting persistentCacheLimit to 0 bytes.
[11/08/2024-00:03:13] [V] Using enqueueV3.
[11/08/2024-00:03:13] [I] Using random values for input input_tensor
[11/08/2024-00:03:13] [I] Created input binding for input_tensor with dimensions 1x3x768x1408
[11/08/2024-00:03:13] [I] Using random values for output num_detections
[11/08/2024-00:03:13] [I] Created output binding for num_detections with dimensions 1x1
[11/08/2024-00:03:13] [I] Using random values for output detection_boxes
[11/08/2024-00:03:13] [I] Created output binding for detection_boxes with dimensions 1x150x4
[11/08/2024-00:03:13] [I] Using random values for output detection_scores
[11/08/2024-00:03:13] [I] Created output binding for detection_scores with dimensions 1x150
[11/08/2024-00:03:13] [I] Using random values for output detection_classes
[11/08/2024-00:03:13] [I] Created output binding for detection_classes with dimensions 1x150
[11/08/2024-00:03:13] [I] Starting inference
[11/08/2024-00:03:16] [I] Warmup completed 14 queries over 200 ms
[11/08/2024-00:03:16] [I] Timing trace has 289 queries over 3.03128 s
[11/08/2024-00:03:16] [I] 
[11/08/2024-00:03:16] [I] === Trace details ===
[11/08/2024-00:03:16] [I] Trace averages of 10 runs:
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.7639 ms - Host latency: 11.8547 ms (enqueue 0.915877 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.5072 ms - Host latency: 11.6037 ms (enqueue 1.04777 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.4655 ms - Host latency: 11.5605 ms (enqueue 0.951651 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.4299 ms - Host latency: 11.5274 ms (enqueue 0.980035 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.4331 ms - Host latency: 11.5289 ms (enqueue 0.962585 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.4235 ms - Host latency: 11.5198 ms (enqueue 0.956152 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.4288 ms - Host latency: 11.522 ms (enqueue 1.00239 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.4354 ms - Host latency: 11.5282 ms (enqueue 1.01691 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.438 ms - Host latency: 11.5346 ms (enqueue 0.916431 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.4346 ms - Host latency: 11.5305 ms (enqueue 1.00333 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.4321 ms - Host latency: 11.5313 ms (enqueue 0.993127 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.4336 ms - Host latency: 11.5278 ms (enqueue 0.963293 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.4215 ms - Host latency: 11.5167 ms (enqueue 0.990271 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.4167 ms - Host latency: 11.5135 ms (enqueue 0.948621 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.4339 ms - Host latency: 11.5261 ms (enqueue 0.962927 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.4258 ms - Host latency: 11.5187 ms (enqueue 0.9745 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.4374 ms - Host latency: 11.5317 ms (enqueue 0.917029 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.4373 ms - Host latency: 11.5313 ms (enqueue 0.991675 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.4253 ms - Host latency: 11.5226 ms (enqueue 0.966064 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.4482 ms - Host latency: 11.5409 ms (enqueue 0.957446 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.4527 ms - Host latency: 11.5512 ms (enqueue 1.02346 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.4389 ms - Host latency: 11.5378 ms (enqueue 0.991333 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.425 ms - Host latency: 11.5199 ms (enqueue 0.971851 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.42 ms - Host latency: 11.5173 ms (enqueue 1.03027 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.4242 ms - Host latency: 11.515 ms (enqueue 0.94707 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.45 ms - Host latency: 11.5496 ms (enqueue 1.00337 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.4302 ms - Host latency: 11.5287 ms (enqueue 1.01829 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.4402 ms - Host latency: 11.5346 ms (enqueue 0.974512 ms)
[11/08/2024-00:03:16] [I] 
[11/08/2024-00:03:16] [I] === Performance summary ===
[11/08/2024-00:03:16] [I] Throughput: 95.3393 qps
[11/08/2024-00:03:16] [I] Latency: min = 11.4592 ms, max = 12.4616 ms, mean = 11.5432 ms, median = 11.5381 ms, percentile(90%) = 11.5696 ms, percentile(95%) = 11.6161 ms, percentile(99%) = 12.2716 ms
[11/08/2024-00:03:16] [I] Enqueue Time: min = 0.533325 ms, max = 1.41605 ms, mean = 0.978633 ms, median = 0.95874 ms, percentile(90%) = 1.24854 ms, percentile(95%) = 1.29541 ms, percentile(99%) = 1.40796 ms
[11/08/2024-00:03:16] [I] H2D Latency: min = 1.06958 ms, max = 1.10913 ms, mean = 1.08468 ms, median = 1.08521 ms, percentile(90%) = 1.09314 ms, percentile(95%) = 1.09448 ms, percentile(99%) = 1.10001 ms
[11/08/2024-00:03:16] [I] GPU Compute Time: min = 10.368 ms, max = 11.3664 ms, mean = 10.4478 ms, median = 10.4438 ms, percentile(90%) = 10.4744 ms, percentile(95%) = 10.5114 ms, percentile(99%) = 11.1749 ms
[11/08/2024-00:03:16] [I] D2H Latency: min = 0.00830078 ms, max = 0.0131836 ms, mean = 0.0107084 ms, median = 0.0107422 ms, percentile(90%) = 0.0117188 ms, percentile(95%) = 0.012085 ms, percentile(99%) = 0.0124817 ms
[11/08/2024-00:03:16] [I] Total Host Walltime: 3.03128 s
[11/08/2024-00:03:16] [I] Total GPU Compute Time: 3.01943 s
[11/08/2024-00:03:16] [I] Explanations of the performance metrics are printed in the verbose logs.
[11/08/2024-00:03:16] [V] 
[11/08/2024-00:03:16] [V] === Explanations of the performance metrics ===
[11/08/2024-00:03:16] [V] Total Host Walltime: the host walltime from when the first query (after warmups) is enqueued to when the last query is completed.
[11/08/2024-00:03:16] [V] GPU Compute Time: the GPU latency to execute the kernels for a query.
[11/08/2024-00:03:16] [V] Total GPU Compute Time: the summation of the GPU Compute Time of all the queries. If this is significantly shorter than Total Host Walltime, the GPU may be under-utilized because of host-side overheads or data transfers.
[11/08/2024-00:03:16] [V] Throughput: the observed throughput computed by dividing the number of queries by the Total Host Walltime. If this is significantly lower than the reciprocal of GPU Compute Time, the GPU may be under-utilized because of host-side overheads or data transfers.
[11/08/2024-00:03:16] [V] Enqueue Time: the host latency to enqueue a query. If this is longer than GPU Compute Time, the GPU may be under-utilized.
[11/08/2024-00:03:16] [V] H2D Latency: the latency for host-to-device data transfers for input tensors of a single query.
[11/08/2024-00:03:16] [V] D2H Latency: the latency for device-to-host data transfers for output tensors of a single query.
[11/08/2024-00:03:16] [V] Latency: the summation of H2D Latency, GPU Compute Time, and D2H Latency. This is the latency to infer a single query.
[11/08/2024-00:03:16] [I] 
&&&& PASSED TensorRT.trtexec [TensorRT v8503] # /usr/src/tensorrt/bin/trtexec --loadEngine=/data/zoo/retinanettf/trt-export/TEST.engine --verbose

Test pipeline that throws the fault (run under gdb):

Starting program: /usr/bin/gst-launch-1.0 fakesrc \! nvv4l2decoder \! m.sink_0 nvstreammux name=m width=1408 height=768 batch-size=1 \! nvinfer config-file-path=/nets/pgie_configs/detection.txt model-engine-file=/data/zoo/retinanettf/trt-export/TEST.engine
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ff48b559700 (LWP 761)]
[New Thread 0x7ff48ad58700 (LWP 762)]
Setting pipeline to PAUSED ...
[New Thread 0x7ff488ec1700 (LWP 763)]
[New Thread 0x7ff481067700 (LWP 764)]
WARNING: [TRT]: The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:610 [Implicit Engine Info]: layers num: 5
0   INPUT  kFLOAT input_tensor    3x768x1408      
1   OUTPUT kINT32 num_detections  1               
2   OUTPUT kFLOAT detection_boxes 150x4           
3   OUTPUT kFLOAT detection_scores 150             
4   OUTPUT kINT32 detection_classes 150             

[New Thread 0x7ff40bfff700 (LWP 765)]
[New Thread 0x7ff40b7fe700 (LWP 766)]
[New Thread 0x7ff40affd700 (LWP 767)]
[New Thread 0x7ff40a7fc700 (LWP 768)]

Thread 9 "fakesrc0:src" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ff40a7fc700 (LWP 768)]
0x00007ff4991a3403 in ?? () from /usr/lib/x86_64-linux-gnu/gstreamer-1.0/deepstream/libgstnvvideo4linux2.so
(gdb) backtrace
#0  0x00007ff4991a3403 in  () at /usr/lib/x86_64-linux-gnu/gstreamer-1.0/deepstream/libgstnvvideo4linux2.so
#1  0x00007ff497449dab in  () at /usr/lib/x86_64-linux-gnu/libgstvideo-1.0.so.0
#2  0x00007ff49744caf8 in  () at /usr/lib/x86_64-linux-gnu/libgstvideo-1.0.so.0
#3  0x00007ff49744d1ea in  () at /usr/lib/x86_64-linux-gnu/libgstvideo-1.0.so.0
#4  0x00007ff4998c4fef in  () at /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
#5  0x00007ff4998c7051 in  () at /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
#6  0x00007ff4998cde63 in gst_pad_push () at /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
#7  0x00007ff49920e0a5 in  () at /usr/lib/x86_64-linux-gnu/libgstbase-1.0.so.0
#8  0x00007ff4998fc1e7 in  () at /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
#9  0x00007ff49973c384 in  () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#10 0x00007ff49973bae1 in  () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#11 0x00007ff4996a5609 in start_thread () at /usr/lib/x86_64-linux-gnu/libpthread.so.0
#12 0x00007ff4995ca133 in clone () at /usr/lib/x86_64-linux-gnu/libc.so.6

The detection.txt pgie config file contains the following:

[property]
gpu-id=0
net-scale-factor=0.02421
offsets=130.6875;136.7565;140.913
batch-size=1
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
num-detected-classes=18
interval=0
gie-unique-id=1
is-classifier=0
output-blob-names=num_detections;detection_boxes;detection_scores;detection_classes
cluster-mode=2
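
(For reference, nvinfer normalizes each input pixel as y = net-scale-factor * (x - offset), with one offset per channel, so these two values should reproduce the normalization used during training.)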

I trained the model with the TensorFlow Object Detection API and exported it to ONNX with this converter, which I called with the --input_format NCHW flag. That ONNX model was converted to a TensorRT engine via /usr/src/tensorrt/bin/trtexec inside the same nvcr.io/nvidia/deepstream:6.3-gc-triton-devel docker image, so it seems like it can’t be a versioning issue.
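
For reference, a conversion along these lines reproduces that step (the ONNX filename is a placeholder; --fp16 matches network-mode=2 in the config):

/usr/src/tensorrt/bin/trtexec --onnx=/data/zoo/retinanettf/trt-export/model.onnx \
    --saveEngine=/data/zoo/retinanettf/trt-export/TEST.engine --fp16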

Any insight would be very much appreciated, thanks

There may be some problems with your pipeline - fakesrc doesn’t feed nvv4l2decoder a valid encoded stream, which would explain the crash in libgstnvvideo4linux2.so. Could you try the pipeline below?

gst-launch-1.0 \
videotestsrc ! "video/x-raw,width=320,height=240,framerate=30/1" ! nvvideoconvert ! "video/x-raw(memory:NVMM)" ! mux.sink_0 \
nvstreammux name=mux batch-size=1 width=320 height=240 ! nvinfer config-file-path=/nets/pgie_configs/detection.txt model-engine-file=/data/zoo/retinanettf/trt-export/TEST.engine ! \
fakesink

Hi @yuweiw, here’s the stack trace:

Starting program: /usr/bin/gst-launch-1.0 videotestsrc \! video/x-raw,width=320,height=240,framerate=30/1 \! nvvideoconvert \! video/x-raw\(memory:NVMM\) \! mux.sink_0 nvstreammux name=mux batch-size=1 width=320 height=240 \! nvinfer config-file-path=/nets/pgie_configs/detection.txt model-engine-file=/data/zoo/retinanettf/trt-export/TEST.engine \! fakesink
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7f11718b2700 (LWP 150)]
[New Thread 0x7f11710b1700 (LWP 151)]
Setting pipeline to PAUSED ...
[New Thread 0x7f110cef0700 (LWP 152)]
[New Thread 0x7f1105fff700 (LWP 153)]
WARNING: [TRT]: The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:610 [Implicit Engine Info]: layers num: 5
0   INPUT  kFLOAT input_tensor    3x768x1408      
1   OUTPUT kINT32 num_detections  1               
2   OUTPUT kFLOAT detection_boxes 150x4           
3   OUTPUT kFLOAT detection_scores 150             
4   OUTPUT kINT32 detection_classes 150             

[New Thread 0x7f11057fe700 (LWP 154)]
[New Thread 0x7f1104ffd700 (LWP 155)]
[New Thread 0x7f10f1fff700 (LWP 156)]
[New Thread 0x7f10f17fe700 (LWP 157)]
Pipeline is PREROLLING ...
[New Thread 0x7f10f0ffd700 (LWP 158)]
[New Thread 0x7f10ebfff700 (LWP 159)]
[New Thread 0x7f10eb7fe700 (LWP 160)]

Thread 6 "gst-launch-1.0" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f11057fe700 (LWP 154)]
0x00007f11704e8623 in attach_metadata_detector(_GstNvInfer*, _GstMiniObject*, GstNvInferFrame&, NvDsInferDetectionOutput&, float) ()
   from /usr/lib/x86_64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_infer.so
(gdb) backtrace
#0  0x00007f11704e8623 in attach_metadata_detector(_GstNvInfer*, _GstMiniObject*, GstNvInferFrame&, NvDsInferDetectionOutput&, float) ()
    at /usr/lib/x86_64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_infer.so
#1  0x00007f11704d8500 in  () at /usr/lib/x86_64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_infer.so
#2  0x00007f117f8acae1 in  () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#3  0x00007f117f816609 in start_thread () at /usr/lib/x86_64-linux-gnu/libpthread.so.0
#4  0x00007f117f73b133 in clone () at /usr/lib/x86_64-linux-gnu/libc.so.6

Perhaps an issue with parsing the EfficientNMS_TRT plugin’s network outputs? Thanks

This part is open source. Could you add some logging to locate where the crash happens?

Add your log statements in the attach_metadata_detector API in the sources/gst-plugins/gst-nvinfer/gstnvinfer_meta_utils.cpp file:

$ cd sources/gst-plugins/gst-nvinfer/
(add your log statements to attach_metadata_detector in gstnvinfer_meta_utils.cpp)
$ make
$ make install
$ <run your command>

Thanks for the direction - I added a couple of lines of output to gstnvinfer_meta_utils.cpp. Here is just the first part of that file, where I made my edits:

/**
 * Attach metadata for the detector. We will be adding a new metadata.
 */
void
attach_metadata_detector (GstNvInfer * nvinfer, GstMiniObject * tensor_out_object,
    GstNvInferFrame & frame, NvDsInferDetectionOutput & detection_output, float segmentationThreshold)
{
  static gchar font_name[] = "Serif";
  NvDsObjectMeta *obj_meta = NULL;
  NvDsObjectMeta *parent_obj_meta = frame.obj_meta; /* This will be NULL in case of primary detector */
  NvDsFrameMeta *frame_meta = frame.frame_meta;
  NvDsBatchMeta *batch_meta = frame_meta->base_meta.batch_meta;
  nvds_acquire_meta_lock (batch_meta);
  std::cout << "***** got meta lock" << std::endl;

  frame_meta->bInferDone = TRUE;
  /* Iterate through the inference output for one frame and attach the detected
   * bounding boxes. */
  std::cout << "***** n_objects:" << detection_output.numObjects << std::endl;
  int i = 0;
  NvDsInferObject & obj = detection_output.objects[i];
  std::cout << "***** obj.left:" << obj.left << std::endl;

  for (guint i = 0; i < detection_output.numObjects; i++) {
    NvDsInferObject & obj = detection_output.objects[i];
    GstNvInferDetectionFilterParams & filter_params =
        (*nvinfer->perClassDetectionFilterParams)[obj.classIndex];
    std::cout << "class index:" << obj.classIndex << std::endl;
    /* Scale the bounding boxes proportionally based on how the object/frame was
     * scaled during input. */

You can see my added debug prints preceded by *****. Here’s the gstreamer output:

Starting program: /usr/bin/gst-launch-1.0 videotestsrc \! video/x-raw,width=320,height=240,framerate=30/1 \! nvvideoconvert \! video/x-raw\(memory:NVMM\) \! mux.sink_0 nvstreammux name=mux batch-size=1 width=320 height=240 \! nvinfer config-file-path=/slcs-deepstream/slcs/nets/pgie_configs/detection.txt model-engine-file=/data/zoo/retinanettf/trt-export/TEST.engine \! fakesink
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/x86_64-linux-gnu/libthread_db.so.1".
[Detaching after fork from child process 566]
[New Thread 0x7f122cc14700 (LWP 569)]
[New Thread 0x7f1227fff700 (LWP 570)]
Setting pipeline to PAUSED ...
[New Thread 0x7f11c8410700 (LWP 571)]
[New Thread 0x7f11c77ea700 (LWP 572)]
WARNING: [TRT]: The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:610 [Implicit Engine Info]: layers num: 5
0   INPUT  kFLOAT input_tensor    3x768x1408      
1   OUTPUT kINT32 num_detections  1               
2   OUTPUT kFLOAT detection_boxes 150x4           
3   OUTPUT kFLOAT detection_scores 150             
4   OUTPUT kINT32 detection_classes 150             

[New Thread 0x7f11c6c69700 (LWP 573)]
[New Thread 0x7f11c6468700 (LWP 574)]
[New Thread 0x7f11c5c67700 (LWP 575)]
[New Thread 0x7f11c5466700 (LWP 576)]
Pipeline is PREROLLING ...
[New Thread 0x7f11c4c65700 (LWP 577)]
[New Thread 0x7f119673b700 (LWP 578)]
[New Thread 0x7f11697ae700 (LWP 579)]
***** got meta lock
***** n_objects:1814918261

Thread 6 "gst-launch-1.0" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f11c6c69700 (LWP 573)]
0x00007f12277be3c3 in attach_metadata_detector(_GstNvInfer*, _GstMiniObject*, GstNvInferFrame&, NvDsInferDetectionOutput&, float) ()
   from /usr/lib/x86_64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_infer.so

It looks like the seg fault is caused by NvDsInferObject & obj = detection_output.objects[i];, as the function crashes after printing ***** n_objects:1814918261 and never enters the loop. That object count is obviously garbage, which suggests detection_output isn’t being populated correctly. Appreciate any additional insight, thanks

Just from this log, there may be no objects detected, and detection_output.objects may be NULL. You can check whether there is any problem with your model and config file first.

I can successfully run inference on the engine file with the TensorRT python api here.

I’ve tried changing a handful of different configuration file parameters, but I’m still getting a segmentation fault. Even if there are no objects detected, detection_output.objects shouldn’t be NULL, should it?

Since this is your own model, you need to do the post-processing (output-tensor) parsing for it yourself. We have a lot of demos that show how to customize this: just set parse-bbox-func-name and custom-lib-path in the config, and parse the tensor output the way NvDsInferParseCustomNMSTLT does.
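
For outputs in this EfficientNMS_TRT layout (num_detections, detection_boxes, detection_scores, detection_classes), a minimal custom parser could look like the sketch below. The function name NvDsInferParseCustomEfficientNMS is hypothetical, and the [x1, y1, x2, y2] box coding in network-input pixels is an assumption to verify against your ONNX export; NvDsInferParseCustomNMSTLT in the TAO apps is the closest reference implementation.

#include <cstring>
#include <vector>
#include "nvdsinfer_custom_impl.h"

/* Sketch of a custom parser for EfficientNMS_TRT-style outputs.
 * Assumes boxes arrive as [x1, y1, x2, y2] in network-input pixel
 * coordinates; verify the box coding against your ONNX export. */
extern "C" bool NvDsInferParseCustomEfficientNMS (
    std::vector<NvDsInferLayerInfo> const &outputLayersInfo,
    NvDsInferNetworkInfo const &networkInfo,
    NvDsInferParseDetectionParams const &detectionParams,
    std::vector<NvDsInferObjectDetectionInfo> &objectList)
{
  /* Look the four layers up by name rather than relying on binding order. */
  const NvDsInferLayerInfo *numDets = NULL, *boxes = NULL,
      *scores = NULL, *classes = NULL;
  for (const auto &layer : outputLayersInfo) {
    if (!strcmp (layer.layerName, "num_detections")) numDets = &layer;
    else if (!strcmp (layer.layerName, "detection_boxes")) boxes = &layer;
    else if (!strcmp (layer.layerName, "detection_scores")) scores = &layer;
    else if (!strcmp (layer.layerName, "detection_classes")) classes = &layer;
  }
  if (!numDets || !boxes || !scores || !classes)
    return false;

  int n = ((const int *) numDets->buffer)[0];
  const float *box = (const float *) boxes->buffer;
  const float *score = (const float *) scores->buffer;
  const int *cls = (const int *) classes->buffer;

  for (int i = 0; i < n; i++) {
    NvDsInferObjectDetectionInfo obj{};
    obj.classId = cls[i];
    obj.detectionConfidence = score[i];
    obj.left = box[i * 4 + 0];
    obj.top = box[i * 4 + 1];
    obj.width = box[i * 4 + 2] - box[i * 4 + 0];
    obj.height = box[i * 4 + 3] - box[i * 4 + 1];
    /* Keep only detections above the per-class threshold from the config. */
    if (obj.classId < detectionParams.numClassesConfigured &&
        obj.detectionConfidence >=
            detectionParams.perClassPreclusterThreshold[obj.classId])
      objectList.push_back (obj);
  }
  return true;
}

/* Confirms at compile time that the prototype matches what nvinfer expects. */
CHECK_CUSTOM_PARSE_FUNC_PROTOTYPE (NvDsInferParseCustomEfficientNMS);

Compiled into a shared library, it would be wired up in the pgie config with something like (library path hypothetical):

parse-bbox-func-name=NvDsInferParseCustomEfficientNMS
custom-lib-path=/path/to/libnvds_infercustomparser.so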

Thanks - the model uses the EfficientNMS_TRT plugin for box decoding and NMS; I assumed that is automatically detected and handled during inference.

Here is part of the log from compiling the engine from ONNX:

[11/12/2024-15:05:47] [TRT] [I] No importer registered for op: EfficientNMS_TRT. Attempting to import as plugin.
[11/12/2024-15:05:47] [TRT] [I] Searching for plugin: EfficientNMS_TRT, plugin_version: 1, plugin_namespace: 
[11/12/2024-15:05:47] [TRT] [I] Successfully created plugin: EfficientNMS_TRT
INFO:EngineBuilder:Network Description
INFO:EngineBuilder:Input 'input_tensor' with shape (1, 3, 768, 1408) and dtype DataType.FLOAT
INFO:EngineBuilder:Output 'num_detections' with shape (1, 1) and dtype DataType.INT32
INFO:EngineBuilder:Output 'detection_boxes' with shape (1, 150, 4) and dtype DataType.FLOAT
INFO:EngineBuilder:Output 'detection_scores' with shape (1, 150) and dtype DataType.FLOAT
INFO:EngineBuilder:Output 'detection_classes' with shape (1, 150) and dtype DataType.INT32

It also looks like the input and output bindings are already registered when the model is loaded in DeepStream:

WARNING: [TRT]: The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:610 [Implicit Engine Info]: layers num: 5
0   INPUT  kFLOAT input_tensor    3x768x1408      
1   OUTPUT kINT32 num_detections  1               
2   OUTPUT kFLOAT detection_boxes 150x4           
3   OUTPUT kFLOAT detection_scores 150             
4   OUTPUT kINT32 detection_classes 150             

So I am a bit confused about what I need to set in my detection pgie config - setting parse-bbox-func-name=EfficientNMS_TRT has no effect, and I assume the plugins are contained in /usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so.

Could you add some logging to postProcessHost in sources/libs/nvdsinfer/nvdsinfer_context_impl.cpp to check whether nvinfer supports your model?
If the parsing of the output tensor is not correct, you need to add your own custom parsing.
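
Note that EfficientNMS_TRT is a TensorRT layer plugin that runs inside the engine (loaded from libnvinfer_plugin.so), while parse-bbox-func-name names a host-side DeepStream parsing function exported from the custom-lib-path library; they are separate mechanisms, which is why parse-bbox-func-name=EfficientNMS_TRT has no effect. As a sketch for the logging (a free-standing helper under assumed integration, since the exact function and member names in nvdsinfer_context_impl.cpp vary by version), dumping each output layer before parsing shows whether the buffers are being read sensibly:

#include <cstdio>
#include <vector>
#include "nvdsinfer.h"

/* Debug helper (hypothetical): print each output layer's name, element
 * count and first few values. Assumes the layer buffers are already
 * host-accessible, as they are at the postProcessHost stage. */
static void
dumpOutputLayers (const std::vector<NvDsInferLayerInfo> &layers)
{
  for (const auto &l : layers) {
    if (l.isInput)
      continue;
    printf ("layer %s: elems=%u values:", l.layerName, l.inferDims.numElements);
    for (unsigned int i = 0; i < l.inferDims.numElements && i < 4; i++) {
      if (l.dataType == FLOAT)
        printf (" %f", ((const float *) l.buffer)[i]);
      else if (l.dataType == INT32)
        printf (" %d", ((const int *) l.buffer)[i]);
    }
    printf ("\n");
  }
}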

There has been no update from you for a while, so we are assuming this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.