Please provide complete information as applicable to your setup.
• Hardware Platform: GPU
• DeepStream Version: 6.3
• TensorRT Version: 8.5.3.1
• NVIDIA GPU Driver Version: 530.30.02
• Issue Type: question
• How to reproduce the issue: see below; I presume it relates specifically to the model I’m running.
All commands are run inside the nvcr.io/nvidia/deepstream:6.3-gc-triton-devel Docker image. I can load my object-detection inference engine with trtexec without problems, but when I try to run DeepStream inference on it I get a segmentation fault.

Verbose trtexec output:
/usr/src/tensorrt/bin/trtexec --loadEngine=/data/zoo/retinanettf/trt-export/TEST.engine --verbose
&&&& RUNNING TensorRT.trtexec [TensorRT v8503] # /usr/src/tensorrt/bin/trtexec --loadEngine=/data/zoo/retinanettf/trt-export/TEST.engine --verbose
[11/08/2024-00:03:12] [I] === Model Options ===
[11/08/2024-00:03:12] [I] Format: *
[11/08/2024-00:03:12] [I] Model:
[11/08/2024-00:03:12] [I] Output:
[11/08/2024-00:03:12] [I] === Build Options ===
[11/08/2024-00:03:12] [I] Max batch: 1
[11/08/2024-00:03:12] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[11/08/2024-00:03:12] [I] minTiming: 1
[11/08/2024-00:03:12] [I] avgTiming: 8
[11/08/2024-00:03:12] [I] Precision: FP32
[11/08/2024-00:03:12] [I] LayerPrecisions:
[11/08/2024-00:03:12] [I] Calibration:
[11/08/2024-00:03:12] [I] Refit: Disabled
[11/08/2024-00:03:12] [I] Sparsity: Disabled
[11/08/2024-00:03:12] [I] Safe mode: Disabled
[11/08/2024-00:03:12] [I] DirectIO mode: Disabled
[11/08/2024-00:03:12] [I] Restricted mode: Disabled
[11/08/2024-00:03:12] [I] Build only: Disabled
[11/08/2024-00:03:12] [I] Save engine:
[11/08/2024-00:03:12] [I] Load engine: /data/zoo/retinanettf/trt-export/TEST.engine
[11/08/2024-00:03:12] [I] Profiling verbosity: 0
[11/08/2024-00:03:12] [I] Tactic sources: Using default tactic sources
[11/08/2024-00:03:12] [I] timingCacheMode: local
[11/08/2024-00:03:12] [I] timingCacheFile:
[11/08/2024-00:03:12] [I] Heuristic: Disabled
[11/08/2024-00:03:12] [I] Preview Features: Use default preview flags.
[11/08/2024-00:03:12] [I] Input(s)s format: fp32:CHW
[11/08/2024-00:03:12] [I] Output(s)s format: fp32:CHW
[11/08/2024-00:03:12] [I] Input build shapes: model
[11/08/2024-00:03:12] [I] Input calibration shapes: model
[11/08/2024-00:03:12] [I] === System Options ===
[11/08/2024-00:03:12] [I] Device: 0
[11/08/2024-00:03:12] [I] DLACore:
[11/08/2024-00:03:12] [I] Plugins:
[11/08/2024-00:03:12] [I] === Inference Options ===
[11/08/2024-00:03:12] [I] Batch: 1
[11/08/2024-00:03:12] [I] Input inference shapes: model
[11/08/2024-00:03:12] [I] Iterations: 10
[11/08/2024-00:03:12] [I] Duration: 3s (+ 200ms warm up)
[11/08/2024-00:03:12] [I] Sleep time: 0ms
[11/08/2024-00:03:12] [I] Idle time: 0ms
[11/08/2024-00:03:12] [I] Streams: 1
[11/08/2024-00:03:12] [I] ExposeDMA: Disabled
[11/08/2024-00:03:12] [I] Data transfers: Enabled
[11/08/2024-00:03:12] [I] Spin-wait: Disabled
[11/08/2024-00:03:12] [I] Multithreading: Disabled
[11/08/2024-00:03:12] [I] CUDA Graph: Disabled
[11/08/2024-00:03:12] [I] Separate profiling: Disabled
[11/08/2024-00:03:12] [I] Time Deserialize: Disabled
[11/08/2024-00:03:12] [I] Time Refit: Disabled
[11/08/2024-00:03:12] [I] NVTX verbosity: 0
[11/08/2024-00:03:12] [I] Persistent Cache Ratio: 0
[11/08/2024-00:03:12] [I] Inputs:
[11/08/2024-00:03:12] [I] === Reporting Options ===
[11/08/2024-00:03:12] [I] Verbose: Enabled
[11/08/2024-00:03:12] [I] Averages: 10 inferences
[11/08/2024-00:03:12] [I] Percentiles: 90,95,99
[11/08/2024-00:03:12] [I] Dump refittable layers:Disabled
[11/08/2024-00:03:12] [I] Dump output: Disabled
[11/08/2024-00:03:12] [I] Profile: Disabled
[11/08/2024-00:03:12] [I] Export timing to JSON file:
[11/08/2024-00:03:12] [I] Export output to JSON file:
[11/08/2024-00:03:12] [I] Export profile to JSON file:
[11/08/2024-00:03:12] [I]
[11/08/2024-00:03:12] [I] === Device Information ===
[11/08/2024-00:03:12] [I] Selected Device: NVIDIA TITAN V
[11/08/2024-00:03:12] [I] Compute Capability: 7.0
[11/08/2024-00:03:12] [I] SMs: 80
[11/08/2024-00:03:12] [I] Compute Clock Rate: 1.455 GHz
[11/08/2024-00:03:12] [I] Device Global Memory: 12054 MiB
[11/08/2024-00:03:12] [I] Shared Memory per SM: 96 KiB
[11/08/2024-00:03:12] [I] Memory Bus Width: 3072 bits (ECC disabled)
[11/08/2024-00:03:12] [I] Memory Clock Rate: 0.85 GHz
[11/08/2024-00:03:12] [I]
[11/08/2024-00:03:12] [I] TensorRT version: 8.5.3
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::BatchedNMSDynamic_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::BatchedNMS_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::BatchTilePlugin_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::Clip_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::CoordConvAC version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::CropAndResizeDynamic version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::CropAndResize version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::DecodeBbox3DPlugin version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::DetectionLayer_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::EfficientNMS_Explicit_TF_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::EfficientNMS_Implicit_TF_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::EfficientNMS_ONNX_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::EfficientNMS_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::FlattenConcat_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::fMHA_V2 version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::fMHCA version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::GenerateDetection_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::GridAnchor_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::GridAnchorRect_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::GroupNorm version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::InstanceNormalization_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::InstanceNormalization_TRT version 2
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::LayerNorm version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::LReLU_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::MultilevelCropAndResize_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::MultilevelProposeROI_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::MultiscaleDeformableAttnPlugin_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::NMSDynamic_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::NMS_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::Normalize_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::PillarScatterPlugin version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::PriorBox_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::ProposalDynamic version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::ProposalLayer_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::Proposal version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::PyramidROIAlign_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::Region_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::Reorg_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::ResizeNearest_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::ROIAlign_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::RPROI_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::ScatterND version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::SeqLen2Spatial version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::SpecialSlice_TRT version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::SplitGeLU version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::Split version 1
[11/08/2024-00:03:12] [V] [TRT] Registered plugin creator - ::VoxelGeneratorPlugin version 1
[11/08/2024-00:03:12] [I] Engine loaded in 0.133536 sec.
[11/08/2024-00:03:12] [I] [TRT] Loaded engine size: 100 MiB
[11/08/2024-00:03:12] [V] [TRT] Trying to load shared library libcublas.so.11
[11/08/2024-00:03:12] [V] [TRT] Loaded shared library libcublas.so.11
[11/08/2024-00:03:13] [V] [TRT] Using cublas as plugin tactic source
[11/08/2024-00:03:13] [V] [TRT] Trying to load shared library libcublasLt.so.11
[11/08/2024-00:03:13] [V] [TRT] Loaded shared library libcublasLt.so.11
[11/08/2024-00:03:13] [V] [TRT] Using cublasLt as core library tactic source
[11/08/2024-00:03:13] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +12, now: CPU 193, GPU 3294 (MiB)
[11/08/2024-00:03:13] [V] [TRT] Trying to load shared library libcudnn.so.8
[11/08/2024-00:03:13] [V] [TRT] Loaded shared library libcudnn.so.8
[11/08/2024-00:03:13] [V] [TRT] Using cuDNN as plugin tactic source
[11/08/2024-00:03:13] [V] [TRT] Using cuDNN as core library tactic source
[11/08/2024-00:03:13] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +12, GPU +10, now: CPU 205, GPU 3304 (MiB)
[11/08/2024-00:03:13] [V] [TRT] Deserialization required 84860 microseconds.
[11/08/2024-00:03:13] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +100, now: CPU 0, GPU 100 (MiB)
[11/08/2024-00:03:13] [I] Engine deserialized in 0.231975 sec.
[11/08/2024-00:03:13] [V] [TRT] Trying to load shared library libcublas.so.11
[11/08/2024-00:03:13] [V] [TRT] Loaded shared library libcublas.so.11
[11/08/2024-00:03:13] [V] [TRT] Using cublas as plugin tactic source
[11/08/2024-00:03:13] [V] [TRT] Trying to load shared library libcublasLt.so.11
[11/08/2024-00:03:13] [V] [TRT] Loaded shared library libcublasLt.so.11
[11/08/2024-00:03:13] [V] [TRT] Using cublasLt as core library tactic source
[11/08/2024-00:03:13] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 205, GPU 3296 (MiB)
[11/08/2024-00:03:13] [V] [TRT] Trying to load shared library libcudnn.so.8
[11/08/2024-00:03:13] [V] [TRT] Loaded shared library libcudnn.so.8
[11/08/2024-00:03:13] [V] [TRT] Using cuDNN as plugin tactic source
[11/08/2024-00:03:13] [V] [TRT] Using cuDNN as core library tactic source
[11/08/2024-00:03:13] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +8, now: CPU 206, GPU 3304 (MiB)
[11/08/2024-00:03:13] [V] [TRT] Total per-runner device persistent memory is 1375744
[11/08/2024-00:03:13] [V] [TRT] Total per-runner host persistent memory is 327376
[11/08/2024-00:03:13] [V] [TRT] Allocated activation device memory of size 123219968
[11/08/2024-00:03:13] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +118, now: CPU 0, GPU 218 (MiB)
[11/08/2024-00:03:13] [V] [TRT] CUDA lazy loading is enabled.
[11/08/2024-00:03:13] [I] Setting persistentCacheLimit to 0 bytes.
[11/08/2024-00:03:13] [V] Using enqueueV3.
[11/08/2024-00:03:13] [I] Using random values for input input_tensor
[11/08/2024-00:03:13] [I] Created input binding for input_tensor with dimensions 1x3x768x1408
[11/08/2024-00:03:13] [I] Using random values for output num_detections
[11/08/2024-00:03:13] [I] Created output binding for num_detections with dimensions 1x1
[11/08/2024-00:03:13] [I] Using random values for output detection_boxes
[11/08/2024-00:03:13] [I] Created output binding for detection_boxes with dimensions 1x150x4
[11/08/2024-00:03:13] [I] Using random values for output detection_scores
[11/08/2024-00:03:13] [I] Created output binding for detection_scores with dimensions 1x150
[11/08/2024-00:03:13] [I] Using random values for output detection_classes
[11/08/2024-00:03:13] [I] Created output binding for detection_classes with dimensions 1x150
[11/08/2024-00:03:13] [I] Starting inference
[11/08/2024-00:03:16] [I] Warmup completed 14 queries over 200 ms
[11/08/2024-00:03:16] [I] Timing trace has 289 queries over 3.03128 s
[11/08/2024-00:03:16] [I]
[11/08/2024-00:03:16] [I] === Trace details ===
[11/08/2024-00:03:16] [I] Trace averages of 10 runs:
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.7639 ms - Host latency: 11.8547 ms (enqueue 0.915877 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.5072 ms - Host latency: 11.6037 ms (enqueue 1.04777 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.4655 ms - Host latency: 11.5605 ms (enqueue 0.951651 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.4299 ms - Host latency: 11.5274 ms (enqueue 0.980035 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.4331 ms - Host latency: 11.5289 ms (enqueue 0.962585 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.4235 ms - Host latency: 11.5198 ms (enqueue 0.956152 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.4288 ms - Host latency: 11.522 ms (enqueue 1.00239 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.4354 ms - Host latency: 11.5282 ms (enqueue 1.01691 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.438 ms - Host latency: 11.5346 ms (enqueue 0.916431 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.4346 ms - Host latency: 11.5305 ms (enqueue 1.00333 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.4321 ms - Host latency: 11.5313 ms (enqueue 0.993127 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.4336 ms - Host latency: 11.5278 ms (enqueue 0.963293 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.4215 ms - Host latency: 11.5167 ms (enqueue 0.990271 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.4167 ms - Host latency: 11.5135 ms (enqueue 0.948621 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.4339 ms - Host latency: 11.5261 ms (enqueue 0.962927 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.4258 ms - Host latency: 11.5187 ms (enqueue 0.9745 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.4374 ms - Host latency: 11.5317 ms (enqueue 0.917029 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.4373 ms - Host latency: 11.5313 ms (enqueue 0.991675 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.4253 ms - Host latency: 11.5226 ms (enqueue 0.966064 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.4482 ms - Host latency: 11.5409 ms (enqueue 0.957446 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.4527 ms - Host latency: 11.5512 ms (enqueue 1.02346 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.4389 ms - Host latency: 11.5378 ms (enqueue 0.991333 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.425 ms - Host latency: 11.5199 ms (enqueue 0.971851 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.42 ms - Host latency: 11.5173 ms (enqueue 1.03027 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.4242 ms - Host latency: 11.515 ms (enqueue 0.94707 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.45 ms - Host latency: 11.5496 ms (enqueue 1.00337 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.4302 ms - Host latency: 11.5287 ms (enqueue 1.01829 ms)
[11/08/2024-00:03:16] [I] Average on 10 runs - GPU latency: 10.4402 ms - Host latency: 11.5346 ms (enqueue 0.974512 ms)
[11/08/2024-00:03:16] [I]
[11/08/2024-00:03:16] [I] === Performance summary ===
[11/08/2024-00:03:16] [I] Throughput: 95.3393 qps
[11/08/2024-00:03:16] [I] Latency: min = 11.4592 ms, max = 12.4616 ms, mean = 11.5432 ms, median = 11.5381 ms, percentile(90%) = 11.5696 ms, percentile(95%) = 11.6161 ms, percentile(99%) = 12.2716 ms
[11/08/2024-00:03:16] [I] Enqueue Time: min = 0.533325 ms, max = 1.41605 ms, mean = 0.978633 ms, median = 0.95874 ms, percentile(90%) = 1.24854 ms, percentile(95%) = 1.29541 ms, percentile(99%) = 1.40796 ms
[11/08/2024-00:03:16] [I] H2D Latency: min = 1.06958 ms, max = 1.10913 ms, mean = 1.08468 ms, median = 1.08521 ms, percentile(90%) = 1.09314 ms, percentile(95%) = 1.09448 ms, percentile(99%) = 1.10001 ms
[11/08/2024-00:03:16] [I] GPU Compute Time: min = 10.368 ms, max = 11.3664 ms, mean = 10.4478 ms, median = 10.4438 ms, percentile(90%) = 10.4744 ms, percentile(95%) = 10.5114 ms, percentile(99%) = 11.1749 ms
[11/08/2024-00:03:16] [I] D2H Latency: min = 0.00830078 ms, max = 0.0131836 ms, mean = 0.0107084 ms, median = 0.0107422 ms, percentile(90%) = 0.0117188 ms, percentile(95%) = 0.012085 ms, percentile(99%) = 0.0124817 ms
[11/08/2024-00:03:16] [I] Total Host Walltime: 3.03128 s
[11/08/2024-00:03:16] [I] Total GPU Compute Time: 3.01943 s
[11/08/2024-00:03:16] [I] Explanations of the performance metrics are printed in the verbose logs.
[11/08/2024-00:03:16] [V]
[11/08/2024-00:03:16] [V] === Explanations of the performance metrics ===
[11/08/2024-00:03:16] [V] Total Host Walltime: the host walltime from when the first query (after warmups) is enqueued to when the last query is completed.
[11/08/2024-00:03:16] [V] GPU Compute Time: the GPU latency to execute the kernels for a query.
[11/08/2024-00:03:16] [V] Total GPU Compute Time: the summation of the GPU Compute Time of all the queries. If this is significantly shorter than Total Host Walltime, the GPU may be under-utilized because of host-side overheads or data transfers.
[11/08/2024-00:03:16] [V] Throughput: the observed throughput computed by dividing the number of queries by the Total Host Walltime. If this is significantly lower than the reciprocal of GPU Compute Time, the GPU may be under-utilized because of host-side overheads or data transfers.
[11/08/2024-00:03:16] [V] Enqueue Time: the host latency to enqueue a query. If this is longer than GPU Compute Time, the GPU may be under-utilized.
[11/08/2024-00:03:16] [V] H2D Latency: the latency for host-to-device data transfers for input tensors of a single query.
[11/08/2024-00:03:16] [V] D2H Latency: the latency for device-to-host data transfers for output tensors of a single query.
[11/08/2024-00:03:16] [V] Latency: the summation of H2D Latency, GPU Compute Time, and D2H Latency. This is the latency to infer a single query.
[11/08/2024-00:03:16] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8503] # /usr/src/tensorrt/bin/trtexec --loadEngine=/data/zoo/retinanettf/trt-export/TEST.engine --verbose
Test pipeline that triggers the segmentation fault (run under gdb):
Starting program: /usr/bin/gst-launch-1.0 fakesrc \! nvv4l2decoder \! m.sink_0 nvstreammux name=m width=1408 height=768 batch-size=1 \! nvinfer config-file-path=/nets/pgie_configs/detection.txt model-engine-file=/data/zoo/retinanettf/trt-export/TEST.engine
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ff48b559700 (LWP 761)]
[New Thread 0x7ff48ad58700 (LWP 762)]
Setting pipeline to PAUSED ...
[New Thread 0x7ff488ec1700 (LWP 763)]
[New Thread 0x7ff481067700 (LWP 764)]
WARNING: [TRT]: The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:610 [Implicit Engine Info]: layers num: 5
0 INPUT kFLOAT input_tensor 3x768x1408
1 OUTPUT kINT32 num_detections 1
2 OUTPUT kFLOAT detection_boxes 150x4
3 OUTPUT kFLOAT detection_scores 150
4 OUTPUT kINT32 detection_classes 150
[New Thread 0x7ff40bfff700 (LWP 765)]
[New Thread 0x7ff40b7fe700 (LWP 766)]
[New Thread 0x7ff40affd700 (LWP 767)]
[New Thread 0x7ff40a7fc700 (LWP 768)]
Thread 9 "fakesrc0:src" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ff40a7fc700 (LWP 768)]
0x00007ff4991a3403 in ?? () from /usr/lib/x86_64-linux-gnu/gstreamer-1.0/deepstream/libgstnvvideo4linux2.so
(gdb) backtrace
#0 0x00007ff4991a3403 in () at /usr/lib/x86_64-linux-gnu/gstreamer-1.0/deepstream/libgstnvvideo4linux2.so
#1 0x00007ff497449dab in () at /usr/lib/x86_64-linux-gnu/libgstvideo-1.0.so.0
#2 0x00007ff49744caf8 in () at /usr/lib/x86_64-linux-gnu/libgstvideo-1.0.so.0
#3 0x00007ff49744d1ea in () at /usr/lib/x86_64-linux-gnu/libgstvideo-1.0.so.0
#4 0x00007ff4998c4fef in () at /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
#5 0x00007ff4998c7051 in () at /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
#6 0x00007ff4998cde63 in gst_pad_push () at /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
#7 0x00007ff49920e0a5 in () at /usr/lib/x86_64-linux-gnu/libgstbase-1.0.so.0
#8 0x00007ff4998fc1e7 in () at /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0
#9 0x00007ff49973c384 in () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#10 0x00007ff49973bae1 in () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#11 0x00007ff4996a5609 in start_thread () at /usr/lib/x86_64-linux-gnu/libpthread.so.0
#12 0x00007ff4995ca133 in clone () at /usr/lib/x86_64-linux-gnu/libc.so.6
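(Editor's note, not from the original post: the backtrace ends in libgstnvvideo4linux2.so on the "fakesrc0:src" streaming thread, i.e. the crash occurs in the decoder element, which fakesrc is feeding with uninitialized dummy buffers rather than valid H.264 bitstream. A hedged sanity check is to give nvv4l2decoder real, parsed H.264 input; the sketch below assumes a placeholder file path and appends a fakesink, since the original pipeline has no sink after nvinfer.)

```shell
# Sketch only: feed nvv4l2decoder parsed H.264 from a real file instead of
# fakesrc's dummy buffers, which the decoder cannot interpret and which may
# be what crashes libgstnvvideo4linux2.so. /data/sample.h264 is a placeholder.
PIPELINE="filesrc location=/data/sample.h264 ! h264parse ! nvv4l2decoder \
 ! m.sink_0 nvstreammux name=m width=1408 height=768 batch-size=1 \
 ! nvinfer config-file-path=/nets/pgie_configs/detection.txt \
   model-engine-file=/data/zoo/retinanettf/trt-export/TEST.engine ! fakesink"
# Print the full command; run it manually inside the DeepStream container.
echo "gst-launch-1.0 $PIPELINE"
```

If this pipeline runs cleanly, the fault is in how fakesrc's buffers reach the decoder rather than in the model or the nvinfer config.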
The detection.txt pgie config file contains the following:
[property]
gpu-id=0
net-scale-factor=0.02421
offsets=130.6875;136.7565;140.913
batch-size=1
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
num-detected-classes=18
interval=0
gie-unique-id=1
is-classifier=0
output-blob-names=num_detections;detection_boxes;detection_scores;detection_classes
cluster-mode=2
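(Editor's note, not from the original post: engines with NMS-style output heads (num_detections, detection_boxes, detection_scores, detection_classes) typically require a custom bounding-box parser in the nvinfer config, and the config above does not set one; also, network-mode=2 requests FP16 while the trtexec log above reports the engine's precision as FP32, which can cause nvinfer to try rebuilding the engine instead of using model-engine-file. A hedged sketch of the entries that are usually added — the function and library names here are assumptions and must match whatever parser library is actually built:)

```
[property]
# Match the precision of the prebuilt engine (0=FP32) so nvinfer uses
# model-engine-file instead of attempting a rebuild.
network-mode=0
# 0 = detector
network-type=0
# Hypothetical custom parser for the num_detections/detection_boxes/
# detection_scores/detection_classes outputs; names are placeholders.
parse-bbox-func-name=NvDsInferParseCustomEfficientNMS
custom-lib-path=/opt/parsers/libnvds_parser.so
```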
I trained the model with the TensorFlow Object Detection API and exported it to ONNX with this, which I called with the --input_format NCHW flag. That ONNX model was converted to a TensorRT engine via /usr/src/tensorrt/bin/trtexec inside the same nvcr.io/nvidia/deepstream:6.3-gc-triton-devel Docker image, so it seems like it can’t be a versioning issue.

Any insight would be very much appreciated, thanks!