DW_DNN_INVALID_MODEL error for TRT model (PointPillarNet | NVIDIA NGC)

Hello everyone, I’m working with NVIDIA’s PointPillarNet deployable model (available on NGC) and encountered an issue when converting the provided ONNX model to a TensorRT engine.

Issue Description:

  • I used the provided ONNX model and converted it to a TensorRT engine using the following command:
/usr/src/tensorrt/bin/trtexec --onnx=pointpillars.onnx --saveEngine=pointpillars.trt --fp16 --minShapes=points:1x204800x4,num_points:1 --optShapes=points:1x204800x4,num_points:1 --maxShapes=points:1x204800x4,num_points:1
  • The conversion process completed successfully (see attached logs). However, when I try to load the resulting pointpillars.trt model in DriveWorks, I get the error:
    DW_DNN_INVALID_MODEL (a minimal standalone deserialization check is sketched after this list).
  • Additionally, attempting to inspect or view the layers of the generated .trt file results in the error:
    “Invalid file content. File contains undocumented TensorRT engine data.”
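
For completeness, here is a minimal standalone deserialization check (a sketch only, assuming the same host TensorRT 10.3 installation that built the engine and that pointpillars.trt sits in the working directory):

# Sketch: verify the serialized engine deserializes under the TensorRT
# version that built it. If this succeeds while DriveWorks still fails,
# the mismatch is on the DriveWorks side rather than a corrupt engine.
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
# Register the standard plugins (VoxelGeneratorPlugin etc.) first.
trt.init_libnvinfer_plugins(logger, "")

with open("pointpillars.trt", "rb") as f:
    engine_bytes = f.read()

runtime = trt.Runtime(logger)
engine = runtime.deserialize_cuda_engine(engine_bytes)
print("deserialized OK" if engine is not None else "deserialization failed")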

Logs:

Below is an excerpt from the conversion log showing that the engine was built without errors:

&&&& RUNNING TensorRT.trtexec [TensorRT v100300] # /usr/src/tensorrt/bin/trtexec --onnx=pointpillars.onnx --saveEngine=pointpillars.trt --fp16 --minShapes=points:1x204800x4,num_points:1 --optShapes=points:1x204800x4,num_points:1 --maxShapes=points:1x204800x4,num_points:1
[02/07/2025-16:47:12] [I] === Model Options ===
[02/07/2025-16:47:12] [I] Format: ONNX
[02/07/2025-16:47:12] [I] Model: pointpillars.onnx
[02/07/2025-16:47:12] [I] Output:
[02/07/2025-16:47:12] [I] === Build Options ===
[02/07/2025-16:47:12] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default, tacticSharedMem: default
[02/07/2025-16:47:12] [I] avgTiming: 8
[02/07/2025-16:47:12] [I] Precision: FP32+FP16
[02/07/2025-16:47:12] [I] LayerPrecisions: 
[02/07/2025-16:47:12] [I] Layer Device Types: 
[02/07/2025-16:47:12] [I] Calibration: 
[02/07/2025-16:47:12] [I] Refit: Disabled
[02/07/2025-16:47:12] [I] Strip weights: Disabled
[02/07/2025-16:47:12] [I] Version Compatible: Disabled
[02/07/2025-16:47:12] [I] ONNX Plugin InstanceNorm: Disabled
[02/07/2025-16:47:12] [I] TensorRT runtime: full
[02/07/2025-16:47:12] [I] Lean DLL Path: 
[02/07/2025-16:47:12] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[02/07/2025-16:47:12] [I] Exclude Lean Runtime: Disabled
[02/07/2025-16:47:12] [I] Sparsity: Disabled
[02/07/2025-16:47:12] [I] Safe mode: Disabled
[02/07/2025-16:47:12] [I] Build DLA standalone loadable: Disabled
[02/07/2025-16:47:12] [I] Allow GPU fallback for DLA: Disabled
[02/07/2025-16:47:12] [I] DirectIO mode: Disabled
[02/07/2025-16:47:12] [I] Restricted mode: Disabled
[02/07/2025-16:47:12] [I] Skip inference: Disabled
[02/07/2025-16:47:12] [I] Save engine: pointpillars.trt
[02/07/2025-16:47:12] [I] Load engine: 
[02/07/2025-16:47:12] [I] Profiling verbosity: 0
[02/07/2025-16:47:12] [I] Tactic sources: Using default tactic sources
[02/07/2025-16:47:12] [I] timingCacheMode: local
[02/07/2025-16:47:12] [I] timingCacheFile: 
[02/07/2025-16:47:12] [I] Enable Compilation Cache: Enabled
[02/07/2025-16:47:12] [I] errorOnTimingCacheMiss: Disabled
[02/07/2025-16:47:12] [I] Preview Features: Use default preview flags.
[02/07/2025-16:47:12] [I] MaxAuxStreams: -1
[02/07/2025-16:47:12] [I] BuilderOptimizationLevel: -1
[02/07/2025-16:47:12] [I] Calibration Profile Index: 0
[02/07/2025-16:47:12] [I] Weight Streaming: Disabled
[02/07/2025-16:47:12] [I] Runtime Platform: Same As Build
[02/07/2025-16:47:12] [I] Debug Tensors: 
[02/07/2025-16:47:12] [I] Input(s)s format: fp32:CHW
[02/07/2025-16:47:12] [I] Output(s)s format: fp32:CHW
[02/07/2025-16:47:12] [I] Input build shape (profile 0): points=1x204800x4+1x204800x4+1x204800x4
[02/07/2025-16:47:12] [I] Input build shape (profile 0): num_points=1+1+1
[02/07/2025-16:47:12] [I] Input calibration shapes: model
[02/07/2025-16:47:12] [I] === System Options ===
[02/07/2025-16:47:12] [I] Device: 0
[02/07/2025-16:47:12] [I] DLACore: 
[02/07/2025-16:47:12] [I] Plugins:
[02/07/2025-16:47:12] [I] setPluginsToSerialize:
[02/07/2025-16:47:12] [I] dynamicPlugins:
[02/07/2025-16:47:12] [I] ignoreParsedPluginLibs: 0
[02/07/2025-16:47:12] [I] 
[02/07/2025-16:47:12] [I] === Inference Options ===
[02/07/2025-16:47:12] [I] Batch: Explicit
[02/07/2025-16:47:12] [I] Input inference shape : num_points=1
[02/07/2025-16:47:12] [I] Input inference shape : points=1x204800x4
[02/07/2025-16:47:12] [I] Iterations: 10
[02/07/2025-16:47:12] [I] Duration: 3s (+ 200ms warm up)
[02/07/2025-16:47:12] [I] Sleep time: 0ms
[02/07/2025-16:47:12] [I] Idle time: 0ms
[02/07/2025-16:47:12] [I] Inference Streams: 1
[02/07/2025-16:47:12] [I] ExposeDMA: Disabled
[02/07/2025-16:47:12] [I] Data transfers: Enabled
[02/07/2025-16:47:12] [I] Spin-wait: Disabled
[02/07/2025-16:47:12] [I] Multithreading: Disabled
[02/07/2025-16:47:12] [I] CUDA Graph: Disabled
[02/07/2025-16:47:12] [I] Separate profiling: Disabled
[02/07/2025-16:47:12] [I] Time Deserialize: Disabled
[02/07/2025-16:47:12] [I] Time Refit: Disabled
[02/07/2025-16:47:12] [I] NVTX verbosity: 0
[02/07/2025-16:47:12] [I] Persistent Cache Ratio: 0
[02/07/2025-16:47:12] [I] Optimization Profile Index: 0
[02/07/2025-16:47:12] [I] Weight Streaming Budget: 100.000000%
[02/07/2025-16:47:12] [I] Inputs:
[02/07/2025-16:47:12] [I] Debug Tensor Save Destinations:
[02/07/2025-16:47:12] [I] === Reporting Options ===
[02/07/2025-16:47:12] [I] Verbose: Disabled
[02/07/2025-16:47:12] [I] Averages: 10 inferences
[02/07/2025-16:47:12] [I] Percentiles: 90,95,99
[02/07/2025-16:47:12] [I] Dump refittable layers:Disabled
[02/07/2025-16:47:12] [I] Dump output: Disabled
[02/07/2025-16:47:12] [I] Profile: Disabled
[02/07/2025-16:47:12] [I] Export timing to JSON file: 
[02/07/2025-16:47:12] [I] Export output to JSON file: 
[02/07/2025-16:47:12] [I] Export profile to JSON file: 
[02/07/2025-16:47:12] [I] 
[02/07/2025-16:47:12] [I] === Device Information ===
[02/07/2025-16:47:12] [I] Available Devices: 
[02/07/2025-16:47:12] [I]   Device 0: "NVIDIA GeForce RTX 3050" UUID: GPU-56d342c8-117a-faea-1189-84c071cfdf62
[02/07/2025-16:47:12] [I] Selected Device: NVIDIA GeForce RTX 3050
[02/07/2025-16:47:12] [I] Selected Device ID: 0
[02/07/2025-16:47:12] [I] Selected Device UUID: GPU-56d342c8-117a-faea-1189-84c071cfdf62
[02/07/2025-16:47:12] [I] Compute Capability: 8.6
[02/07/2025-16:47:12] [I] SMs: 20
[02/07/2025-16:47:12] [I] Device Global Memory: 7958 MiB
[02/07/2025-16:47:12] [I] Shared Memory per SM: 100 KiB
[02/07/2025-16:47:12] [I] Memory Bus Width: 128 bits (ECC disabled)
[02/07/2025-16:47:12] [I] Application Compute Clock Rate: 1.807 GHz
[02/07/2025-16:47:12] [I] Application Memory Clock Rate: 7.001 GHz
[02/07/2025-16:47:12] [I] 
[02/07/2025-16:47:12] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[02/07/2025-16:47:12] [I] 
[02/07/2025-16:47:12] [I] TensorRT version: 10.3.0
[02/07/2025-16:47:12] [I] Loading standard plugins
[02/07/2025-16:47:12] [I] [TRT] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 20, GPU 615 (MiB)
[02/07/2025-16:47:16] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +2087, GPU +386, now: CPU 2262, GPU 1001 (MiB)
[02/07/2025-16:47:16] [I] Start parsing network model.
[02/07/2025-16:47:16] [I] [TRT] ----------------------------------------------------------------
[02/07/2025-16:47:16] [I] [TRT] Input filename:   pointpillars.onnx
[02/07/2025-16:47:16] [I] [TRT] ONNX IR version:  0.0.8
[02/07/2025-16:47:16] [I] [TRT] Opset version:    11
[02/07/2025-16:47:16] [I] [TRT] Producer name:    
[02/07/2025-16:47:16] [I] [TRT] Producer version: 
[02/07/2025-16:47:16] [I] [TRT] Domain:           
[02/07/2025-16:47:16] [I] [TRT] Model version:    0
[02/07/2025-16:47:16] [I] [TRT] Doc string:       
[02/07/2025-16:47:16] [I] [TRT] ----------------------------------------------------------------
[02/07/2025-16:47:16] [I] [TRT] No checker registered for op: VoxelGeneratorPlugin. Attempting to check as plugin.
[02/07/2025-16:47:16] [I] [TRT] No importer registered for op: VoxelGeneratorPlugin. Attempting to import as plugin.
[02/07/2025-16:47:16] [I] [TRT] Searching for plugin: VoxelGeneratorPlugin, plugin_version: 1, plugin_namespace: 
[02/07/2025-16:47:16] [I] [TRT] Successfully created plugin: VoxelGeneratorPlugin
[02/07/2025-16:47:16] [I] [TRT] No checker registered for op: PillarScatterPlugin. Attempting to check as plugin.
[02/07/2025-16:47:16] [I] [TRT] No importer registered for op: PillarScatterPlugin. Attempting to import as plugin.
[02/07/2025-16:47:16] [I] [TRT] Searching for plugin: PillarScatterPlugin, plugin_version: 1, plugin_namespace: 
[02/07/2025-16:47:16] [I] [TRT] Successfully created plugin: PillarScatterPlugin
[02/07/2025-16:47:16] [I] [TRT] No checker registered for op: DecodeBbox3DPlugin. Attempting to check as plugin.
[02/07/2025-16:47:16] [I] [TRT] No importer registered for op: DecodeBbox3DPlugin. Attempting to import as plugin.
[02/07/2025-16:47:16] [I] [TRT] Searching for plugin: DecodeBbox3DPlugin, plugin_version: 1, plugin_namespace: 
[02/07/2025-16:47:16] [I] [TRT] Successfully created plugin: DecodeBbox3DPlugin
[02/07/2025-16:47:16] [I] Finished parsing network model. Parse time: 0.0232674
[02/07/2025-16:47:16] [I] Set shape of input tensor points for optimization profile 0 to: MIN=1x204800x4 OPT=1x204800x4 MAX=1x204800x4
[02/07/2025-16:47:16] [I] Set shape of input tensor num_points for optimization profile 0 to: MIN=1 OPT=1 MAX=1
[02/07/2025-16:47:16] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[02/07/2025-16:47:44] [I] [TRT] Detected 2 inputs and 2 output network tensors.
[02/07/2025-16:47:45] [I] [TRT] Total Host Persistent Memory: 115984
[02/07/2025-16:47:45] [I] [TRT] Total Device Persistent Memory: 529408
[02/07/2025-16:47:45] [I] [TRT] Total Scratch Memory: 270532608
[02/07/2025-16:47:45] [I] [TRT] [BlockAssignment] Started assigning block shifts. This will take 39 steps to complete.
[02/07/2025-16:47:45] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 0.472854ms to assign 6 blocks to 39 nodes requiring 319466496 bytes.
[02/07/2025-16:47:45] [I] [TRT] Total Activation Memory: 319465984
[02/07/2025-16:47:45] [I] [TRT] Total Weights Memory: 6627616
[02/07/2025-16:47:45] [I] [TRT] Engine generation completed in 28.9121 seconds.
[02/07/2025-16:47:45] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 247 MiB
[02/07/2025-16:47:45] [I] [TRT] [MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 3506 MiB
[02/07/2025-16:47:45] [I] Engine built in 28.9264 sec.
[02/07/2025-16:47:45] [I] Created engine with size: 7.52948 MiB
[02/07/2025-16:47:45] [I] [TRT] Loaded engine size: 7 MiB
[02/07/2025-16:47:45] [I] Engine deserialized in 0.00884943 sec.
[02/07/2025-16:47:45] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +305, now: CPU 0, GPU 311 (MiB)
[02/07/2025-16:47:45] [I] Setting persistentCacheLimit to 0 bytes.
[02/07/2025-16:47:45] [I] Created execution context with device memory size: 304.667 MiB
[02/07/2025-16:47:45] [I] Using random values for input points
[02/07/2025-16:47:45] [I] Input binding for points with dimensions 1x204800x4 is created.
[02/07/2025-16:47:45] [I] Using random values for input num_points
[02/07/2025-16:47:45] [I] Input binding for num_points with dimensions 1 is created.
[02/07/2025-16:47:45] [I] Output binding for output_boxes with dimensions 1x393216x9 is created.
[02/07/2025-16:47:45] [I] Output binding for num_boxes with dimensions 1 is created.
[02/07/2025-16:47:45] [I] Starting inference
[02/07/2025-16:47:48] [I] Warmup completed 38 queries over 200 ms
[02/07/2025-16:47:48] [I] Timing trace has 562 queries over 3.01639 s
[02/07/2025-16:47:48] [I] 
[02/07/2025-16:47:48] [I] === Trace details ===
[02/07/2025-16:47:48] [I] Trace averages of 10 runs:
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.32603 ms - Host latency: 6.67 ms (enqueue 0.121249 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.32287 ms - Host latency: 6.66584 ms (enqueue 0.123315 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.33813 ms - Host latency: 6.6947 ms (enqueue 0.37449 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.34866 ms - Host latency: 6.71088 ms (enqueue 0.503482 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.32604 ms - Host latency: 6.69517 ms (enqueue 0.603879 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.34661 ms - Host latency: 6.71105 ms (enqueue 0.550467 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.33544 ms - Host latency: 6.70376 ms (enqueue 0.604047 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.33165 ms - Host latency: 6.70045 ms (enqueue 0.603473 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.34303 ms - Host latency: 6.71248 ms (enqueue 0.603162 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.33187 ms - Host latency: 6.70071 ms (enqueue 0.60415 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.33473 ms - Host latency: 6.70226 ms (enqueue 0.563995 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.34416 ms - Host latency: 6.70048 ms (enqueue 0.466834 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.60559 ms - Host latency: 6.97324 ms (enqueue 0.521735 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.51282 ms - Host latency: 6.96803 ms (enqueue 0.500397 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.33761 ms - Host latency: 6.70518 ms (enqueue 0.61106 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.34877 ms - Host latency: 6.71788 ms (enqueue 0.605212 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.32324 ms - Host latency: 6.69222 ms (enqueue 0.624792 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.33719 ms - Host latency: 6.70643 ms (enqueue 0.603772 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.34885 ms - Host latency: 6.71973 ms (enqueue 0.619775 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.33636 ms - Host latency: 6.70391 ms (enqueue 0.605933 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.33063 ms - Host latency: 6.69882 ms (enqueue 0.605725 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.50693 ms - Host latency: 6.87932 ms (enqueue 0.619861 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.48197 ms - Host latency: 6.85001 ms (enqueue 0.558411 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.3382 ms - Host latency: 6.69918 ms (enqueue 0.452905 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.35081 ms - Host latency: 6.71865 ms (enqueue 0.612085 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.32389 ms - Host latency: 6.68502 ms (enqueue 0.575732 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.34191 ms - Host latency: 6.71628 ms (enqueue 0.639648 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.35502 ms - Host latency: 6.72699 ms (enqueue 0.560095 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.34086 ms - Host latency: 6.71104 ms (enqueue 0.636792 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.51641 ms - Host latency: 6.87135 ms (enqueue 0.531763 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.3255 ms - Host latency: 6.67443 ms (enqueue 0.48988 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.34557 ms - Host latency: 6.70552 ms (enqueue 0.547681 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.32837 ms - Host latency: 6.7009 ms (enqueue 0.6276 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.35033 ms - Host latency: 6.71418 ms (enqueue 0.522705 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.32811 ms - Host latency: 6.70371 ms (enqueue 0.668958 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.34656 ms - Host latency: 6.7238 ms (enqueue 0.652075 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.33059 ms - Host latency: 6.70889 ms (enqueue 0.655664 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.32419 ms - Host latency: 6.70139 ms (enqueue 0.655005 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.44292 ms - Host latency: 6.82136 ms (enqueue 0.658057 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.34622 ms - Host latency: 6.72185 ms (enqueue 0.677686 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.32615 ms - Host latency: 6.70027 ms (enqueue 0.676904 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.34031 ms - Host latency: 6.71704 ms (enqueue 0.662476 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.33862 ms - Host latency: 6.70647 ms (enqueue 0.527612 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.34087 ms - Host latency: 6.70713 ms (enqueue 0.525 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.3314 ms - Host latency: 6.70781 ms (enqueue 0.659277 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.32749 ms - Host latency: 6.70483 ms (enqueue 0.658228 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.33015 ms - Host latency: 6.70642 ms (enqueue 0.652002 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.32817 ms - Host latency: 6.70493 ms (enqueue 0.660205 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.34448 ms - Host latency: 6.72043 ms (enqueue 0.659692 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.39229 ms - Host latency: 6.74646 ms (enqueue 0.525439 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.32061 ms - Host latency: 6.6845 ms (enqueue 0.553271 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.34688 ms - Host latency: 6.72087 ms (enqueue 0.660205 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.33582 ms - Host latency: 6.71011 ms (enqueue 0.632886 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.34663 ms - Host latency: 6.71179 ms (enqueue 0.473486 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.3408 ms - Host latency: 6.71799 ms (enqueue 0.674658 ms)
[02/07/2025-16:47:48] [I] Average on 10 runs - GPU latency: 5.33108 ms - Host latency: 6.70874 ms (enqueue 0.667993 ms)
[02/07/2025-16:47:48] [I] 
[02/07/2025-16:47:48] [I] === Performance summary ===
[02/07/2025-16:47:48] [I] Throughput: 186.316 qps
[02/07/2025-16:47:48] [I] Latency: min = 6.60522 ms, max = 8.19635 ms, mean = 6.72611 ms, median = 6.70074 ms, percentile(90%) = 6.78662 ms, percentile(95%) = 6.81827 ms, percentile(99%) = 7.7793 ms
[02/07/2025-16:47:48] [I] Enqueue Time: min = 0.119629 ms, max = 0.78894 ms, mean = 0.575947 ms, median = 0.6073 ms, percentile(90%) = 0.673828 ms, percentile(95%) = 0.686035 ms, percentile(99%) = 0.720703 ms
[02/07/2025-16:47:48] [I] H2D Latency: min = 0.256927 ms, max = 1.21075 ms, mean = 0.285035 ms, median = 0.285126 ms, percentile(90%) = 0.292969 ms, percentile(95%) = 0.293701 ms, percentile(99%) = 0.295654 ms
[02/07/2025-16:47:48] [I] GPU Compute Time: min = 5.25818 ms, max = 6.83417 ms, mean = 5.35634 ms, median = 5.33044 ms, percentile(90%) = 5.41797 ms, percentile(95%) = 5.44971 ms, percentile(99%) = 6.36108 ms
[02/07/2025-16:47:48] [I] D2H Latency: min = 1.07739 ms, max = 1.08862 ms, mean = 1.08474 ms, median = 1.08521 ms, percentile(90%) = 1.08667 ms, percentile(95%) = 1.08704 ms, percentile(99%) = 1.08765 ms
[02/07/2025-16:47:48] [I] Total Host Walltime: 3.01639 s
[02/07/2025-16:47:48] [I] Total GPU Compute Time: 3.01026 s
[02/07/2025-16:47:48] [W] * GPU compute time is unstable, with coefficient of variance = 2.56954%.
[02/07/2025-16:47:48] [W]   If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.
[02/07/2025-16:47:48] [I] Explanations of the performance metrics are printed in the verbose logs.
[02/07/2025-16:47:48] [I] 
&&&& PASSED TensorRT.trtexec [TensorRT v100300] # /usr/src/tensorrt/bin/trtexec --onnx=pointpillars.onnx --saveEngine=pointpillars.trt --fp16 --minShapes=points:1x204800x4,num_points:1 --optShapes=points:1x204800x4,num_points:1 --maxShapes=points:1x204800x4,num_points:1

Any insights or suggestions on how to resolve the DW_DNN_INVALID_MODEL error would be greatly appreciated.

Please try to generate the TensorRT engine directly in DriveWorks.

Hello,
I’m attempting to convert a model in my Docker DriveOS container using the TensorRT optimization tool, but I’m encountering an error. Here are the details:

I’m using the dwDNN libraries, which require models to be optimized using the TensorRT optimization tool. The model I’m working with is in ONNX format.

command:

./tensorRT_optimization \
    --modelType=onnx \
    --onnxFile=/usr/local/driveworks/samples/src/sensors/lidar/lidar_custom/model.onnx \
    --out=pointpillars_optimized.bin \
    --half2=1 \
    --iterations=100 \
    --workspaceSize=4096 \
    --verbose=1

error:

[10-02-2025 11:36:30] Marking output_boxes_14 as output: output_boxes
[10-02-2025 11:36:30] Marking num_boxes_15 as output: num_boxes
[10-02-2025 11:36:30] DNNGenerator: Input "points": -1x204800x4
[10-02-2025 11:36:30] DNNGenerator: Input "num_points": -1
[10-02-2025 11:36:30] DNNGenerator: Output "output_boxes": -1x393216x9
[10-02-2025 11:36:30] DNNGenerator: Output "num_boxes": -1
[10-02-2025 11:36:30] Error[4]: [network.cpp::validate::3162] Error Code 4: Internal Error (Network has dynamic or shape inputs, but no optimization profile has been defined.)
[10-02-2025 11:36:30] [10-02-2025 11:36:30] Releasing Driveworks SDK Context
Error: DW_INTERNAL_ERROR: DNNGenerator: Network build and serialization failed.
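
For context, an optimization profile is exactly what trtexec's --minShapes/--optShapes/--maxShapes flags supply, and what the tool invocation above never defines. A rough sketch of the equivalent TensorRT builder API calls (an illustration only; the shapes mirror the trtexec command from the first post, and paths are placeholders):

# Sketch: attach an optimization profile so a network with dynamic inputs
# is legal to build. This mirrors what trtexec's --min/opt/maxShapes do.
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
trt.init_libnvinfer_plugins(logger, "")  # VoxelGeneratorPlugin etc.

builder = trt.Builder(logger)
# EXPLICIT_BATCH is the default (and deprecated as a flag) on TensorRT 10,
# but still required when building against TensorRT 8.x.
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0).desc())

config = builder.create_builder_config()
profile = builder.create_optimization_profile()
# min / opt / max are identical here, matching the trtexec command.
profile.set_shape("points", (1, 204800, 4), (1, 204800, 4), (1, 204800, 4))
profile.set_shape("num_points", (1,), (1,), (1,))
config.add_optimization_profile(profile)

engine_bytes = builder.build_serialized_network(network, config)
with open("pointpillars.trt", "wb") as f:
    f.write(engine_bytes)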

dwDNN API info:

/**
 * Creates and initializes a TensorRT Network from file.
 *
 * @param[out] network A pointer to network handle that will be initialized from parameters.
 * @param[in] modelFilename A pointer to the name of the TensorRT model file.
 * @param[in] pluginConfiguration An optional pointer to plugin configuration for custom layers.
 * @param[in] processorType Processor that the inference should run on. Note that the model must be
 * generated for this processor type.
 * @param[in] context Specifies the handle to the context under which the DNN module is created.
 *
 * @return DW_INVALID_ARGUMENT - if pointer to the network handle or the model filename are NULL. <br>
 *         DW_DNN_INVALID_MODEL - if the provided model is invalid. <br>
 *         DW_CUDA_ERROR - if compute capability does not meet the network type's requirements. <br>
 *         DW_FILE_NOT_FOUND - if given model file does not exist. <br>
 *         DW_SUCCESS otherwise.<br>
 *
 * @note The network file must be created by the TensorRT_optimization tool.
 *
 * @note DNN module will look for metadata file named \<modelFilename\>.json in the same folder.
 * If it is present, metadata will be loaded from that file. Otherwise, it will be filled with default
 * values. Example metadata:
 *
 *     {
 *         "dataConditionerParams" : {
 *             "meanValue" : [0.0, 0.0, 0.0],
 *             "splitPlanes" : true,
 *             "pixelScaleCoefficient": 1.0,
 *             "ignoreAspectRatio" : false,
 *             "doPerPlaneMeanNormalization" : false
 *         },
 *         "tonemapType" : "none",
 *         "__comment": "tonemapType can be one of {none, agtm}"
 *     }
 *
 */
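
Per the metadata note above, dwDNN looks for a <modelFilename>.json next to the engine. To rule out bad default values, the metadata file can also be written explicitly (a sketch using the example values quoted from the API doc; the filename matches the --out path from my command above):

# Sketch: write the optional dwDNN metadata file next to the optimized
# engine, using the example values from the API doc quoted above.
import json

metadata = {
    "dataConditionerParams": {
        "meanValue": [0.0, 0.0, 0.0],
        "splitPlanes": True,
        "pixelScaleCoefficient": 1.0,
        "ignoreAspectRatio": False,
        "doPerPlaneMeanNormalization": False,
    },
    "tonemapType": "none",
}
with open("pointpillars_optimized.bin.json", "w") as f:
    json.dump(metadata, f, indent=4)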

Any guidance or suggestions would be greatly appreciated!

Hi, I used a script that converts the dynamic input/output shapes to static ones, and then ran the result through the TensorRT optimization tool. However, when I try to load the generated optimized .bin, I’m getting DW_DNN_INVALID_MODEL:

script for converting dynamic inputs/outputs to static shapes (run in a local Docker container):

import onnx
from onnx import helper, shape_inference

def convert_dynamic_to_static(input_model_path, output_model_path, batch_size=1):
    """
    Convert ONNX model with dynamic inputs to static inputs while preserving custom operators.
    
    Args:
        input_model_path (str): Path to input ONNX model
        output_model_path (str): Path to save converted model
        batch_size (int): Desired batch size for static model
    """
    # Load the model
    model = onnx.load(input_model_path)
    
    # Define static shapes
    static_shapes = {
        "points": [batch_size, 204800, 4],
        "num_points": [batch_size],
        "output_boxes": [batch_size, 393216, 9],
        "num_boxes": [batch_size]
    }
    
    # Create new graph inputs with static shapes
    new_inputs = []
    for input in model.graph.input:
        if input.name in static_shapes:
            # Create new input with static shape
            new_input = helper.make_tensor_value_info(
                name=input.name,
                elem_type=input.type.tensor_type.elem_type,
                shape=static_shapes[input.name]
            )
            new_inputs.append(new_input)
        else:
            new_inputs.append(input)
    
    # Create new graph outputs with static shapes
    new_outputs = []
    for output in model.graph.output:
        if output.name in static_shapes:
            # Create new output with static shape
            new_output = helper.make_tensor_value_info(
                name=output.name,
                elem_type=output.type.tensor_type.elem_type,
                shape=static_shapes[output.name]
            )
            new_outputs.append(new_output)
        else:
            new_outputs.append(output)
    
    # Create new graph with static shapes while preserving all other properties
    new_graph = helper.make_graph(
        nodes=model.graph.node,
        name=model.graph.name,
        inputs=new_inputs,
        outputs=new_outputs,
        initializer=model.graph.initializer,
    )
    
    # Create new model with static shapes, carrying over the original opset
    # imports (including custom-op domains). Passing them through make_model
    # avoids duplicating the default opset entry it would otherwise add.
    new_model = helper.make_model(
        new_graph,
        producer_name=model.producer_name,
        producer_version=model.producer_version,
        domain=model.domain,
        model_version=model.model_version,
        doc_string=model.doc_string,
        opset_imports=list(model.opset_import),
    )
    
    # Try to run shape inference
    try:
        new_model = shape_inference.infer_shapes(new_model)
    except Exception as e:
        print(f"Warning: Shape inference failed (this is normal for models with custom ops): {e}")
    
    # Save the converted model
    onnx.save(new_model, output_model_path)
    print(f"Model converted successfully and saved to {output_model_path}")

def verify_model_shapes(model_path):
    """
    Verify the shapes of inputs and outputs in the ONNX model.
    
    Args:
        model_path (str): Path to ONNX model
    """
    model = onnx.load(model_path)
    
    def get_type_name(tensor_type):
        """Helper function to get type name"""
        np_dtype = onnx.helper.tensor_dtype_to_np_dtype(tensor_type)
        return str(np_dtype).upper()
    
    print("\nModel Inputs:")
    for input in model.graph.input:
        print(f"Name: {input.name}")
        shape = [dim.dim_value if dim.dim_value != 0 else 'dynamic' 
                for dim in input.type.tensor_type.shape.dim]
        print(f"Shape: {shape}")
        print(f"Data Type: {get_type_name(input.type.tensor_type.elem_type)}")
        print()
    
    print("Model Outputs:")
    for output in model.graph.output:
        print(f"Name: {output.name}")
        shape = [dim.dim_value if dim.dim_value != 0 else 'dynamic' 
                for dim in output.type.tensor_type.shape.dim]
        print(f"Shape: {shape}")
        print(f"Data Type: {get_type_name(output.type.tensor_type.elem_type)}")
        print()
    
    print("\nCustom Operators:")
    custom_ops = set()
    for node in model.graph.node:
        if node.domain:  # custom ops typically have a non-empty domain
            custom_ops.add(f"{node.domain}::{node.op_type}")
    if custom_ops:
        print("Found custom operators:")
        for op in custom_ops:
            print(f"- {op}")
    else:
        print("No custom operators found")

if __name__ == "__main__":
    # Example usage
    input_model = "model.onnx"
    output_model = "static_model.onnx"
    
    # Convert the model
    convert_dynamic_to_static(input_model, output_model, batch_size=1)
    
    # Verify the conversion
    print("\nVerifying converted model:")
    verify_model_shapes(output_model)

DriveWorks code:

        // Initialize DNN from TensorRT file
        dwStatus status = dwDNN_initializeTensorRTFromFile(&m_dnnHandle, enginePath.c_str(), nullptr,
                                                          DW_PROCESSOR_TYPE_GPU, m_context);
        if (status != DW_SUCCESS) {
            DW_LOG_ERROR("Failed to load TensorRT model: " << dwGetStatusName(status));
            throw std::runtime_error("Failed to load TensorRT model");
        }

logs:

[ERROR][/usr/local/driveworks/samples/src/sensors/lidar/lidar_custom/TensorRTEngine.cpp:72] Failed to load TensorRT model: DW_DNN_INVALID_MODEL
[ERROR][/usr/local/driveworks/samples/src/sensors/lidar/lidar_custom/TensorRTEngine.cpp:111] Exception during initialization: Failed to load TensorRT model

Is this ONNX file trained with TAO?

Hey @Morganh, after downloading it, I tried to use it in dwDNN, but I had to modify the input size to fit my requirements. However, I’m encountering some issues during deployment.

Here’s what I’ve done so far:

  1. Downloaded the model (deployable_v1.1) from the NGC post.
  2. Adjusted the input size to match my dataset.
  3. Attempted to load and deploy it in dwDNN.

The model doesn’t seem to work as expected after changing the input size (a quick check of the plugin attributes is sketched below).
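
One possible culprit (an assumption on my part, not confirmed): the custom plugin nodes in this model carry their own shape-related attributes, so changing only the graph input dimensions can leave the plugins inconsistent with the new size. A quick sketch to dump those attributes for inspection (plugin names taken from the trtexec log earlier in the thread; the path is illustrative):

# Sketch: print the attributes of the custom plugin nodes so they can be
# compared against the modified input size.
import onnx
from onnx import helper

model = onnx.load("model.onnx")
plugin_ops = {"VoxelGeneratorPlugin", "PillarScatterPlugin", "DecodeBbox3DPlugin"}
for node in model.graph.node:
    if node.op_type in plugin_ops:
        attrs = {a.name: helper.get_attribute_value(a) for a in node.attribute}
        print(node.op_type, attrs)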

The TAO documentation, TRTEXEC with PointPillars - NVIDIA Docs, provides an example of how to convert to a TensorRT engine using trtexec.

For the DriveWorks environment, trtexec does not seem to be directly mentioned as a tool within the DriveWorks SDK. From your command, DriveWorks does include DNN optimization tools that utilize TensorRT.

Can you check whether this “TensorRT Optimizer Tool” can handle dynamic input shapes? If yes, you can run something similar to the command below.

Example command with optimization profiles (if supported):
./tensorRT_optimization \
    --modelType=onnx \
    --onnxFile=/usr/local/driveworks/samples/src/sensors/lidar/lidar_custom/model.onnx \
    --out=pointpillars_optimized.bin \
    --half2=1 \
    --iterations=100 \
    --workspaceSize=4096 \
    --verbose=1 \
    --minShapes=points:1x204800x4,num_points:1 \
    --optShapes=points:1x204800x4,num_points:1 \
    --maxShapes=points:1x204800x4,num_points:1

Also, handling an ONNX file in a DriveWorks environment is out of TAO's scope. Please also check the DriveWorks forum for better help:
DRIVE AGX Orin General - NVIDIA Developer Forums.
