trtexec ONNX-to-engine conversion fails

Please provide the following information when requesting support.

• Hardware (RTX 2070)
• Network Type (Detectnet_v2)
• TLT Version (nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5)
• Training spec file: detectnet_train_cfg3.txt (3.5 KB)
• How to reproduce the issue?

trtexec --onnx=/tao/eyestrab_detectnet_resnet18.onnx --saveEngine=/tao/resnet_engine_fp16.trt --fp16 --workspace=8000 --shapes=data:1x3x1920x1200

and a few variations of this command fail with:

- Engine set up failed

or

- Dynamic dimensions required for input: input_1:0, but no shapes were provided. Automatically overriding shape to: 1x3x1920x1200

Help appreciated,
Best regards.

Refer to TRTEXEC with DetectNet-v2 - NVIDIA Docs.
For fp16, you can run

trtexec --onnx=/path/to/model.onnx \
        --maxShapes="input_1:0":16x3x544x960 \
        --minShapes="input_1:0":1x3x544x960 \
        --optShapes="input_1:0":8x3x544x960 \
        --fp16 \
        --saveEngine=/path/to/save/trt/model.engine

The 544x960 can be modified to the actual height x width of your model.
The batch size can also be changed; for example, 8x3x544x960 becomes 1x3x544x960.
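
Before setting these flags, it helps to confirm the model's actual input tensor name and shape, since trtexec must be given the exact name. A minimal sketch using the onnx Python package (the path is a placeholder):

import onnx  # assumes the onnx Python package is installed

model = onnx.load("/path/to/model.onnx")
for inp in model.graph.input:
    # Fixed dimensions carry dim_value; dynamic ones carry a symbolic dim_param.
    dims = [d.dim_value if d.HasField("dim_value") else d.dim_param
            for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dims)

Whatever name this prints (input_1:0 is typical for tf2onnx exports) is the exact string the --minShapes/--optShapes/--maxShapes flags must use.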

I end up with a segmentation fault. How can I debug it?

Please share the full log. Thanks.

I am using this command in the nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5 container:

&&&& RUNNING TensorRT.trtexec [TensorRT v8601] # trtexec --onnx=/tao/eyestrab_detectnet_resnet18.onnx --saveEngine=/tao/resnet_engine_fp16.trt --fp16 --workspace=8 --shapes=data:1x1920x1200 --explicitBatch

Log:

[10/20/2023-16:16:19] [W] --explicitBatch flag has been deprecated and has no effect!
[10/20/2023-16:16:19] [W] Explicit batch dim is automatically enabled if input model is ONNX or if dynamic shapes are provided when the engine is built.
[10/20/2023-16:16:19] [W] --workspace flag has been deprecated by --memPoolSize flag.
[10/20/2023-16:16:19] [I] === Model Options ===
[10/20/2023-16:16:19] [I] Format: ONNX
[10/20/2023-16:16:19] [I] Model: /tao/eyestrab_detectnet_resnet18.onnx
[10/20/2023-16:16:19] [I] Output:
[10/20/2023-16:16:19] [I] === Build Options ===
[10/20/2023-16:16:19] [I] Max batch: explicit batch
[10/20/2023-16:16:19] [I] Memory Pools: workspace: 8 MiB, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[10/20/2023-16:16:19] [I] minTiming: 1
[10/20/2023-16:16:19] [I] avgTiming: 8
[10/20/2023-16:16:19] [I] Precision: FP32+FP16
[10/20/2023-16:16:19] [I] LayerPrecisions:
[10/20/2023-16:16:19] [I] Layer Device Types:
[10/20/2023-16:16:19] [I] Calibration:
[10/20/2023-16:16:19] [I] Refit: Disabled
[10/20/2023-16:16:19] [I] Version Compatible: Disabled
[10/20/2023-16:16:19] [I] TensorRT runtime: full
[10/20/2023-16:16:19] [I] Lean DLL Path:
[10/20/2023-16:16:19] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[10/20/2023-16:16:19] [I] Exclude Lean Runtime: Disabled
[10/20/2023-16:16:19] [I] Sparsity: Disabled
[10/20/2023-16:16:19] [I] Safe mode: Disabled
[10/20/2023-16:16:19] [I] Build DLA standalone loadable: Disabled
[10/20/2023-16:16:19] [I] Allow GPU fallback for DLA: Disabled
[10/20/2023-16:16:19] [I] DirectIO mode: Disabled
[10/20/2023-16:16:19] [I] Restricted mode: Disabled
[10/20/2023-16:16:19] [I] Skip inference: Disabled
[10/20/2023-16:16:19] [I] Save engine: /tao/resnet_engine_fp16.trt
[10/20/2023-16:16:19] [I] Load engine:
[10/20/2023-16:16:19] [I] Profiling verbosity: 0
[10/20/2023-16:16:19] [I] Tactic sources: Using default tactic sources
[10/20/2023-16:16:19] [I] timingCacheMode: local
[10/20/2023-16:16:19] [I] timingCacheFile:
[10/20/2023-16:16:19] [I] Heuristic: Disabled
[10/20/2023-16:16:19] [I] Preview Features: Use default preview flags.
[10/20/2023-16:16:19] [I] MaxAuxStreams: -1
[10/20/2023-16:16:19] [I] BuilderOptimizationLevel: -1
[10/20/2023-16:16:19] [I] Input(s)s format: fp32:CHW
[10/20/2023-16:16:19] [I] Output(s)s format: fp32:CHW
[10/20/2023-16:16:19] [I] Input build shapes: model
[10/20/2023-16:16:19] [I] Input calibration shapes: model
[10/20/2023-16:16:19] [I] === System Options ===
[10/20/2023-16:16:19] [I] Device: 0
[10/20/2023-16:16:19] [I] DLACore:
[10/20/2023-16:16:19] [I] Plugins:
[10/20/2023-16:16:19] [I] setPluginsToSerialize:
[10/20/2023-16:16:19] [I] dynamicPlugins:
[10/20/2023-16:16:19] [I] ignoreParsedPluginLibs: 0
[10/20/2023-16:16:19] [I]
[10/20/2023-16:16:19] [I] === Inference Options ===
[10/20/2023-16:16:19] [I] Batch: Explicit
[10/20/2023-16:16:19] [I] Input inference shapes: model
[10/20/2023-16:16:19] [I] Iterations: 10
[10/20/2023-16:16:19] [I] Duration: 3s (+ 200ms warm up)
[10/20/2023-16:16:19] [I] Sleep time: 0ms
[10/20/2023-16:16:19] [I] Idle time: 0ms
[10/20/2023-16:16:19] [I] Inference Streams: 1
[10/20/2023-16:16:19] [I] ExposeDMA: Disabled
[10/20/2023-16:16:19] [I] Data transfers: Enabled
[10/20/2023-16:16:19] [I] Spin-wait: Disabled
[10/20/2023-16:16:19] [I] Multithreading: Disabled
[10/20/2023-16:16:19] [I] CUDA Graph: Disabled
[10/20/2023-16:16:19] [I] Separate profiling: Disabled
[10/20/2023-16:16:19] [I] Time Deserialize: Disabled
[10/20/2023-16:16:19] [I] Time Refit: Disabled
[10/20/2023-16:16:19] [I] NVTX verbosity: 0
[10/20/2023-16:16:19] [I] Persistent Cache Ratio: 0
[10/20/2023-16:16:19] [I] Inputs:
[10/20/2023-16:16:19] [I] === Reporting Options ===
[10/20/2023-16:16:19] [I] Verbose: Disabled
[10/20/2023-16:16:19] [I] Averages: 10 inferences
[10/20/2023-16:16:19] [I] Percentiles: 90,95,99
[10/20/2023-16:16:19] [I] Dump refittable layers:Disabled
[10/20/2023-16:16:19] [I] Dump output: Disabled
[10/20/2023-16:16:19] [I] Profile: Disabled
[10/20/2023-16:16:19] [I] Export timing to JSON file:
[10/20/2023-16:16:19] [I] Export output to JSON file:
[10/20/2023-16:16:19] [I] Export profile to JSON file:
[10/20/2023-16:16:19] [I]
[10/20/2023-16:16:19] [I] === Device Information ===
[10/20/2023-16:16:19] [I] Selected Device: NVIDIA GeForce RTX 2070
[10/20/2023-16:16:19] [I] Compute Capability: 7.5
[10/20/2023-16:16:19] [I] SMs: 36
[10/20/2023-16:16:19] [I] Device Global Memory: 7972 MiB
[10/20/2023-16:16:19] [I] Shared Memory per SM: 64 KiB
[10/20/2023-16:16:19] [I] Memory Bus Width: 256 bits (ECC disabled)
[10/20/2023-16:16:19] [I] Application Compute Clock Rate: 1.44 GHz
[10/20/2023-16:16:19] [I] Application Memory Clock Rate: 7.001 GHz
[10/20/2023-16:16:19] [I]
[10/20/2023-16:16:19] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[10/20/2023-16:16:19] [I]
[10/20/2023-16:16:19] [I] TensorRT version: 8.6.1
[10/20/2023-16:16:19] [I] Loading standard plugins
[10/20/2023-16:16:19] [I] [TRT] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 26, GPU 538 (MiB)
[10/20/2023-16:16:21] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +342, GPU +76, now: CPU 423, GPU 614 (MiB)
[10/20/2023-16:16:21] [I] Start parsing network model.
[10/20/2023-16:16:21] [I] [TRT] ----------------------------------------------------------------
[10/20/2023-16:16:21] [I] [TRT] Input filename: /tao/eyestrab_detectnet_resnet18.onnx
[10/20/2023-16:16:21] [I] [TRT] ONNX IR version: 0.0.7
[10/20/2023-16:16:21] [I] [TRT] Opset version: 12
[10/20/2023-16:16:21] [I] [TRT] Producer name: tf2onnx
[10/20/2023-16:16:21] [I] [TRT] Producer version: 1.9.2
[10/20/2023-16:16:21] [I] [TRT] Domain:
[10/20/2023-16:16:21] [I] [TRT] Model version: 0
[10/20/2023-16:16:21] [I] [TRT] Doc string:
[10/20/2023-16:16:21] [I] [TRT] ----------------------------------------------------------------
[10/20/2023-16:16:21] [I] Finished parsing network model. Parse time: 0.111052
[10/20/2023-16:16:21] [W] Dynamic dimensions required for input: input_1:0, but no shapes were provided. Automatically overriding shape to: 1x3x1920x1200
Segmentation fault (core dumped)

I first used the YOLO annotation format, converted it to KITTI (the NVIDIA 15-field variant), then created the TFRecords. Might the problem come from the initial annotation tool? What would be your favorite annotation tool for TAO and KITTI in general?
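
For reference, the conversion itself is mechanical: YOLO labels store normalized center coordinates and sizes, while the 15-field KITTI format expects absolute pixel corner coordinates, with the 3D fields zeroed for 2D detection data. A minimal sketch (the class list and image size below are assumptions, not from this thread):

def yolo_to_kitti(line, img_w, img_h, class_names):
    """Convert one YOLO label line to a 15-field KITTI label line."""
    cls_id, cx, cy, w, h = line.split()
    cx, w = float(cx) * img_w, float(w) * img_w   # de-normalize x axis
    cy, h = float(cy) * img_h, float(h) * img_h   # de-normalize y axis
    x1, y1 = cx - w / 2, cy - h / 2               # top-left corner
    x2, y2 = cx + w / 2, cy + h / 2               # bottom-right corner
    # Fields: type truncated occluded alpha bbox(4) dims(3) location(3) rotation_y
    return (f"{class_names[int(cls_id)]} 0.00 0 0.00 "
            f"{x1:.2f} {y1:.2f} {x2:.2f} {y2:.2f} "
            f"0.00 0.00 0.00 0.00 0.00 0.00 0.00")

# Example with hypothetical values:
print(yolo_to_kitti("0 0.5 0.5 0.25 0.25", 256, 160, ["limbus"]))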

Could you retry as below?

trtexec --onnx=/path/to/model.onnx \
        --maxShapes="input_1:0":1x3x1200x1900 \
        --minShapes="input_1:0":1x3x1200x1900 \
        --optShapes="input_1:0":1x3x1200x1900 \
        --fp16 \
        --saveEngine=/path/to/save/trt/model.engine

I assume the input of your model is 3x1920x1200 (channel * width * height).
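
For context, these three flags correspond to a TensorRT optimization profile; with min = opt = max the engine is effectively static-shape. A minimal sketch of the equivalent TensorRT Python API calls (the tensor name and shapes mirror the suggested command above and are assumptions; adjust to your model):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, TRT_LOGGER)
with open("/path/to/model.onnx", "rb") as f:
    parser.parse(f.read())  # populate the network from the ONNX file

config = builder.create_builder_config()
profile = builder.create_optimization_profile()
# The name must match the network's input tensor exactly.
profile.set_shape("input_1:0",
                  (1, 3, 1200, 1920),   # min
                  (1, 3, 1200, 1920),   # opt
                  (1, 3, 1200, 1920))   # max
config.add_optimization_profile(profile)
config.set_flag(trt.BuilderFlag.FP16)
engine = builder.build_serialized_network(network, config)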

I have created a working yolo_v4_tiny model. It can run inference with the tao inference command, but the problem with trtexec remains the same. The new model has the following retrain spec:
yolo_v4_tiny_retrain_kitti_seq.txt (1.9 KB)
It has a width of 256 and a height of 160; I tried a smaller model.

I am running on the nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5 container.

trtexec --onnx=/tao/limbus_data/model.onnx --saveEngine=/tao/limbus_data/model_engine.trt --minShapes=input_1:0:1x3x160x256 --optShapes=input_1:0:1x1x160x256 --maxShapes=input_1:0:1x3x160x256 --workspace=2048

or

trtexec --onnx=/tao/limbus_data/model.onnx --saveEngine=/tao/limbus_data/model_engine.trt --minShapes=x:1x3x160x256 --optShapes=x:1x1x160x256 --maxShapes=x:1x3x160x256 --workspace=2048

or

the same commands without the shape flags. All of them end up similarly, as in the log below:

[10/22/2023-11:02:41] [W] --workspace flag has been deprecated by --memPoolSize flag.
[10/22/2023-11:02:41] [I] === Model Options ===
[10/22/2023-11:02:41] [I] Format: ONNX
[10/22/2023-11:02:41] [I] Model: /tao/limbus_data/model.onnx
[10/22/2023-11:02:41] [I] Output:
[10/22/2023-11:02:41] [I] === Build Options ===
[10/22/2023-11:02:41] [I] Max batch: explicit batch
[10/22/2023-11:02:41] [I] Memory Pools: workspace: 2048 MiB, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[10/22/2023-11:02:41] [I] minTiming: 1
[10/22/2023-11:02:41] [I] avgTiming: 8
[10/22/2023-11:02:41] [I] Precision: FP32
[10/22/2023-11:02:41] [I] LayerPrecisions:
[10/22/2023-11:02:41] [I] Layer Device Types:
[10/22/2023-11:02:41] [I] Calibration:
[10/22/2023-11:02:41] [I] Refit: Disabled
[10/22/2023-11:02:41] [I] Version Compatible: Disabled
[10/22/2023-11:02:41] [I] TensorRT runtime: full
[10/22/2023-11:02:41] [I] Lean DLL Path:
[10/22/2023-11:02:41] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[10/22/2023-11:02:41] [I] Exclude Lean Runtime: Disabled
[10/22/2023-11:02:41] [I] Sparsity: Disabled
[10/22/2023-11:02:41] [I] Safe mode: Disabled
[10/22/2023-11:02:41] [I] Build DLA standalone loadable: Disabled
[10/22/2023-11:02:41] [I] Allow GPU fallback for DLA: Disabled
[10/22/2023-11:02:41] [I] DirectIO mode: Disabled
[10/22/2023-11:02:41] [I] Restricted mode: Disabled
[10/22/2023-11:02:41] [I] Skip inference: Disabled
[10/22/2023-11:02:41] [I] Save engine: /tao/limbus_data/model_engine.trt
[10/22/2023-11:02:41] [I] Load engine:
[10/22/2023-11:02:41] [I] Profiling verbosity: 0
[10/22/2023-11:02:41] [I] Tactic sources: Using default tactic sources
[10/22/2023-11:02:41] [I] timingCacheMode: local
[10/22/2023-11:02:41] [I] timingCacheFile:
[10/22/2023-11:02:41] [I] Heuristic: Disabled
[10/22/2023-11:02:41] [I] Preview Features: Use default preview flags.
[10/22/2023-11:02:41] [I] MaxAuxStreams: -1
[10/22/2023-11:02:41] [I] BuilderOptimizationLevel: -1
[10/22/2023-11:02:41] [I] Input(s)s format: fp32:CHW
[10/22/2023-11:02:41] [I] Output(s)s format: fp32:CHW
[10/22/2023-11:02:41] [I] Input build shape: input_1:0=1x3x160x256+1x1x160x256+1x3x160x256
[10/22/2023-11:02:41] [I] Input calibration shapes: model
[10/22/2023-11:02:41] [I] === System Options ===
[10/22/2023-11:02:41] [I] Device: 0
[10/22/2023-11:02:41] [I] DLACore:
[10/22/2023-11:02:41] [I] Plugins:
[10/22/2023-11:02:41] [I] setPluginsToSerialize:
[10/22/2023-11:02:41] [I] dynamicPlugins:
[10/22/2023-11:02:41] [I] ignoreParsedPluginLibs: 0
[10/22/2023-11:02:41] [I]
[10/22/2023-11:02:41] [I] === Inference Options ===
[10/22/2023-11:02:41] [I] Batch: Explicit
[10/22/2023-11:02:41] [I] Input inference shape: input_1:0=1x1x160x256
[10/22/2023-11:02:41] [I] Iterations: 10
[10/22/2023-11:02:41] [I] Duration: 3s (+ 200ms warm up)
[10/22/2023-11:02:41] [I] Sleep time: 0ms
[10/22/2023-11:02:41] [I] Idle time: 0ms
[10/22/2023-11:02:41] [I] Inference Streams: 1
[10/22/2023-11:02:41] [I] ExposeDMA: Disabled
[10/22/2023-11:02:41] [I] Data transfers: Enabled
[10/22/2023-11:02:41] [I] Spin-wait: Disabled
[10/22/2023-11:02:41] [I] Multithreading: Disabled
[10/22/2023-11:02:41] [I] CUDA Graph: Disabled
[10/22/2023-11:02:41] [I] Separate profiling: Disabled
[10/22/2023-11:02:41] [I] Time Deserialize: Disabled
[10/22/2023-11:02:41] [I] Time Refit: Disabled
[10/22/2023-11:02:41] [I] NVTX verbosity: 0
[10/22/2023-11:02:41] [I] Persistent Cache Ratio: 0
[10/22/2023-11:02:41] [I] Inputs:
[10/22/2023-11:02:41] [I] === Reporting Options ===
[10/22/2023-11:02:41] [I] Verbose: Disabled
[10/22/2023-11:02:41] [I] Averages: 10 inferences
[10/22/2023-11:02:41] [I] Percentiles: 90,95,99
[10/22/2023-11:02:41] [I] Dump refittable layers:Disabled
[10/22/2023-11:02:41] [I] Dump output: Disabled
[10/22/2023-11:02:41] [I] Profile: Disabled
[10/22/2023-11:02:41] [I] Export timing to JSON file:
[10/22/2023-11:02:41] [I] Export output to JSON file:
[10/22/2023-11:02:41] [I] Export profile to JSON file:
[10/22/2023-11:02:41] [I]
[10/22/2023-11:02:41] [I] === Device Information ===
[10/22/2023-11:02:41] [I] Selected Device: NVIDIA GeForce RTX 2070
[10/22/2023-11:02:41] [I] Compute Capability: 7.5
[10/22/2023-11:02:41] [I] SMs: 36
[10/22/2023-11:02:41] [I] Device Global Memory: 7972 MiB
[10/22/2023-11:02:41] [I] Shared Memory per SM: 64 KiB
[10/22/2023-11:02:41] [I] Memory Bus Width: 256 bits (ECC disabled)
[10/22/2023-11:02:41] [I] Application Compute Clock Rate: 1.44 GHz
[10/22/2023-11:02:41] [I] Application Memory Clock Rate: 7.001 GHz
[10/22/2023-11:02:41] [I]
[10/22/2023-11:02:41] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[10/22/2023-11:02:41] [I]
[10/22/2023-11:02:41] [I] TensorRT version: 8.6.1
[10/22/2023-11:02:41] [I] Loading standard plugins
[10/22/2023-11:02:41] [I] [TRT] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 26, GPU 467 (MiB)
[10/22/2023-11:02:43] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +342, GPU +76, now: CPU 423, GPU 543 (MiB)
[10/22/2023-11:02:43] [I] Start parsing network model.
[10/22/2023-11:02:43] [I] [TRT] ----------------------------------------------------------------
[10/22/2023-11:02:43] [I] [TRT] Input filename: /tao/limbus_data/model.onnx
[10/22/2023-11:02:43] [I] [TRT] ONNX IR version: 0.0.8
[10/22/2023-11:02:43] [I] [TRT] Opset version: 12
[10/22/2023-11:02:43] [I] [TRT] Producer name: keras2onnx
[10/22/2023-11:02:43] [I] [TRT] Producer version: 1.13.0
[10/22/2023-11:02:43] [I] [TRT] Domain:
[10/22/2023-11:02:43] [I] [TRT] Model version: 0
[10/22/2023-11:02:43] [I] [TRT] Doc string:
[10/22/2023-11:02:43] [I] [TRT] ----------------------------------------------------------------
[10/22/2023-11:02:43] [W] [TRT] onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[10/22/2023-11:02:43] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/22/2023-11:02:43] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/22/2023-11:02:43] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/22/2023-11:02:43] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/22/2023-11:02:43] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/22/2023-11:02:43] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/22/2023-11:02:43] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/22/2023-11:02:43] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/22/2023-11:02:43] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/22/2023-11:02:43] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/22/2023-11:02:43] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/22/2023-11:02:43] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/22/2023-11:02:43] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/22/2023-11:02:43] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/22/2023-11:02:43] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/22/2023-11:02:43] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/22/2023-11:02:43] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/22/2023-11:02:43] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/22/2023-11:02:43] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/22/2023-11:02:43] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/22/2023-11:02:43] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/22/2023-11:02:43] [I] [TRT] No importer registered for op: BatchedNMSDynamic_TRT. Attempting to import as plugin.
[10/22/2023-11:02:43] [I] [TRT] Searching for plugin: BatchedNMSDynamic_TRT, plugin_version: 1, plugin_namespace:
[10/22/2023-11:02:43] [W] [TRT] builtin_op_importers.cpp:5245: Attribute caffeSemantics not found in plugin node! Ensure that the plugin creator has a default value defined or the engine may fail to build.
[10/22/2023-11:02:43] [I] [TRT] Successfully created plugin: BatchedNMSDynamic_TRT
[10/22/2023-11:02:43] [I] Finished parsing network model. Parse time: 0.24603
[10/22/2023-11:02:43] [E] Cannot find input tensor with name "input_1:0" in the network inputs! Please make sure the input tensor names are correct.
[10/22/2023-11:02:43] [E] Network And Config setup failed
[10/22/2023-11:02:43] [E] Building engine failed
[10/22/2023-11:02:43] [E] Failed to create engine from model or file.
[10/22/2023-11:02:43] [E] Engine set up failed

I actually need it on a Jetson Orin NX, but I would also run it on DeepStream with my x86 machine with the RTX 2070.
I have the following nvidia-smi output:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.12             Driver Version: 535.104.12   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 2070        On  | 00000000:01:00.0  On |                  N/A |
| N/A   46C    P8              12W / 115W |    369MiB /  8192MiB |      1%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                             GPU Memory |
|        ID   ID                                                              Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+

My input tensor name is "Input". Even when I write the input tensor name into the shape arguments as you suggested, I receive a core dumped error.

Attached I also send verbose log:

log_verbose.txt (183.3 KB)

What would your suggestion be?

It seems that the input tensor name is not correct. Could you share your ONNX file?

I think you have already generated two kinds of ONNX files: one trained on the detectnet_v2 network, the other on the yolo_v4_tiny network.
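
Before sharing them, a quick per-file summary can pin down how the two exports differ (a minimal sketch using the onnx Python package; the paths are the ones used earlier in the thread):

import onnx
from collections import Counter

for path in ["/tao/eyestrab_detectnet_resnet18.onnx", "/tao/limbus_data/model.onnx"]:
    model = onnx.load(path)
    print(path)
    print("  producer:", model.producer_name, model.producer_version)
    print("  inputs:  ", [i.name for i in model.graph.input])
    print("  outputs: ", [o.name for o in model.graph.output])
    print("  top ops: ", Counter(n.op_type for n in model.graph.node).most_common(5))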

For running trtexec against different network models, please refer to Optimizing and Profiling with TensorRT - NVIDIA Docs

Correct, I have both detectnet_v2 and yolo_v4_tiny ONNX models. I can convert neither.

For the yolo_v4_tiny network, as said, I have a working ONNX file, which can run inference and draw bounding boxes. For YOLO, the trtexec command is:

trtexec --onnx=/tao/limbus_data/model.onnx --maxShapes=Input:16x3x160x256 --minShapes=Input:1x3x160x256 --optShapes=Input:8x3x160x256 --fp16 --saveEngine=/tao/limbus_data/model.engine

It results in the log below with a core dump error:

root@12584098aeef:/tao# trtexec --onnx=/tao/limbus_data/model.onnx --maxShapes=Input:16x3x160x256 --minShapes=Input:1x3x160x256 --optShapes=Input:8x3x160x256 --fp16 --saveEngine=/tao/limbus_data/model.engine
&&&& RUNNING TensorRT.trtexec [TensorRT v8601] # trtexec --onnx=/tao/limbus_data/model.onnx --maxShapes=Input:16x3x160x256 --minShapes=Input:1x3x160x256 --optShapes=Input:8x3x160x256 --fp16 --saveEngine=/tao/limbus_data/model.engine
[10/23/2023-14:44:46] [I] === Model Options ===
[10/23/2023-14:44:46] [I] Format: ONNX
[10/23/2023-14:44:46] [I] Model: /tao/limbus_data/model.onnx
[10/23/2023-14:44:46] [I] Output:
[10/23/2023-14:44:46] [I] === Build Options ===
[10/23/2023-14:44:46] [I] Max batch: explicit batch
[10/23/2023-14:44:46] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[10/23/2023-14:44:46] [I] minTiming: 1
[10/23/2023-14:44:46] [I] avgTiming: 8
[10/23/2023-14:44:46] [I] Precision: FP32+FP16
[10/23/2023-14:44:46] [I] LayerPrecisions:
[10/23/2023-14:44:46] [I] Layer Device Types:
[10/23/2023-14:44:46] [I] Calibration:
[10/23/2023-14:44:46] [I] Refit: Disabled
[10/23/2023-14:44:46] [I] Version Compatible: Disabled
[10/23/2023-14:44:46] [I] TensorRT runtime: full
[10/23/2023-14:44:46] [I] Lean DLL Path:
[10/23/2023-14:44:46] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[10/23/2023-14:44:46] [I] Exclude Lean Runtime: Disabled
[10/23/2023-14:44:46] [I] Sparsity: Disabled
[10/23/2023-14:44:46] [I] Safe mode: Disabled
[10/23/2023-14:44:46] [I] Build DLA standalone loadable: Disabled
[10/23/2023-14:44:46] [I] Allow GPU fallback for DLA: Disabled
[10/23/2023-14:44:46] [I] DirectIO mode: Disabled
[10/23/2023-14:44:46] [I] Restricted mode: Disabled
[10/23/2023-14:44:46] [I] Skip inference: Disabled
[10/23/2023-14:44:46] [I] Save engine: /tao/limbus_data/model.engine
[10/23/2023-14:44:46] [I] Load engine:
[10/23/2023-14:44:46] [I] Profiling verbosity: 0
[10/23/2023-14:44:46] [I] Tactic sources: Using default tactic sources
[10/23/2023-14:44:46] [I] timingCacheMode: local
[10/23/2023-14:44:46] [I] timingCacheFile:
[10/23/2023-14:44:46] [I] Heuristic: Disabled
[10/23/2023-14:44:46] [I] Preview Features: Use default preview flags.
[10/23/2023-14:44:46] [I] MaxAuxStreams: -1
[10/23/2023-14:44:46] [I] BuilderOptimizationLevel: -1
[10/23/2023-14:44:46] [I] Input(s)s format: fp32:CHW
[10/23/2023-14:44:46] [I] Output(s)s format: fp32:CHW
[10/23/2023-14:44:46] [I] Input build shape: Input=1x3x160x256+8x3x160x256+16x3x160x256
[10/23/2023-14:44:46] [I] Input calibration shapes: model
[10/23/2023-14:44:46] [I] === System Options ===
[10/23/2023-14:44:46] [I] Device: 0
[10/23/2023-14:44:46] [I] DLACore:
[10/23/2023-14:44:46] [I] Plugins:
[10/23/2023-14:44:46] [I] setPluginsToSerialize:
[10/23/2023-14:44:46] [I] dynamicPlugins:
[10/23/2023-14:44:46] [I] ignoreParsedPluginLibs: 0
[10/23/2023-14:44:46] [I]
[10/23/2023-14:44:46] [I] === Inference Options ===
[10/23/2023-14:44:46] [I] Batch: Explicit
[10/23/2023-14:44:46] [I] Input inference shape: Input=8x3x160x256
[10/23/2023-14:44:46] [I] Iterations: 10
[10/23/2023-14:44:46] [I] Duration: 3s (+ 200ms warm up)
[10/23/2023-14:44:46] [I] Sleep time: 0ms
[10/23/2023-14:44:46] [I] Idle time: 0ms
[10/23/2023-14:44:46] [I] Inference Streams: 1
[10/23/2023-14:44:46] [I] ExposeDMA: Disabled
[10/23/2023-14:44:46] [I] Data transfers: Enabled
[10/23/2023-14:44:46] [I] Spin-wait: Disabled
[10/23/2023-14:44:46] [I] Multithreading: Disabled
[10/23/2023-14:44:46] [I] CUDA Graph: Disabled
[10/23/2023-14:44:46] [I] Separate profiling: Disabled
[10/23/2023-14:44:46] [I] Time Deserialize: Disabled
[10/23/2023-14:44:46] [I] Time Refit: Disabled
[10/23/2023-14:44:46] [I] NVTX verbosity: 0
[10/23/2023-14:44:46] [I] Persistent Cache Ratio: 0
[10/23/2023-14:44:46] [I] Inputs:
[10/23/2023-14:44:46] [I] === Reporting Options ===
[10/23/2023-14:44:46] [I] Verbose: Disabled
[10/23/2023-14:44:46] [I] Averages: 10 inferences
[10/23/2023-14:44:46] [I] Percentiles: 90,95,99
[10/23/2023-14:44:46] [I] Dump refittable layers:Disabled
[10/23/2023-14:44:46] [I] Dump output: Disabled
[10/23/2023-14:44:46] [I] Profile: Disabled
[10/23/2023-14:44:46] [I] Export timing to JSON file:
[10/23/2023-14:44:46] [I] Export output to JSON file:
[10/23/2023-14:44:46] [I] Export profile to JSON file:
[10/23/2023-14:44:46] [I]
[10/23/2023-14:44:46] [I] === Device Information ===
[10/23/2023-14:44:46] [I] Selected Device: NVIDIA GeForce RTX 2070
[10/23/2023-14:44:46] [I] Compute Capability: 7.5
[10/23/2023-14:44:46] [I] SMs: 36
[10/23/2023-14:44:46] [I] Device Global Memory: 7972 MiB
[10/23/2023-14:44:46] [I] Shared Memory per SM: 64 KiB
[10/23/2023-14:44:46] [I] Memory Bus Width: 256 bits (ECC disabled)
[10/23/2023-14:44:46] [I] Application Compute Clock Rate: 1.44 GHz
[10/23/2023-14:44:46] [I] Application Memory Clock Rate: 7.001 GHz
[10/23/2023-14:44:46] [I]
[10/23/2023-14:44:46] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[10/23/2023-14:44:46] [I]
[10/23/2023-14:44:46] [I] TensorRT version: 8.6.1
[10/23/2023-14:44:46] [I] Loading standard plugins
[10/23/2023-14:44:46] [I] [TRT] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 26, GPU 614 (MiB)
[10/23/2023-14:44:48] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +342, GPU +76, now: CPU 423, GPU 690 (MiB)
[10/23/2023-14:44:48] [I] Start parsing network model.
[10/23/2023-14:44:48] [I] [TRT] ----------------------------------------------------------------
[10/23/2023-14:44:48] [I] [TRT] Input filename: /tao/limbus_data/model.onnx
[10/23/2023-14:44:48] [I] [TRT] ONNX IR version: 0.0.8
[10/23/2023-14:44:48] [I] [TRT] Opset version: 12
[10/23/2023-14:44:48] [I] [TRT] Producer name: keras2onnx
[10/23/2023-14:44:48] [I] [TRT] Producer version: 1.13.0
[10/23/2023-14:44:48] [I] [TRT] Domain:
[10/23/2023-14:44:48] [I] [TRT] Model version: 0
[10/23/2023-14:44:48] [I] [TRT] Doc string:
[10/23/2023-14:44:48] [I] [TRT] ----------------------------------------------------------------
[10/23/2023-14:44:48] [W] [TRT] onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[10/23/2023-14:44:48] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/23/2023-14:44:48] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/23/2023-14:44:48] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/23/2023-14:44:48] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/23/2023-14:44:48] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/23/2023-14:44:48] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/23/2023-14:44:48] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/23/2023-14:44:48] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/23/2023-14:44:48] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/23/2023-14:44:48] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/23/2023-14:44:48] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/23/2023-14:44:48] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/23/2023-14:44:48] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/23/2023-14:44:48] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/23/2023-14:44:48] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/23/2023-14:44:48] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/23/2023-14:44:48] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/23/2023-14:44:48] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/23/2023-14:44:48] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/23/2023-14:44:48] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/23/2023-14:44:48] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/23/2023-14:44:48] [I] [TRT] No importer registered for op: BatchedNMSDynamic_TRT. Attempting to import as plugin.
[10/23/2023-14:44:48] [I] [TRT] Searching for plugin: BatchedNMSDynamic_TRT, plugin_version: 1, plugin_namespace:
[10/23/2023-14:44:48] [W] [TRT] builtin_op_importers.cpp:5245: Attribute caffeSemantics not found in plugin node! Ensure that the plugin creator has a default value defined or the engine may fail to build.
[10/23/2023-14:44:48] [I] [TRT] Successfully created plugin: BatchedNMSDynamic_TRT
[10/23/2023-14:44:48] [I] Finished parsing network model. Parse time: 0.247098
Segmentation fault (core dumped)

Should I try it outside the docker?

I tried it after asking you, and it worked :). Lesson learned: don't run trtexec inside Docker! I think it tries to go into a second Docker container or something similar... (And try it before asking.) Thank you very much, it is resolved for now.

trtexec should also work inside the TAO docker.

Should it then be run as ./trtexec or trtexec?

There is /usr/src/tensorrt/bin/trtexec by default inside the docker. You can use it directly.
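
For example, a minimal sketch that resolves the binary whether or not it is on PATH (the fallback path is the one quoted above):

import shutil
import subprocess

# Prefer trtexec on PATH; fall back to the default location inside the TAO docker.
trtexec = shutil.which("trtexec") or "/usr/src/tensorrt/bin/trtexec"
print("using:", trtexec)
subprocess.run([trtexec, "--help"], check=True)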

Thanks, trtexec also works outside Docker; I had a setup error. Reflashing solved the problem.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.