trtexec ONNX-to-engine conversion fails

Please provide the following information when requesting support.

• Hardware (RTX 2070)
• Network Type (Detectnet_v2)
• TLT Version (nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5)
• Training spec file: detectnet_train_cfg3.txt (3.5 KB)
• How to reproduce the issue?

trtexec --onnx=/tao/eyestrab_detectnet_resnet18.onnx --saveEngine=/tao/resnet_engine_fp16.trt --fp16 --workspace=8000 --shapes=data:1x3x1920x1200

and a few variations of this command fail with:

- Engine set up failed

or

- Dynamic dimensions required for input: input_1:0, but no shapes were provided. Automatically overriding shape to: 1x3x1920x1200

Help appreciated,
Best regards.

Refer to TRTEXEC with DetectNet-v2 - NVIDIA Docs.
For fp16, you can run

trtexec --onnx=/path/to/model.onnx \
        --maxShapes="input_1:0":16x3x544x960 \
        --minShapes="input_1:0":1x3x544x960 \
        --optShapes="input_1:0":8x3x544x960 \
        --fp16 \
        --saveEngine=/path/to/save/trt/model.engine

The 544x960 can be modified to the actual height x width of your model.
The batch size can also be changed; for example, 8x3x544x960 becomes 1x3x544x960.
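
Before setting these flags, it helps to confirm the model's actual input tensor name and shape, since trtexec must be given the exact name. A minimal sketch using the onnx Python package (the path is a placeholder):

import onnx  # assumes the onnx Python package is installed

model = onnx.load("/path/to/model.onnx")
for inp in model.graph.input:
    # Fixed dimensions carry dim_value; dynamic ones carry a symbolic dim_param.
    dims = [d.dim_value if d.HasField("dim_value") else d.dim_param
            for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dims)

Whatever name this prints (input_1:0 is typical for tf2onnx exports) is the exact string the --minShapes/--optShapes/--maxShapes flags must use.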

I end up with a segmentation fault. How can I debug it?

Please share the full log. Thanks.

I am using this command in the nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5 container:

&&&& RUNNING TensorRT.trtexec [TensorRT v8601] # trtexec --onnx=/tao/eyestrab_detectnet_resnet18.onnx --saveEngine=/tao/resnet_engine_fp16.trt --fp16 --workspace=8 --shapes=data:1x1920x1200 --explicitBatch

Log:

[10/20/2023-16:16:19] [W] --explicitBatch flag has been deprecated and has no effect!
[10/20/2023-16:16:19] [W] Explicit batch dim is automatically enabled if input model is ONNX or if dynamic shapes are provided when the engine is built.
[10/20/2023-16:16:19] [W] --workspace flag has been deprecated by --memPoolSize flag.
[10/20/2023-16:16:19] [I] === Model Options ===
[10/20/2023-16:16:19] [I] Format: ONNX
[10/20/2023-16:16:19] [I] Model: /tao/eyestrab_detectnet_resnet18.onnx
[10/20/2023-16:16:19] [I] Output:
[10/20/2023-16:16:19] [I] === Build Options ===
[10/20/2023-16:16:19] [I] Max batch: explicit batch
[10/20/2023-16:16:19] [I] Memory Pools: workspace: 8 MiB, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[10/20/2023-16:16:19] [I] minTiming: 1
[10/20/2023-16:16:19] [I] avgTiming: 8
[10/20/2023-16:16:19] [I] Precision: FP32+FP16
[10/20/2023-16:16:19] [I] LayerPrecisions:
[10/20/2023-16:16:19] [I] Layer Device Types:
[10/20/2023-16:16:19] [I] Calibration:
[10/20/2023-16:16:19] [I] Refit: Disabled
[10/20/2023-16:16:19] [I] Version Compatible: Disabled
[10/20/2023-16:16:19] [I] TensorRT runtime: full
[10/20/2023-16:16:19] [I] Lean DLL Path:
[10/20/2023-16:16:19] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[10/20/2023-16:16:19] [I] Exclude Lean Runtime: Disabled
[10/20/2023-16:16:19] [I] Sparsity: Disabled
[10/20/2023-16:16:19] [I] Safe mode: Disabled
[10/20/2023-16:16:19] [I] Build DLA standalone loadable: Disabled
[10/20/2023-16:16:19] [I] Allow GPU fallback for DLA: Disabled
[10/20/2023-16:16:19] [I] DirectIO mode: Disabled
[10/20/2023-16:16:19] [I] Restricted mode: Disabled
[10/20/2023-16:16:19] [I] Skip inference: Disabled
[10/20/2023-16:16:19] [I] Save engine: /tao/resnet_engine_fp16.trt
[10/20/2023-16:16:19] [I] Load engine:
[10/20/2023-16:16:19] [I] Profiling verbosity: 0
[10/20/2023-16:16:19] [I] Tactic sources: Using default tactic sources
[10/20/2023-16:16:19] [I] timingCacheMode: local
[10/20/2023-16:16:19] [I] timingCacheFile:
[10/20/2023-16:16:19] [I] Heuristic: Disabled
[10/20/2023-16:16:19] [I] Preview Features: Use default preview flags.
[10/20/2023-16:16:19] [I] MaxAuxStreams: -1
[10/20/2023-16:16:19] [I] BuilderOptimizationLevel: -1
[10/20/2023-16:16:19] [I] Input(s)s format: fp32:CHW
[10/20/2023-16:16:19] [I] Output(s)s format: fp32:CHW
[10/20/2023-16:16:19] [I] Input build shapes: model
[10/20/2023-16:16:19] [I] Input calibration shapes: model
[10/20/2023-16:16:19] [I] === System Options ===
[10/20/2023-16:16:19] [I] Device: 0
[10/20/2023-16:16:19] [I] DLACore:
[10/20/2023-16:16:19] [I] Plugins:
[10/20/2023-16:16:19] [I] setPluginsToSerialize:
[10/20/2023-16:16:19] [I] dynamicPlugins:
[10/20/2023-16:16:19] [I] ignoreParsedPluginLibs: 0
[10/20/2023-16:16:19] [I]
[10/20/2023-16:16:19] [I] === Inference Options ===
[10/20/2023-16:16:19] [I] Batch: Explicit
[10/20/2023-16:16:19] [I] Input inference shapes: model
[10/20/2023-16:16:19] [I] Iterations: 10
[10/20/2023-16:16:19] [I] Duration: 3s (+ 200ms warm up)
[10/20/2023-16:16:19] [I] Sleep time: 0ms
[10/20/2023-16:16:19] [I] Idle time: 0ms
[10/20/2023-16:16:19] [I] Inference Streams: 1
[10/20/2023-16:16:19] [I] ExposeDMA: Disabled
[10/20/2023-16:16:19] [I] Data transfers: Enabled
[10/20/2023-16:16:19] [I] Spin-wait: Disabled
[10/20/2023-16:16:19] [I] Multithreading: Disabled
[10/20/2023-16:16:19] [I] CUDA Graph: Disabled
[10/20/2023-16:16:19] [I] Separate profiling: Disabled
[10/20/2023-16:16:19] [I] Time Deserialize: Disabled
[10/20/2023-16:16:19] [I] Time Refit: Disabled
[10/20/2023-16:16:19] [I] NVTX verbosity: 0
[10/20/2023-16:16:19] [I] Persistent Cache Ratio: 0
[10/20/2023-16:16:19] [I] Inputs:
[10/20/2023-16:16:19] [I] === Reporting Options ===
[10/20/2023-16:16:19] [I] Verbose: Disabled
[10/20/2023-16:16:19] [I] Averages: 10 inferences
[10/20/2023-16:16:19] [I] Percentiles: 90,95,99
[10/20/2023-16:16:19] [I] Dump refittable layers:Disabled
[10/20/2023-16:16:19] [I] Dump output: Disabled
[10/20/2023-16:16:19] [I] Profile: Disabled
[10/20/2023-16:16:19] [I] Export timing to JSON file:
[10/20/2023-16:16:19] [I] Export output to JSON file:
[10/20/2023-16:16:19] [I] Export profile to JSON file:
[10/20/2023-16:16:19] [I]
[10/20/2023-16:16:19] [I] === Device Information ===
[10/20/2023-16:16:19] [I] Selected Device: NVIDIA GeForce RTX 2070
[10/20/2023-16:16:19] [I] Compute Capability: 7.5
[10/20/2023-16:16:19] [I] SMs: 36
[10/20/2023-16:16:19] [I] Device Global Memory: 7972 MiB
[10/20/2023-16:16:19] [I] Shared Memory per SM: 64 KiB
[10/20/2023-16:16:19] [I] Memory Bus Width: 256 bits (ECC disabled)
[10/20/2023-16:16:19] [I] Application Compute Clock Rate: 1.44 GHz
[10/20/2023-16:16:19] [I] Application Memory Clock Rate: 7.001 GHz
[10/20/2023-16:16:19] [I]
[10/20/2023-16:16:19] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[10/20/2023-16:16:19] [I]
[10/20/2023-16:16:19] [I] TensorRT version: 8.6.1
[10/20/2023-16:16:19] [I] Loading standard plugins
[10/20/2023-16:16:19] [I] [TRT] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 26, GPU 538 (MiB)
[10/20/2023-16:16:21] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +342, GPU +76, now: CPU 423, GPU 614 (MiB)
[10/20/2023-16:16:21] [I] Start parsing network model.
[10/20/2023-16:16:21] [I] [TRT] ----------------------------------------------------------------
[10/20/2023-16:16:21] [I] [TRT] Input filename: /tao/eyestrab_detectnet_resnet18.onnx
[10/20/2023-16:16:21] [I] [TRT] ONNX IR version: 0.0.7
[10/20/2023-16:16:21] [I] [TRT] Opset version: 12
[10/20/2023-16:16:21] [I] [TRT] Producer name: tf2onnx
[10/20/2023-16:16:21] [I] [TRT] Producer version: 1.9.2
[10/20/2023-16:16:21] [I] [TRT] Domain:
[10/20/2023-16:16:21] [I] [TRT] Model version: 0
[10/20/2023-16:16:21] [I] [TRT] Doc string:
[10/20/2023-16:16:21] [I] [TRT] ----------------------------------------------------------------
[10/20/2023-16:16:21] [I] Finished parsing network model. Parse time: 0.111052
[10/20/2023-16:16:21] [W] Dynamic dimensions required for input: input_1:0, but no shapes were provided. Automatically overriding shape to: 1x3x1920x1200
Segmentation fault (core dumped)

I first used the YOLO annotation format, converted it to KITTI (the NVIDIA 15-field variant), then created the TFRecords. Might the problem come from the initial annotation tool? What would be your favorite annotation tool for TAO and KITTI in general?
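
For reference, the conversion itself is mechanical: YOLO labels store normalized center coordinates and sizes, while the 15-field KITTI format expects absolute pixel corner coordinates, with the 3D fields zeroed for 2D detection data. A minimal sketch (the class list and image size below are assumptions, not from this thread):

def yolo_to_kitti(line, img_w, img_h, class_names):
    """Convert one YOLO label line to a 15-field KITTI label line."""
    cls_id, cx, cy, w, h = line.split()
    cx, w = float(cx) * img_w, float(w) * img_w   # de-normalize x axis
    cy, h = float(cy) * img_h, float(h) * img_h   # de-normalize y axis
    x1, y1 = cx - w / 2, cy - h / 2               # top-left corner
    x2, y2 = cx + w / 2, cy + h / 2               # bottom-right corner
    # Fields: type truncated occluded alpha bbox(4) dims(3) location(3) rotation_y
    return (f"{class_names[int(cls_id)]} 0.00 0 0.00 "
            f"{x1:.2f} {y1:.2f} {x2:.2f} {y2:.2f} "
            f"0.00 0.00 0.00 0.00 0.00 0.00 0.00")

# Example with hypothetical values:
print(yolo_to_kitti("0 0.5 0.5 0.25 0.25", 256, 160, ["limbus"]))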

Could you retry as below?

trtexec --onnx=/path/to/model.onnx \
        --maxShapes="input_1:0":1x3x1200x1900 \
        --minShapes="input_1:0":1x3x1200x1900 \
        --optShapes="input_1:0":1x3x1200x1900 \
        --fp16 \
        --saveEngine=/path/to/save/trt/model.engine

I assume the input of your model is 3x1920x1200 (channel * width * height).
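
For context, these three flags correspond to a TensorRT optimization profile; with min = opt = max the engine is effectively static-shape. A minimal sketch of the equivalent TensorRT Python API calls (the tensor name and shapes mirror the suggested command above and are assumptions; adjust to your model):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, TRT_LOGGER)
with open("/path/to/model.onnx", "rb") as f:
    parser.parse(f.read())  # populate the network from the ONNX file

config = builder.create_builder_config()
profile = builder.create_optimization_profile()
# The name must match the network's input tensor exactly.
profile.set_shape("input_1:0",
                  (1, 3, 1200, 1920),   # min
                  (1, 3, 1200, 1920),   # opt
                  (1, 3, 1200, 1920))   # max
config.add_optimization_profile(profile)
config.set_flag(trt.BuilderFlag.FP16)
engine = builder.build_serialized_network(network, config)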

I have created a working yolo_v4_tiny model. It can run inference with the tao inference command, but the problem with trtexec remains the same. The new model has the following retrain spec:
yolo_v4_tiny_retrain_kitti_seq.txt (1.9 KB)
It has a width of 256 and a height of 160; I tried a smaller model.

I am running on the nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5 container.

trtexec --onnx=/tao/limbus_data/model.onnx --saveEngine=/tao/limbus_data/model_engine.trt --minShapes=input_1:0:1x3x160x256 --optShapes=input_1:0:1x1x160x256 --maxShapes=input_1:0:1x3x160x256 --workspace=2048

or

trtexec --onnx=/tao/limbus_data/model.onnx --saveEngine=/tao/limbus_data/model_engine.trt --minShapes=x:1x3x160x256 --optShapes=x:1x1x160x256 --maxShapes=x:1x3x160x256 --workspace=2048

or

the same commands without the shape flags. All of them end up similarly, as in the log below:

[10/22/2023-11:02:41] [W] --workspace flag has been deprecated by --memPoolSize flag.
[10/22/2023-11:02:41] [I] === Model Options ===
[10/22/2023-11:02:41] [I] Format: ONNX
[10/22/2023-11:02:41] [I] Model: /tao/limbus_data/model.onnx
[10/22/2023-11:02:41] [I] Output:
[10/22/2023-11:02:41] [I] === Build Options ===
[10/22/2023-11:02:41] [I] Max batch: explicit batch
[10/22/2023-11:02:41] [I] Memory Pools: workspace: 2048 MiB, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[10/22/2023-11:02:41] [I] minTiming: 1
[10/22/2023-11:02:41] [I] avgTiming: 8
[10/22/2023-11:02:41] [I] Precision: FP32
[10/22/2023-11:02:41] [I] LayerPrecisions:
[10/22/2023-11:02:41] [I] Layer Device Types:
[10/22/2023-11:02:41] [I] Calibration:
[10/22/2023-11:02:41] [I] Refit: Disabled
[10/22/2023-11:02:41] [I] Version Compatible: Disabled
[10/22/2023-11:02:41] [I] TensorRT runtime: full
[10/22/2023-11:02:41] [I] Lean DLL Path:
[10/22/2023-11:02:41] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[10/22/2023-11:02:41] [I] Exclude Lean Runtime: Disabled
[10/22/2023-11:02:41] [I] Sparsity: Disabled
[10/22/2023-11:02:41] [I] Safe mode: Disabled
[10/22/2023-11:02:41] [I] Build DLA standalone loadable: Disabled
[10/22/2023-11:02:41] [I] Allow GPU fallback for DLA: Disabled
[10/22/2023-11:02:41] [I] DirectIO mode: Disabled
[10/22/2023-11:02:41] [I] Restricted mode: Disabled
[10/22/2023-11:02:41] [I] Skip inference: Disabled
[10/22/2023-11:02:41] [I] Save engine: /tao/limbus_data/model_engine.trt
[10/22/2023-11:02:41] [I] Load engine:
[10/22/2023-11:02:41] [I] Profiling verbosity: 0
[10/22/2023-11:02:41] [I] Tactic sources: Using default tactic sources
[10/22/2023-11:02:41] [I] timingCacheMode: local
[10/22/2023-11:02:41] [I] timingCacheFile:
[10/22/2023-11:02:41] [I] Heuristic: Disabled
[10/22/2023-11:02:41] [I] Preview Features: Use default preview flags.
[10/22/2023-11:02:41] [I] MaxAuxStreams: -1
[10/22/2023-11:02:41] [I] BuilderOptimizationLevel: -1
[10/22/2023-11:02:41] [I] Input(s)s format: fp32:CHW
[10/22/2023-11:02:41] [I] Output(s)s format: fp32:CHW
[10/22/2023-11:02:41] [I] Input build shape: input_1:0=1x3x160x256+1x1x160x256+1x3x160x256
[10/22/2023-11:02:41] [I] Input calibration shapes: model
[10/22/2023-11:02:41] [I] === System Options ===
[10/22/2023-11:02:41] [I] Device: 0
[10/22/2023-11:02:41] [I] DLACore:
[10/22/2023-11:02:41] [I] Plugins:
[10/22/2023-11:02:41] [I] setPluginsToSerialize:
[10/22/2023-11:02:41] [I] dynamicPlugins:
[10/22/2023-11:02:41] [I] ignoreParsedPluginLibs: 0
[10/22/2023-11:02:41] [I]
[10/22/2023-11:02:41] [I] === Inference Options ===
[10/22/2023-11:02:41] [I] Batch: Explicit
[10/22/2023-11:02:41] [I] Input inference shape: input_1:0=1x1x160x256
[10/22/2023-11:02:41] [I] Iterations: 10
[10/22/2023-11:02:41] [I] Duration: 3s (+ 200ms warm up)
[10/22/2023-11:02:41] [I] Sleep time: 0ms
[10/22/2023-11:02:41] [I] Idle time: 0ms
[10/22/2023-11:02:41] [I] Inference Streams: 1
[10/22/2023-11:02:41] [I] ExposeDMA: Disabled
[10/22/2023-11:02:41] [I] Data transfers: Enabled
[10/22/2023-11:02:41] [I] Spin-wait: Disabled
[10/22/2023-11:02:41] [I] Multithreading: Disabled
[10/22/2023-11:02:41] [I] CUDA Graph: Disabled
[10/22/2023-11:02:41] [I] Separate profiling: Disabled
[10/22/2023-11:02:41] [I] Time Deserialize: Disabled
[10/22/2023-11:02:41] [I] Time Refit: Disabled
[10/22/2023-11:02:41] [I] NVTX verbosity: 0
[10/22/2023-11:02:41] [I] Persistent Cache Ratio: 0
[10/22/2023-11:02:41] [I] Inputs:
[10/22/2023-11:02:41] [I] === Reporting Options ===
[10/22/2023-11:02:41] [I] Verbose: Disabled
[10/22/2023-11:02:41] [I] Averages: 10 inferences
[10/22/2023-11:02:41] [I] Percentiles: 90,95,99
[10/22/2023-11:02:41] [I] Dump refittable layers:Disabled
[10/22/2023-11:02:41] [I] Dump output: Disabled
[10/22/2023-11:02:41] [I] Profile: Disabled
[10/22/2023-11:02:41] [I] Export timing to JSON file:
[10/22/2023-11:02:41] [I] Export output to JSON file:
[10/22/2023-11:02:41] [I] Export profile to JSON file:
[10/22/2023-11:02:41] [I]
[10/22/2023-11:02:41] [I] === Device Information ===
[10/22/2023-11:02:41] [I] Selected Device: NVIDIA GeForce RTX 2070
[10/22/2023-11:02:41] [I] Compute Capability: 7.5
[10/22/2023-11:02:41] [I] SMs: 36
[10/22/2023-11:02:41] [I] Device Global Memory: 7972 MiB
[10/22/2023-11:02:41] [I] Shared Memory per SM: 64 KiB
[10/22/2023-11:02:41] [I] Memory Bus Width: 256 bits (ECC disabled)
[10/22/2023-11:02:41] [I] Application Compute Clock Rate: 1.44 GHz
[10/22/2023-11:02:41] [I] Application Memory Clock Rate: 7.001 GHz
[10/22/2023-11:02:41] [I]
[10/22/2023-11:02:41] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[10/22/2023-11:02:41] [I]
[10/22/2023-11:02:41] [I] TensorRT version: 8.6.1
[10/22/2023-11:02:41] [I] Loading standard plugins
[10/22/2023-11:02:41] [I] [TRT] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 26, GPU 467 (MiB)
[10/22/2023-11:02:43] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +342, GPU +76, now: CPU 423, GPU 543 (MiB)
[10/22/2023-11:02:43] [I] Start parsing network model.
[10/22/2023-11:02:43] [I] [TRT] ----------------------------------------------------------------
[10/22/2023-11:02:43] [I] [TRT] Input filename: /tao/limbus_data/model.onnx
[10/22/2023-11:02:43] [I] [TRT] ONNX IR version: 0.0.8
[10/22/2023-11:02:43] [I] [TRT] Opset version: 12
[10/22/2023-11:02:43] [I] [TRT] Producer name: keras2onnx
[10/22/2023-11:02:43] [I] [TRT] Producer version: 1.13.0
[10/22/2023-11:02:43] [I] [TRT] Domain:
[10/22/2023-11:02:43] [I] [TRT] Model version: 0
[10/22/2023-11:02:43] [I] [TRT] Doc string:
[10/22/2023-11:02:43] [I] [TRT] ----------------------------------------------------------------
[10/22/2023-11:02:43] [W] [TRT] onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[10/22/2023-11:02:43] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/22/2023-11:02:43] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/22/2023-11:02:43] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/22/2023-11:02:43] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/22/2023-11:02:43] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/22/2023-11:02:43] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/22/2023-11:02:43] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/22/2023-11:02:43] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/22/2023-11:02:43] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/22/2023-11:02:43] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/22/2023-11:02:43] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/22/2023-11:02:43] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/22/2023-11:02:43] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/22/2023-11:02:43] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/22/2023-11:02:43] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/22/2023-11:02:43] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/22/2023-11:02:43] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/22/2023-11:02:43] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/22/2023-11:02:43] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/22/2023-11:02:43] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/22/2023-11:02:43] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/22/2023-11:02:43] [I] [TRT] No importer registered for op: BatchedNMSDynamic_TRT. Attempting to import as plugin.
[10/22/2023-11:02:43] [I] [TRT] Searching for plugin: BatchedNMSDynamic_TRT, plugin_version: 1, plugin_namespace:
[10/22/2023-11:02:43] [W] [TRT] builtin_op_importers.cpp:5245: Attribute caffeSemantics not found in plugin node! Ensure that the plugin creator has a default value defined or the engine may fail to build.
[10/22/2023-11:02:43] [I] [TRT] Successfully created plugin: BatchedNMSDynamic_TRT
[10/22/2023-11:02:43] [I] Finished parsing network model. Parse time: 0.24603
[10/22/2023-11:02:43] [E] Cannot find input tensor with name "input_1:0" in the network inputs! Please make sure the input tensor names are correct.
[10/22/2023-11:02:43] [E] Network And Config setup failed
[10/22/2023-11:02:43] [E] Building engine failed
[10/22/2023-11:02:43] [E] Failed to create engine from model or file.
[10/22/2023-11:02:43] [E] Engine set up failed

I actually need it on a Jetson Orin NX, but I would also run it on DeepStream with my x86 machine with the RTX 2070.
I have the following nvidia-smi output:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.12             Driver Version: 535.104.12   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 2070        On  | 00000000:01:00.0  On |                  N/A |
| N/A   46C    P8              12W / 115W |    369MiB /  8192MiB |      1%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                             GPU Memory |
|        ID   ID                                                              Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+

My input tensor name is "Input". Even when I write the input tensor name into the shape arguments as you suggested, I receive a core dumped error.

Attached I also send verbose log:

log_verbose.txt (183.3 KB)

What would your suggestion be?

It seems that the input tensor name is not correct. Could you share your ONNX file?

I think you have already generated two kinds of ONNX files: one trained on the detectnet_v2 network, the other on the yolo_v4_tiny network.
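
Before sharing them, a quick per-file summary can pin down how the two exports differ (a minimal sketch using the onnx Python package; the paths are the ones used earlier in the thread):

import onnx
from collections import Counter

for path in ["/tao/eyestrab_detectnet_resnet18.onnx", "/tao/limbus_data/model.onnx"]:
    model = onnx.load(path)
    print(path)
    print("  producer:", model.producer_name, model.producer_version)
    print("  inputs:  ", [i.name for i in model.graph.input])
    print("  outputs: ", [o.name for o in model.graph.output])
    print("  top ops: ", Counter(n.op_type for n in model.graph.node).most_common(5))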

For running trtexec against different network models, please refer to Optimizing and Profiling with TensorRT - NVIDIA Docs

Correct, I have both detectnet_v2 and yolo_v4_tiny ONNX models. I can convert neither.

For the yolo_v4_tiny network, as said, I have a working ONNX file, which can run inference and draw bounding boxes. For YOLO, the trtexec command is:

trtexec --onnx=/tao/limbus_data/model.onnx --maxShapes=Input:16x3x160x256 --minShapes=Input:1x3x160x256 --optShapes=Input:8x3x160x256 --fp16 --saveEngine=/tao/limbus_data/model.engine

It results in the log below with a core dump error:

root@12584098aeef:/tao# trtexec --onnx=/tao/limbus_data/model.onnx --maxShapes=Input:16x3x160x256 --minShapes=Input:1x3x160x256 --optShapes=Input:8x3x160x256 --fp16 --saveEngine=/tao/limbus_data/model.engine
&&&& RUNNING TensorRT.trtexec [TensorRT v8601] # trtexec --onnx=/tao/limbus_data/model.onnx --maxShapes=Input:16x3x160x256 --minShapes=Input:1x3x160x256 --optShapes=Input:8x3x160x256 --fp16 --saveEngine=/tao/limbus_data/model.engine
[10/23/2023-14:44:46] [I] === Model Options ===
[10/23/2023-14:44:46] [I] Format: ONNX
[10/23/2023-14:44:46] [I] Model: /tao/limbus_data/model.onnx
[10/23/2023-14:44:46] [I] Output:
[10/23/2023-14:44:46] [I] === Build Options ===
[10/23/2023-14:44:46] [I] Max batch: explicit batch
[10/23/2023-14:44:46] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[10/23/2023-14:44:46] [I] minTiming: 1
[10/23/2023-14:44:46] [I] avgTiming: 8
[10/23/2023-14:44:46] [I] Precision: FP32+FP16
[10/23/2023-14:44:46] [I] LayerPrecisions:
[10/23/2023-14:44:46] [I] Layer Device Types:
[10/23/2023-14:44:46] [I] Calibration:
[10/23/2023-14:44:46] [I] Refit: Disabled
[10/23/2023-14:44:46] [I] Version Compatible: Disabled
[10/23/2023-14:44:46] [I] TensorRT runtime: full
[10/23/2023-14:44:46] [I] Lean DLL Path:
[10/23/2023-14:44:46] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[10/23/2023-14:44:46] [I] Exclude Lean Runtime: Disabled
[10/23/2023-14:44:46] [I] Sparsity: Disabled
[10/23/2023-14:44:46] [I] Safe mode: Disabled
[10/23/2023-14:44:46] [I] Build DLA standalone loadable: Disabled
[10/23/2023-14:44:46] [I] Allow GPU fallback for DLA: Disabled
[10/23/2023-14:44:46] [I] DirectIO mode: Disabled
[10/23/2023-14:44:46] [I] Restricted mode: Disabled
[10/23/2023-14:44:46] [I] Skip inference: Disabled
[10/23/2023-14:44:46] [I] Save engine: /tao/limbus_data/model.engine
[10/23/2023-14:44:46] [I] Load engine:
[10/23/2023-14:44:46] [I] Profiling verbosity: 0
[10/23/2023-14:44:46] [I] Tactic sources: Using default tactic sources
[10/23/2023-14:44:46] [I] timingCacheMode: local
[10/23/2023-14:44:46] [I] timingCacheFile:
[10/23/2023-14:44:46] [I] Heuristic: Disabled
[10/23/2023-14:44:46] [I] Preview Features: Use default preview flags.
[10/23/2023-14:44:46] [I] MaxAuxStreams: -1
[10/23/2023-14:44:46] [I] BuilderOptimizationLevel: -1
[10/23/2023-14:44:46] [I] Input(s)s format: fp32:CHW
[10/23/2023-14:44:46] [I] Output(s)s format: fp32:CHW
[10/23/2023-14:44:46] [I] Input build shape: Input=1x3x160x256+8x3x160x256+16x3x160x256
[10/23/2023-14:44:46] [I] Input calibration shapes: model
[10/23/2023-14:44:46] [I] === System Options ===
[10/23/2023-14:44:46] [I] Device: 0
[10/23/2023-14:44:46] [I] DLACore:
[10/23/2023-14:44:46] [I] Plugins:
[10/23/2023-14:44:46] [I] setPluginsToSerialize:
[10/23/2023-14:44:46] [I] dynamicPlugins:
[10/23/2023-14:44:46] [I] ignoreParsedPluginLibs: 0
[10/23/2023-14:44:46] [I]
[10/23/2023-14:44:46] [I] === Inference Options ===
[10/23/2023-14:44:46] [I] Batch: Explicit
[10/23/2023-14:44:46] [I] Input inference shape: Input=8x3x160x256
[10/23/2023-14:44:46] [I] Iterations: 10
[10/23/2023-14:44:46] [I] Duration: 3s (+ 200ms warm up)
[10/23/2023-14:44:46] [I] Sleep time: 0ms
[10/23/2023-14:44:46] [I] Idle time: 0ms
[10/23/2023-14:44:46] [I] Inference Streams: 1
[10/23/2023-14:44:46] [I] ExposeDMA: Disabled
[10/23/2023-14:44:46] [I] Data transfers: Enabled
[10/23/2023-14:44:46] [I] Spin-wait: Disabled
[10/23/2023-14:44:46] [I] Multithreading: Disabled
[10/23/2023-14:44:46] [I] CUDA Graph: Disabled
[10/23/2023-14:44:46] [I] Separate profiling: Disabled
[10/23/2023-14:44:46] [I] Time Deserialize: Disabled
[10/23/2023-14:44:46] [I] Time Refit: Disabled
[10/23/2023-14:44:46] [I] NVTX verbosity: 0
[10/23/2023-14:44:46] [I] Persistent Cache Ratio: 0
[10/23/2023-14:44:46] [I] Inputs:
[10/23/2023-14:44:46] [I] === Reporting Options ===
[10/23/2023-14:44:46] [I] Verbose: Disabled
[10/23/2023-14:44:46] [I] Averages: 10 inferences
[10/23/2023-14:44:46] [I] Percentiles: 90,95,99
[10/23/2023-14:44:46] [I] Dump refittable layers:Disabled
[10/23/2023-14:44:46] [I] Dump output: Disabled
[10/23/2023-14:44:46] [I] Profile: Disabled
[10/23/2023-14:44:46] [I] Export timing to JSON file:
[10/23/2023-14:44:46] [I] Export output to JSON file:
[10/23/2023-14:44:46] [I] Export profile to JSON file:
[10/23/2023-14:44:46] [I]
[10/23/2023-14:44:46] [I] === Device Information ===
[10/23/2023-14:44:46] [I] Selected Device: NVIDIA GeForce RTX 2070
[10/23/2023-14:44:46] [I] Compute Capability: 7.5
[10/23/2023-14:44:46] [I] SMs: 36
[10/23/2023-14:44:46] [I] Device Global Memory: 7972 MiB
[10/23/2023-14:44:46] [I] Shared Memory per SM: 64 KiB
[10/23/2023-14:44:46] [I] Memory Bus Width: 256 bits (ECC disabled)
[10/23/2023-14:44:46] [I] Application Compute Clock Rate: 1.44 GHz
[10/23/2023-14:44:46] [I] Application Memory Clock Rate: 7.001 GHz
[10/23/2023-14:44:46] [I]
[10/23/2023-14:44:46] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[10/23/2023-14:44:46] [I]
[10/23/2023-14:44:46] [I] TensorRT version: 8.6.1
[10/23/2023-14:44:46] [I] Loading standard plugins
[10/23/2023-14:44:46] [I] [TRT] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 26, GPU 614 (MiB)
[10/23/2023-14:44:48] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +342, GPU +76, now: CPU 423, GPU 690 (MiB)
[10/23/2023-14:44:48] [I] Start parsing network model.
[10/23/2023-14:44:48] [I] [TRT] ----------------------------------------------------------------
[10/23/2023-14:44:48] [I] [TRT] Input filename: /tao/limbus_data/model.onnx
[10/23/2023-14:44:48] [I] [TRT] ONNX IR version: 0.0.8
[10/23/2023-14:44:48] [I] [TRT] Opset version: 12
[10/23/2023-14:44:48] [I] [TRT] Producer name: keras2onnx
[10/23/2023-14:44:48] [I] [TRT] Producer version: 1.13.0
[10/23/2023-14:44:48] [I] [TRT] Domain:
[10/23/2023-14:44:48] [I] [TRT] Model version: 0
[10/23/2023-14:44:48] [I] [TRT] Doc string:
[10/23/2023-14:44:48] [I] [TRT] ----------------------------------------------------------------
[10/23/2023-14:44:48] [W] [TRT] onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[10/23/2023-14:44:48] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/23/2023-14:44:48] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/23/2023-14:44:48] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/23/2023-14:44:48] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/23/2023-14:44:48] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/23/2023-14:44:48] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/23/2023-14:44:48] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/23/2023-14:44:48] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/23/2023-14:44:48] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/23/2023-14:44:48] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/23/2023-14:44:48] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/23/2023-14:44:48] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/23/2023-14:44:48] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/23/2023-14:44:48] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/23/2023-14:44:48] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/23/2023-14:44:48] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/23/2023-14:44:48] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/23/2023-14:44:48] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/23/2023-14:44:48] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/23/2023-14:44:48] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/23/2023-14:44:48] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[10/23/2023-14:44:48] [I] [TRT] No importer registered for op: BatchedNMSDynamic_TRT. Attempting to import as plugin.
[10/23/2023-14:44:48] [I] [TRT] Searching for plugin: BatchedNMSDynamic_TRT, plugin_version: 1, plugin_namespace:
[10/23/2023-14:44:48] [W] [TRT] builtin_op_importers.cpp:5245: Attribute caffeSemantics not found in plugin node! Ensure that the plugin creator has a default value defined or the engine may fail to build.
[10/23/2023-14:44:48] [I] [TRT] Successfully created plugin: BatchedNMSDynamic_TRT
[10/23/2023-14:44:48] [I] Finished parsing network model. Parse time: 0.247098
Segmentation fault (core dumped)

Should I try it outside the docker?

I tried it after asking you, and it worked :). Lesson learned: don't run trtexec inside Docker! I think it tries to go into a second Docker container or something similar... (And try it before asking.) Thank you very much, it is resolved for now.

trtexec should also work inside the TAO docker.

Should it then be run as ./trtexec or trtexec?

There is /usr/src/tensorrt/bin/trtexec by default inside the docker. You can use it directly.
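
For example, a minimal sketch that resolves the binary whether or not it is on PATH (the fallback path is the one quoted above):

import shutil
import subprocess

# Prefer trtexec on PATH; fall back to the default location inside the TAO docker.
trtexec = shutil.which("trtexec") or "/usr/src/tensorrt/bin/trtexec"
print("using:", trtexec)
subprocess.run([trtexec, "--help"], check=True)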

Thanks, trtexec also works outside Docker; I had a setup error. Reflashing solved the problem.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.