Exporting model to onnx using "tao model segformer export"

When I try to use the ONNX model (generated with the export command) to build a TensorRT engine, I encounter the following error:

My trtexec command:

/usr/src/tensorrt/bin/trtexec --onnx=/workspace/tao-experiments/SegFormer_TAO/SegFormer_newest_480x640.onnx --saveEngine=SegFormer_*engine.trt --minShapes=input:1x3x480x640 --optShapes=input:1x3x480x640 --maxShapes=input:1x3x480x640 --fp16 --workspace=2048

The error:

[08/22/2023-13:24:02] [E] Error[4]: [network.cpp::validate::3100] Error Code 4: Internal Error (input: for dimension number 2 in profile 0 does not match network definition (got min=480, opt=480, max=480), expected min=opt=max=1024).)

The tao export command seems to generate the ONNX file with input dimensions of 1x3x1024x1024 by default, which is why trtexec rejects any optimization profile with different dimensions.

How can I customize the input dimensions during the generation of the ONNX file via export command? (I couldn’t see any description about this regarding the experiment spec file or command arguments.)

Thank you in advance.

Could you upload the full log via the button below?

Also, can you share the spec file you used to train the segformer model?

Training spec file:
config_train.yaml (2.3 KB)

Export spec file:
config_export.yaml (1.8 KB)

trtexec log:

&&&& RUNNING TensorRT.trtexec [TensorRT v8503] # /usr/src/tensorrt/bin/trtexec --onnx=/workspace/tao-experiments/SegFormer_TAO/SegFormer_newest_480x640.onnx --saveEngine=SegFormer_*engine.trt --minShapes=input:1x3x480x640 --optShapes=input:1x3x480x640 --maxShapes=input:1x3x480x640 --fp16 --workspace=2048
[08/23/2023-05:25:29] [W] --workspace flag has been deprecated by --memPoolSize flag.
[08/23/2023-05:25:29] [I] === Model Options ===
[08/23/2023-05:25:29] [I] Format: ONNX
[08/23/2023-05:25:29] [I] Model: /workspace/tao-experiments/SegFormer_TAO/SegFormer_newest_480x640.onnx
[08/23/2023-05:25:29] [I] Output:
[08/23/2023-05:25:29] [I] === Build Options ===
[08/23/2023-05:25:29] [I] Max batch: explicit batch
[08/23/2023-05:25:29] [I] Memory Pools: workspace: 2048 MiB, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[08/23/2023-05:25:29] [I] minTiming: 1
[08/23/2023-05:25:29] [I] avgTiming: 8
[08/23/2023-05:25:29] [I] Precision: FP32+FP16
[08/23/2023-05:25:29] [I] LayerPrecisions:
[08/23/2023-05:25:29] [I] Calibration:
[08/23/2023-05:25:29] [I] Refit: Disabled
[08/23/2023-05:25:29] [I] Sparsity: Disabled
[08/23/2023-05:25:29] [I] Safe mode: Disabled
[08/23/2023-05:25:29] [I] DirectIO mode: Disabled
[08/23/2023-05:25:29] [I] Restricted mode: Disabled
[08/23/2023-05:25:29] [I] Build only: Disabled
[08/23/2023-05:25:29] [I] Save engine: SegFormer_*engine.trt
[08/23/2023-05:25:29] [I] Load engine:
[08/23/2023-05:25:29] [I] Profiling verbosity: 0
[08/23/2023-05:25:29] [I] Tactic sources: Using default tactic sources
[08/23/2023-05:25:29] [I] timingCacheMode: local
[08/23/2023-05:25:29] [I] timingCacheFile:
[08/23/2023-05:25:29] [I] Heuristic: Disabled
[08/23/2023-05:25:29] [I] Preview Features: Use default preview flags.
[08/23/2023-05:25:29] [I] Input(s)s format: fp32:CHW
[08/23/2023-05:25:29] [I] Output(s)s format: fp32:CHW
[08/23/2023-05:25:29] [I] Input build shape: input=1x3x480x640+1x3x480x640+1x3x480x640
[08/23/2023-05:25:29] [I] Input calibration shapes: model
[08/23/2023-05:25:29] [I] === System Options ===
[08/23/2023-05:25:29] [I] Device: 0
[08/23/2023-05:25:29] [I] DLACore:
[08/23/2023-05:25:29] [I] Plugins:
[08/23/2023-05:25:29] [I] === Inference Options ===
[08/23/2023-05:25:29] [I] Batch: Explicit
[08/23/2023-05:25:29] [I] Input inference shape: input=1x3x480x640
[08/23/2023-05:25:29] [I] Iterations: 10
[08/23/2023-05:25:29] [I] Duration: 3s (+ 200ms warm up)
[08/23/2023-05:25:29] [I] Sleep time: 0ms
[08/23/2023-05:25:29] [I] Idle time: 0ms
[08/23/2023-05:25:29] [I] Streams: 1
[08/23/2023-05:25:29] [I] ExposeDMA: Disabled
[08/23/2023-05:25:29] [I] Data transfers: Enabled
[08/23/2023-05:25:29] [I] Spin-wait: Disabled
[08/23/2023-05:25:29] [I] Multithreading: Disabled
[08/23/2023-05:25:29] [I] CUDA Graph: Disabled
[08/23/2023-05:25:29] [I] Separate profiling: Disabled
[08/23/2023-05:25:29] [I] Time Deserialize: Disabled
[08/23/2023-05:25:29] [I] Time Refit: Disabled
[08/23/2023-05:25:29] [I] NVTX verbosity: 0
[08/23/2023-05:25:29] [I] Persistent Cache Ratio: 0
[08/23/2023-05:25:29] [I] Inputs:
[08/23/2023-05:25:29] [I] === Reporting Options ===
[08/23/2023-05:25:29] [I] Verbose: Disabled
[08/23/2023-05:25:29] [I] Averages: 10 inferences
[08/23/2023-05:25:29] [I] Percentiles: 90,95,99
[08/23/2023-05:25:29] [I] Dump refittable layers:Disabled
[08/23/2023-05:25:29] [I] Dump output: Disabled
[08/23/2023-05:25:29] [I] Profile: Disabled
[08/23/2023-05:25:29] [I] Export timing to JSON file:
[08/23/2023-05:25:29] [I] Export output to JSON file:
[08/23/2023-05:25:29] [I] Export profile to JSON file:
[08/23/2023-05:25:29] [I]
[08/23/2023-05:25:29] [I] === Device Information ===
[08/23/2023-05:25:29] [I] Selected Device: Quadro RTX 4000
[08/23/2023-05:25:29] [I] Compute Capability: 7.5
[08/23/2023-05:25:29] [I] SMs: 36
[08/23/2023-05:25:29] [I] Compute Clock Rate: 1.545 GHz
[08/23/2023-05:25:29] [I] Device Global Memory: 7966 MiB
[08/23/2023-05:25:29] [I] Shared Memory per SM: 64 KiB
[08/23/2023-05:25:29] [I] Memory Bus Width: 256 bits (ECC disabled)
[08/23/2023-05:25:29] [I] Memory Clock Rate: 6.501 GHz
[08/23/2023-05:25:29] [I]
[08/23/2023-05:25:29] [I] TensorRT version: 8.5.3
[08/23/2023-05:25:29] [I] [TRT] [MemUsageChange] Init CUDA: CPU +14, GPU +0, now: CPU 30, GPU 429 (MiB)
[08/23/2023-05:25:31] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +342, GPU +76, now: CPU 426, GPU 495 (MiB)
[08/23/2023-05:25:31] [I] Start parsing network model
[08/23/2023-05:25:31] [I] [TRT] ----------------------------------------------------------------
[08/23/2023-05:25:31] [I] [TRT] Input filename: /workspace/tao-experiments/SegFormer_TAO/SegFormer_newest_480x640.onnx
[08/23/2023-05:25:31] [I] [TRT] ONNX IR version: 0.0.6
[08/23/2023-05:25:31] [I] [TRT] Opset version: 11
[08/23/2023-05:25:31] [I] [TRT] Producer name: pytorch
[08/23/2023-05:25:31] [I] [TRT] Producer version: 1.14.0
[08/23/2023-05:25:31] [I] [TRT] Domain:
[08/23/2023-05:25:31] [I] [TRT] Model version: 0
[08/23/2023-05:25:31] [I] [TRT] Doc string:
[08/23/2023-05:25:31] [I] [TRT] ----------------------------------------------------------------
[08/23/2023-05:25:31] [W] [TRT] onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[08/23/2023-05:25:34] [W] [TRT] Tensor DataType is determined at build time for tensors not marked as input or output.
[08/23/2023-05:25:34] [I] Finish parsing network model
[08/23/2023-05:25:34] [E] Error[4]: [network.cpp::validate::3100] Error Code 4: Internal Error (input: for dimension number 2 in profile 0 does not match network definition (got min=480, opt=480, max=480), expected min=opt=max=1024).)
[08/23/2023-05:25:34] [E] Error[2]: [builder.cpp::buildSerializedNetwork::751] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
[08/23/2023-05:25:34] [E] Engine could not be created from network
[08/23/2023-05:25:34] [E] Building engine failed
[08/23/2023-05:25:34] [E] Failed to create engine from model or file.
[08/23/2023-05:25:34] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8503] # /usr/src/tensorrt/bin/trtexec --onnx=/workspace/tao-experiments/SegFormer_TAO/SegFormer_newest_480x640.onnx --saveEngine=SegFormer_*engine.trt --minShapes=input:1x3x480x640 --optShapes=input:1x3x480x640 --maxShapes=input:1x3x480x640 --fp16 --workspace=2048

When you run export, please refer to the example spec files at https://github.com/NVIDIA/tao_tutorials/tree/main/notebooks/tao_launcher_starter_kit/segformer/specs.
In particular, see the section below, which I have adjusted for your case. Please modify your export.yaml accordingly and retry.

export:
  input_height: 480
  input_width: 640
  input_channel: 3

Otherwise, export falls back to the default 3x1024x1024 input shape, set in https://github.com/NVIDIA/tao_pytorch_backend/blob/e5010af08121404dfb696152248467eee85ab3a7/nvidia_tao_pytorch/cv/segformer/config/default_config.py#L321-L323
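With that spec change, the workflow would be to re-export and then rebuild the engine. An illustrative sketch using the commands already shown in this thread (the spec-file path and engine filename are placeholders, and `--memPoolSize` replaces the `--workspace` flag that the log reported as deprecated):

```shell
# Re-export the trained model so the ONNX input shape matches the target resolution
tao model segformer export -e /workspace/tao-experiments/SegFormer_TAO/config_export.yaml

# Rebuild the engine; the profile shapes now match the network definition (1x3x480x640)
/usr/src/tensorrt/bin/trtexec \
  --onnx=/workspace/tao-experiments/SegFormer_TAO/SegFormer_newest_480x640.onnx \
  --saveEngine=SegFormer_480x640_fp16.engine \
  --minShapes=input:1x3x480x640 \
  --optShapes=input:1x3x480x640 \
  --maxShapes=input:1x3x480x640 \
  --fp16 --memPoolSize=workspace:2048
```

Since min, opt, and max shapes are identical, the resulting engine is fixed-shape; ranges would only be needed for variable input resolutions.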

Solved, thank you!

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.