Inswapper ONNX model conversion to TensorRT engine

I am trying to run the insightface inswapper model on a Jetson AGX Orin Developer Kit with JetPack 5.1.2, which ships TensorRT v8.5. For this, the CUDAExecutionProvider is used with the ONNX model, but inference is too slow, at only around 5 FPS.
I tried to convert the ONNX model to a TensorRT engine using different approaches such as trtexec and polygraphy, but the conversion fails with the following error:
ONNX model generated with INT64 while TensorRT does not support INT64. Attempting to cast to INT32.
Segmentation fault

However, converting the ONNX model to a TensorRT engine on an NVIDIA GeForce RTX 3090 with TensorRT v10.5 works fine. But the engine built there cannot be loaded on the Jetson device, since TensorRT engines are not portable across TensorRT versions and GPU architectures.

I want to run TensorRT inference on the Jetson AGX Orin Developer Kit with JetPack 5.1.2. Is there any other way to convert the inswapper ONNX model to a TensorRT engine with TensorRT v8.5? Or is it possible to run an engine built with TensorRT v10.5 under TensorRT v8.5?

Thanks in advance. Any help would be appreciated.

Hi,
Here are some suggestions for common issues:

1. Performance

Please run the commands below before benchmarking a deep learning use case (a quick way to verify the settings is shown after this list):

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

2. Installation

Installation guides for deep learning frameworks on Jetson:

3. Tutorial

Getting-started deep learning tutorial:

4. Report issue

If these suggestions don’t help and you want to report an issue to us, please share the model, the command/steps, and the customized app (if any) so we can reproduce it locally.
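
To verify that the board is in the maximum-performance mode and that the clocks are locked (as mentioned under item 1), the following standard JetPack utilities can be used:

$ sudo nvpmodel -q
$ sudo jetson_clocks --show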

Thanks!

Model: insightface inswapper https://github.com/facefusion/facefusion-assets/releases/download/models/inswapper_128.onnx
Conversion step: trtexec --onnx=inswapper_128.onnx --saveEngine=inswapper_trt.engine

Error:

&&&& RUNNING TensorRT.trtexec [TensorRT v8502] # /usr/src/tensorrt/bin/trtexec --onnx=inswapper_128.onnx --saveEngine=inswapper_trt.engine
[10/22/2024-15:16:37] [I] === Model Options ===
[10/22/2024-15:16:37] [I] Format: ONNX
[10/22/2024-15:16:37] [I] Model: inswapper_128.onnx
[10/22/2024-15:16:37] [I] Output:
[10/22/2024-15:16:37] [I] === Build Options ===
[10/22/2024-15:16:37] [I] Max batch: explicit batch
[10/22/2024-15:16:37] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[10/22/2024-15:16:37] [I] minTiming: 1
[10/22/2024-15:16:37] [I] avgTiming: 8
[10/22/2024-15:16:37] [I] Precision: FP32
[10/22/2024-15:16:37] [I] LayerPrecisions: 
[10/22/2024-15:16:37] [I] Calibration: 
[10/22/2024-15:16:37] [I] Refit: Disabled
[10/22/2024-15:16:37] [I] Sparsity: Disabled
[10/22/2024-15:16:37] [I] Safe mode: Disabled
[10/22/2024-15:16:37] [I] DirectIO mode: Disabled
[10/22/2024-15:16:37] [I] Restricted mode: Disabled
[10/22/2024-15:16:37] [I] Build only: Disabled
[10/22/2024-15:16:37] [I] Save engine: inswapper_trt.engine
[10/22/2024-15:16:37] [I] Load engine: 
[10/22/2024-15:16:37] [I] Profiling verbosity: 0
[10/22/2024-15:16:37] [I] Tactic sources: Using default tactic sources
[10/22/2024-15:16:37] [I] timingCacheMode: local
[10/22/2024-15:16:37] [I] timingCacheFile: 
[10/22/2024-15:16:37] [I] Heuristic: Disabled
[10/22/2024-15:16:37] [I] Preview Features: Use default preview flags.
[10/22/2024-15:16:37] [I] Input(s)s format: fp32:CHW
[10/22/2024-15:16:37] [I] Output(s)s format: fp32:CHW
[10/22/2024-15:16:37] [I] Input build shapes: model
[10/22/2024-15:16:37] [I] Input calibration shapes: model
[10/22/2024-15:16:37] [I] === System Options ===
[10/22/2024-15:16:37] [I] Device: 0
[10/22/2024-15:16:37] [I] DLACore: 
[10/22/2024-15:16:37] [I] Plugins:
[10/22/2024-15:16:37] [I] === Inference Options ===
[10/22/2024-15:16:37] [I] Batch: Explicit
[10/22/2024-15:16:37] [I] Input inference shapes: model
[10/22/2024-15:16:37] [I] Iterations: 10
[10/22/2024-15:16:37] [I] Duration: 3s (+ 200ms warm up)
[10/22/2024-15:16:37] [I] Sleep time: 0ms
[10/22/2024-15:16:37] [I] Idle time: 0ms
[10/22/2024-15:16:37] [I] Streams: 1
[10/22/2024-15:16:37] [I] ExposeDMA: Disabled
[10/22/2024-15:16:37] [I] Data transfers: Enabled
[10/22/2024-15:16:37] [I] Spin-wait: Disabled
[10/22/2024-15:16:37] [I] Multithreading: Disabled
[10/22/2024-15:16:37] [I] CUDA Graph: Disabled
[10/22/2024-15:16:37] [I] Separate profiling: Disabled
[10/22/2024-15:16:37] [I] Time Deserialize: Disabled
[10/22/2024-15:16:37] [I] Time Refit: Disabled
[10/22/2024-15:16:37] [I] NVTX verbosity: 0
[10/22/2024-15:16:37] [I] Persistent Cache Ratio: 0
[10/22/2024-15:16:37] [I] Inputs:
[10/22/2024-15:16:37] [I] === Reporting Options ===
[10/22/2024-15:16:37] [I] Verbose: Disabled
[10/22/2024-15:16:37] [I] Averages: 10 inferences
[10/22/2024-15:16:37] [I] Percentiles: 90,95,99
[10/22/2024-15:16:37] [I] Dump refittable layers:Disabled
[10/22/2024-15:16:37] [I] Dump output: Disabled
[10/22/2024-15:16:37] [I] Profile: Disabled
[10/22/2024-15:16:37] [I] Export timing to JSON file: 
[10/22/2024-15:16:37] [I] Export output to JSON file: 
[10/22/2024-15:16:37] [I] Export profile to JSON file: 
[10/22/2024-15:16:37] [I] 
[10/22/2024-15:16:37] [I] === Device Information ===
[10/22/2024-15:16:37] [I] Selected Device: Orin
[10/22/2024-15:16:37] [I] Compute Capability: 8.7
[10/22/2024-15:16:37] [I] SMs: 8
[10/22/2024-15:16:37] [I] Compute Clock Rate: 1.3 GHz
[10/22/2024-15:16:37] [I] Device Global Memory: 62800 MiB
[10/22/2024-15:16:37] [I] Shared Memory per SM: 164 KiB
[10/22/2024-15:16:37] [I] Memory Bus Width: 256 bits (ECC disabled)
[10/22/2024-15:16:37] [I] Memory Clock Rate: 0.612 GHz
[10/22/2024-15:16:37] [I] 
[10/22/2024-15:16:37] [I] TensorRT version: 8.5.2
[10/22/2024-15:16:38] [I] [TRT] [MemUsageChange] Init CUDA: CPU +220, GPU +0, now: CPU 249, GPU 8903 (MiB)
[10/22/2024-15:16:40] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +302, GPU +285, now: CPU 574, GPU 9210 (MiB)
[10/22/2024-15:16:40] [I] Start parsing network model
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 554253681
[10/22/2024-15:16:40] [I] [TRT] ----------------------------------------------------------------
[10/22/2024-15:16:40] [I] [TRT] Input filename:   inswapper_128.onnx
[10/22/2024-15:16:40] [I] [TRT] ONNX IR version:  0.0.6
[10/22/2024-15:16:40] [I] [TRT] Opset version:    11
[10/22/2024-15:16:40] [I] [TRT] Producer name:    pytorch
[10/22/2024-15:16:40] [I] [TRT] Producer version: 1.12.1
[10/22/2024-15:16:40] [I] [TRT] Domain:           
[10/22/2024-15:16:40] [I] [TRT] Model version:    0
[10/22/2024-15:16:40] [I] [TRT] Doc string:       
[10/22/2024-15:16:40] [I] [TRT] ----------------------------------------------------------------
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 554253681
[10/22/2024-15:16:41] [W] [TRT] onnx2trt_utils.cpp:375: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[10/22/2024-15:16:41] [I] Finish parsing network model
Segmentation fault (core dumped)
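
One workaround that is sometimes suggested for parser-stage crashes like this on older TensorRT versions is to constant-fold and sanitize the ONNX model with Polygraphy before running trtexec. This is only a sketch and is not guaranteed to fix the segmentation fault:

$ polygraphy surgeon sanitize inswapper_128.onnx --fold-constants -o inswapper_128_folded.onnx
$ /usr/src/tensorrt/bin/trtexec --onnx=inswapper_128_folded.onnx --saveEngine=inswapper_trt.engine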

Hi,

If the model works well on TensorRT 10, would you mind upgrading to JetPack 6.1?

JetPack 6.1 integrates TensorRT 10.3.

Thanks.

Thanks for the suggestion. I’ll check with TensorRT 10.3. If it works, I’ll upgrade to JetPack 6.1.

Is this still an issue that needs support? Is there any result you can share?

Thank you for your reply. I checked the conversion with TensorRT v10.3 on an NVIDIA GeForce RTX 3090 and it works fine. I will be upgrading to JetPack 6.1. I’ll let you know whether it works.

Thanks. The conversion is now successful on JetPack 6.1 with TensorRT v10.3.
But after upgrading to JetPack 6.1, where should I install the onnxruntime and PyTorch libraries from? Previously, the libraries were installed from the following link: Jetson Zoo - eLinux.org. But there is no option for JetPack 6.1.

Hi,

Please find the JetPack 6.1 package index below:

http://jetson.webredirect.org/jp6/cu126
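
For example, assuming the page is a pip-compatible wheel index, onnxruntime-gpu can be installed from it with something like:

$ pip3 install onnxruntime-gpu --extra-index-url http://jetson.webredirect.org/jp6/cu126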

Thanks.


Thank you

The inference works great with the converted FP32 TensorRT engine. But when the model is converted to an FP16 engine as shown below (the same happens with polygraphy), the accuracy drops drastically.
trtexec --onnx=inswapper_128.onnx --saveEngine=inswapper_trt_fp16.engine --fp16

Is there any other method of building the TensorRT FP16 engine such that the accuracy is not affected as much?
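For example, is pinning the layers that lose accuracy back to FP32 while keeping the rest in FP16 a reasonable approach? A hypothetical trtexec command (with a placeholder layer name) would look like:

trtexec --onnx=inswapper_128.onnx --saveEngine=inswapper_trt_mixed.engine --fp16 --precisionConstraints=obey --layerPrecisions=Conv_123:fp32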

Hi,

Could you share how much the accuracy dropped?

It’s expected that inference will lose some accuracy as the precision drops, but in general this won’t noticeably affect the output in a real use case.

Thanks.

Actually, the output is not correct. The following is the FP16 layer inspection result:
inswapper__fp16_engine.txt (112.9 KB)

Result obtained with the FP32 TensorRT model.


Result obtained with the FP16 TensorRT model.

Hi,

We would like to reproduce this issue locally to get more information.
Could you share a complete, reproducible example with us?

Thanks.

After converting the ONNX model to a TensorRT FP16 engine using
trtexec --onnx=inswapper_128.onnx --saveEngine=inswapper_trt_fp16.engine --fp16
and also
polygraphy convert inswapper_128.fp16.onnx -o inswapper_128_fp16.engine --fp16
the inference is run on the TensorRT FP16 version.
Here is the inference code; the main file is named inswapper_engine_infer.py:
nvidia_test.zip (2.3 MB)
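
For reference, a minimal, self-contained sketch of the usual TensorRT 10 Python inference pattern is shown below. It is not the attached script: tensor names, shapes, and dtypes are read from the engine itself, random data is used as input, and pycuda is assumed to be installed.

import numpy as np
import pycuda.autoinit  # noqa: F401 - creates and activates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("inswapper_trt_fp16.engine", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()
stream = cuda.Stream()

# Allocate host/device buffers for every I/O tensor declared by the engine.
host_bufs, dev_bufs = {}, {}
for i in range(engine.num_io_tensors):
    name = engine.get_tensor_name(i)
    shape = tuple(context.get_tensor_shape(name))
    dtype = trt.nptype(engine.get_tensor_dtype(name))
    host_bufs[name] = np.zeros(shape, dtype=dtype)
    dev_bufs[name] = cuda.mem_alloc(host_bufs[name].nbytes)
    context.set_tensor_address(name, int(dev_bufs[name]))

# Copy the inputs to the device (random data here; real code would use the
# preprocessed target crop and the source face embedding).
for i in range(engine.num_io_tensors):
    name = engine.get_tensor_name(i)
    if engine.get_tensor_mode(name) == trt.TensorIOMode.INPUT:
        host_bufs[name][...] = np.random.rand(*host_bufs[name].shape)
        cuda.memcpy_htod_async(dev_bufs[name], host_bufs[name], stream)

context.execute_async_v3(stream_handle=stream.handle)

# Copy the outputs back and wait for the stream to finish.
for i in range(engine.num_io_tensors):
    name = engine.get_tensor_name(i)
    if engine.get_tensor_mode(name) == trt.TensorIOMode.OUTPUT:
        cuda.memcpy_dtoh_async(host_bufs[name], dev_bufs[name], stream)
stream.synchronize()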

Hi,

We tried to reproduce this issue but ran into some dependency issues with onnxruntime.
Did you build onnxruntime for JetPack 6.1 from source?

Thanks.

The onnxruntime-gpu wheel was installed from the previously provided jp6/cu126 index.

Hi,

Sorry that I just missed this.
Will give it a try and provide more info to you later.

Thanks.


If onnxruntime was also installed on top of onnxruntime-gpu, that could cause a dependency conflict. Also, a specific numpy version, 1.26.4, is required.
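For example, a possible cleanup (assuming both packages ended up installed) would be:

$ pip3 uninstall onnxruntime
$ pip3 install numpy==1.26.4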

Hi,

Thanks for the hint.

But we ran into another error about a missing file.

$ python3 inswapper_engine_infer.py 
Traceback (most recent call last):
  File "/home/nvidia/topic_310715/nvidia_test/inswapper_engine_infer.py", line 43, in <module>
    models = load_models()
  File "/home/nvidia/topic_310715/nvidia_test/inswapper_engine_infer.py", line 28, in load_models
    detector = insightface.model_zoo.get_model('model/det_500m.onnx', providers=['CUDAExecutionProvider'])
  File "/home/nvidia/.local/lib/python3.10/site-packages/insightface/model_zoo/model_zoo.py", line 91, in get_model
    assert osp.exists(model_file), 'model_file %s should exist'%model_file
AssertionError: model_file model/det_500m.onnx should exist

Could you share where we can find the file?
Thanks.