Inswapper ONNX model conversion to TensorRT engine

I am trying to run the insightface inswapper model on a Jetson AGX Orin Developer Kit with JetPack 5.1.2, which ships TensorRT v8.5. For this, the CUDAExecutionProvider is used with the ONNX model, but inference is too slow, at only around 5 FPS.
I tried to convert the ONNX model to a TensorRT engine using different approaches such as trtexec and polygraphy, but the conversion fails with the following error:
ONNX model generated with INT64 while TensorRT does not support INT64. Attempting to cast to INT32.
Segmentation fault

However, converting the ONNX model to a TensorRT engine on an NVIDIA GeForce RTX 3090 with TensorRT v10.5 works fine. But the engine built there cannot be loaded on the Jetson device, since TensorRT engines are not portable across TensorRT versions and GPU architectures.

I want to run TensorRT inference on the Jetson AGX Orin Developer Kit with JetPack 5.1.2. Is there any other way to convert the inswapper ONNX model to a TensorRT engine with TensorRT v8.5? Or is it possible to run an engine built with TensorRT v10.5 under TensorRT v8.5?

Thanks in advance. Any help would be appreciated.

Hi,
Here are some suggestions for common issues:

1. Performance

Please run the commands below before benchmarking a deep learning use case (a quick way to verify the settings is shown after this list):

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

2. Installation

Installation guides for deep learning frameworks on Jetson:

3. Tutorial

Getting-started deep learning tutorial:

4. Report issue

If these suggestions don’t help and you want to report an issue to us, please share the model, the command/steps, and the customized app (if any) so we can reproduce it locally.
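
To verify that the board is in the maximum-performance mode and that the clocks are locked (as mentioned under item 1), the following standard JetPack utilities can be used:

$ sudo nvpmodel -q
$ sudo jetson_clocks --show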

Thanks!

Model: insightface inswapper https://github.com/facefusion/facefusion-assets/releases/download/models/inswapper_128.onnx
Conversion step: trtexec --onnx=inswapper_128.onnx --saveEngine=inswapper_trt.engine

Error:

&&&& RUNNING TensorRT.trtexec [TensorRT v8502] # /usr/src/tensorrt/bin/trtexec --onnx=inswapper_128.onnx --saveEngine=inswapper_trt.engine
[10/22/2024-15:16:37] [I] === Model Options ===
[10/22/2024-15:16:37] [I] Format: ONNX
[10/22/2024-15:16:37] [I] Model: inswapper_128.onnx
[10/22/2024-15:16:37] [I] Output:
[10/22/2024-15:16:37] [I] === Build Options ===
[10/22/2024-15:16:37] [I] Max batch: explicit batch
[10/22/2024-15:16:37] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[10/22/2024-15:16:37] [I] minTiming: 1
[10/22/2024-15:16:37] [I] avgTiming: 8
[10/22/2024-15:16:37] [I] Precision: FP32
[10/22/2024-15:16:37] [I] LayerPrecisions: 
[10/22/2024-15:16:37] [I] Calibration: 
[10/22/2024-15:16:37] [I] Refit: Disabled
[10/22/2024-15:16:37] [I] Sparsity: Disabled
[10/22/2024-15:16:37] [I] Safe mode: Disabled
[10/22/2024-15:16:37] [I] DirectIO mode: Disabled
[10/22/2024-15:16:37] [I] Restricted mode: Disabled
[10/22/2024-15:16:37] [I] Build only: Disabled
[10/22/2024-15:16:37] [I] Save engine: inswapper_trt.engine
[10/22/2024-15:16:37] [I] Load engine: 
[10/22/2024-15:16:37] [I] Profiling verbosity: 0
[10/22/2024-15:16:37] [I] Tactic sources: Using default tactic sources
[10/22/2024-15:16:37] [I] timingCacheMode: local
[10/22/2024-15:16:37] [I] timingCacheFile: 
[10/22/2024-15:16:37] [I] Heuristic: Disabled
[10/22/2024-15:16:37] [I] Preview Features: Use default preview flags.
[10/22/2024-15:16:37] [I] Input(s)s format: fp32:CHW
[10/22/2024-15:16:37] [I] Output(s)s format: fp32:CHW
[10/22/2024-15:16:37] [I] Input build shapes: model
[10/22/2024-15:16:37] [I] Input calibration shapes: model
[10/22/2024-15:16:37] [I] === System Options ===
[10/22/2024-15:16:37] [I] Device: 0
[10/22/2024-15:16:37] [I] DLACore: 
[10/22/2024-15:16:37] [I] Plugins:
[10/22/2024-15:16:37] [I] === Inference Options ===
[10/22/2024-15:16:37] [I] Batch: Explicit
[10/22/2024-15:16:37] [I] Input inference shapes: model
[10/22/2024-15:16:37] [I] Iterations: 10
[10/22/2024-15:16:37] [I] Duration: 3s (+ 200ms warm up)
[10/22/2024-15:16:37] [I] Sleep time: 0ms
[10/22/2024-15:16:37] [I] Idle time: 0ms
[10/22/2024-15:16:37] [I] Streams: 1
[10/22/2024-15:16:37] [I] ExposeDMA: Disabled
[10/22/2024-15:16:37] [I] Data transfers: Enabled
[10/22/2024-15:16:37] [I] Spin-wait: Disabled
[10/22/2024-15:16:37] [I] Multithreading: Disabled
[10/22/2024-15:16:37] [I] CUDA Graph: Disabled
[10/22/2024-15:16:37] [I] Separate profiling: Disabled
[10/22/2024-15:16:37] [I] Time Deserialize: Disabled
[10/22/2024-15:16:37] [I] Time Refit: Disabled
[10/22/2024-15:16:37] [I] NVTX verbosity: 0
[10/22/2024-15:16:37] [I] Persistent Cache Ratio: 0
[10/22/2024-15:16:37] [I] Inputs:
[10/22/2024-15:16:37] [I] === Reporting Options ===
[10/22/2024-15:16:37] [I] Verbose: Disabled
[10/22/2024-15:16:37] [I] Averages: 10 inferences
[10/22/2024-15:16:37] [I] Percentiles: 90,95,99
[10/22/2024-15:16:37] [I] Dump refittable layers:Disabled
[10/22/2024-15:16:37] [I] Dump output: Disabled
[10/22/2024-15:16:37] [I] Profile: Disabled
[10/22/2024-15:16:37] [I] Export timing to JSON file: 
[10/22/2024-15:16:37] [I] Export output to JSON file: 
[10/22/2024-15:16:37] [I] Export profile to JSON file: 
[10/22/2024-15:16:37] [I] 
[10/22/2024-15:16:37] [I] === Device Information ===
[10/22/2024-15:16:37] [I] Selected Device: Orin
[10/22/2024-15:16:37] [I] Compute Capability: 8.7
[10/22/2024-15:16:37] [I] SMs: 8
[10/22/2024-15:16:37] [I] Compute Clock Rate: 1.3 GHz
[10/22/2024-15:16:37] [I] Device Global Memory: 62800 MiB
[10/22/2024-15:16:37] [I] Shared Memory per SM: 164 KiB
[10/22/2024-15:16:37] [I] Memory Bus Width: 256 bits (ECC disabled)
[10/22/2024-15:16:37] [I] Memory Clock Rate: 0.612 GHz
[10/22/2024-15:16:37] [I] 
[10/22/2024-15:16:37] [I] TensorRT version: 8.5.2
[10/22/2024-15:16:38] [I] [TRT] [MemUsageChange] Init CUDA: CPU +220, GPU +0, now: CPU 249, GPU 8903 (MiB)
[10/22/2024-15:16:40] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +302, GPU +285, now: CPU 574, GPU 9210 (MiB)
[10/22/2024-15:16:40] [I] Start parsing network model
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 554253681
[10/22/2024-15:16:40] [I] [TRT] ----------------------------------------------------------------
[10/22/2024-15:16:40] [I] [TRT] Input filename:   inswapper_128.onnx
[10/22/2024-15:16:40] [I] [TRT] ONNX IR version:  0.0.6
[10/22/2024-15:16:40] [I] [TRT] Opset version:    11
[10/22/2024-15:16:40] [I] [TRT] Producer name:    pytorch
[10/22/2024-15:16:40] [I] [TRT] Producer version: 1.12.1
[10/22/2024-15:16:40] [I] [TRT] Domain:           
[10/22/2024-15:16:40] [I] [TRT] Model version:    0
[10/22/2024-15:16:40] [I] [TRT] Doc string:       
[10/22/2024-15:16:40] [I] [TRT] ----------------------------------------------------------------
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 554253681
[10/22/2024-15:16:41] [W] [TRT] onnx2trt_utils.cpp:375: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[10/22/2024-15:16:41] [I] Finish parsing network model
Segmentation fault (core dumped)
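
One workaround that is sometimes suggested for parser-stage crashes like this on older TensorRT versions is to constant-fold and sanitize the ONNX model with Polygraphy before running trtexec. This is only a sketch and is not guaranteed to fix the segmentation fault:

$ polygraphy surgeon sanitize inswapper_128.onnx --fold-constants -o inswapper_128_folded.onnx
$ /usr/src/tensorrt/bin/trtexec --onnx=inswapper_128_folded.onnx --saveEngine=inswapper_trt.engine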

Hi,

If the model works well on TensorRT 10, would you mind upgrading to JetPack 6.1?

JetPack 6.1 integrates TensorRT 10.3.

Thanks.

Thanks for the suggestion. I’ll check with TensorRT 10.3. If it works, I’ll upgrade to JetPack 6.1.

Is this still an issue that needs support? Is there any result you can share?

Thank you for your reply. I checked the conversion with TensorRT v10.3 on an NVIDIA GeForce RTX 3090 and it works fine. I will be upgrading to JetPack 6.1. I’ll let you know whether it works.

Thanks. The conversion is now successful on JetPack 6.1 with TensorRT v10.3.
But after upgrading to JetPack 6.1, where should I install the onnxruntime and PyTorch libraries from? Previously, the libraries were installed from the following link: Jetson Zoo - eLinux.org. But there is no option for JetPack 6.1.

Hi,

Please find the JetPack 6.1 package index below:

http://jetson.webredirect.org/jp6/cu126
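
For example, assuming the page is a pip-compatible wheel index, onnxruntime-gpu can be installed from it with something like:

$ pip3 install onnxruntime-gpu --extra-index-url http://jetson.webredirect.org/jp6/cu126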

Thanks.


Thank you

The inference works great with the converted FP32 TensorRT engine. But when the model is converted to an FP16 engine as shown below (the same happens with polygraphy), the accuracy drops drastically.
trtexec --onnx=inswapper_128.onnx --saveEngine=inswapper_trt_fp16.engine --fp16

Is there any other method of building the TensorRT FP16 engine such that the accuracy is not affected as much?
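For example, is pinning the layers that lose accuracy back to FP32 while keeping the rest in FP16 a reasonable approach? A hypothetical trtexec command (with a placeholder layer name) would look like:

trtexec --onnx=inswapper_128.onnx --saveEngine=inswapper_trt_mixed.engine --fp16 --precisionConstraints=obey --layerPrecisions=Conv_123:fp32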

Hi,

Could you share how much the accuracy dropped?

It’s expected that inference will lose some accuracy as the precision drops, but in general this won’t noticeably affect the output in a real use case.

Thanks.

Actually, the output is not correct. The following is the FP16 layer inspection result:
inswapper__fp16_engine.txt (112.9 KB)

Result obtained with the FP32 TensorRT model.


Result obtained with the FP16 TensorRT model.

Hi,

We would like to reproduce this issue locally to get more information.
Could you share a complete, reproducible example with us?

Thanks.

After converting the ONNX model to a TensorRT FP16 engine using
trtexec --onnx=inswapper_128.onnx --saveEngine=inswapper_trt_fp16.engine --fp16
and also
polygraphy convert inswapper_128.fp16.onnx -o inswapper_128_fp16.engine --fp16
the inference is run on the TensorRT FP16 version.
Here is the inference code; the main file is named inswapper_engine_infer.py:
nvidia_test.zip (2.3 MB)
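
For reference, a minimal, self-contained sketch of the usual TensorRT 10 Python inference pattern is shown below. It is not the attached script: tensor names, shapes, and dtypes are read from the engine itself, random data is used as input, and pycuda is assumed to be installed.

import numpy as np
import pycuda.autoinit  # noqa: F401 - creates and activates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("inswapper_trt_fp16.engine", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()
stream = cuda.Stream()

# Allocate host/device buffers for every I/O tensor declared by the engine.
host_bufs, dev_bufs = {}, {}
for i in range(engine.num_io_tensors):
    name = engine.get_tensor_name(i)
    shape = tuple(context.get_tensor_shape(name))
    dtype = trt.nptype(engine.get_tensor_dtype(name))
    host_bufs[name] = np.zeros(shape, dtype=dtype)
    dev_bufs[name] = cuda.mem_alloc(host_bufs[name].nbytes)
    context.set_tensor_address(name, int(dev_bufs[name]))

# Copy the inputs to the device (random data here; real code would use the
# preprocessed target crop and the source face embedding).
for i in range(engine.num_io_tensors):
    name = engine.get_tensor_name(i)
    if engine.get_tensor_mode(name) == trt.TensorIOMode.INPUT:
        host_bufs[name][...] = np.random.rand(*host_bufs[name].shape)
        cuda.memcpy_htod_async(dev_bufs[name], host_bufs[name], stream)

context.execute_async_v3(stream_handle=stream.handle)

# Copy the outputs back and wait for the stream to finish.
for i in range(engine.num_io_tensors):
    name = engine.get_tensor_name(i)
    if engine.get_tensor_mode(name) == trt.TensorIOMode.OUTPUT:
        cuda.memcpy_dtoh_async(host_bufs[name], dev_bufs[name], stream)
stream.synchronize()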

Hi,

We tried to reproduce this issue but ran into some dependency issues with onnxruntime.
Did you build onnxruntime for JetPack 6.1 from source?

Thanks.

The onnxruntime-gpu wheel was installed from the previously provided jp6/cu126 index.

Hi,

Sorry that I just missed this.
Will give it a try and provide more info to you later.

Thanks.


If onnxruntime was also installed on top of onnxruntime-gpu, that could cause a dependency conflict. Also, a specific numpy version, 1.26.4, is required.
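For example, a possible cleanup (assuming both packages ended up installed) would be:

$ pip3 uninstall onnxruntime
$ pip3 install numpy==1.26.4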

Hi,

Thanks for the hint.

But we ran into another error about a missing file.

$ python3 inswapper_engine_infer.py 
Traceback (most recent call last):
  File "/home/nvidia/topic_310715/nvidia_test/inswapper_engine_infer.py", line 43, in <module>
    models = load_models()
  File "/home/nvidia/topic_310715/nvidia_test/inswapper_engine_infer.py", line 28, in load_models
    detector = insightface.model_zoo.get_model('model/det_500m.onnx', providers=['CUDAExecutionProvider'])
  File "/home/nvidia/.local/lib/python3.10/site-packages/insightface/model_zoo/model_zoo.py", line 91, in get_model
    assert osp.exists(model_file), 'model_file %s should exist'%model_file
AssertionError: model_file model/det_500m.onnx should exist

Could you share where we can find the file?
Thanks.