ONNX Model Int64 Weights

solarflarefx · May 21, 2020, 11:19pm

Description

I am using ONNX Runtime built with TensorRT backend to run inference on an ONNX model. When running the model, I got the following warning: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.

The cast down then occurs but the problem is that this is taking a significant amount of time. I also notice that the first inference takes a significant amount of time as well.

It takes ~35X longer to load with network with TRT compared to not using it
It takes ~40X longer to run the first inference with TRT compared to not using it
From thereon, inference is faster by ~20-25% with TRT compared to not using it

I believe the first the top two bullet points are related and have to do with int32 castdown.

Is there something I can do to mitigate this?

Environment

TensorRT Version: 7
CUDA Version: 10.2
CUDNN Version: 7.6
Operating System + Version: Windows 10 64-bit

SunilJB · May 22, 2020, 6:00am

Can you share the model and sample script to reproduce the issue so we can help better?
Meanwhile, can you try “trtexec” command line tool:
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec

Thanks

solarflarefx · June 3, 2020, 2:33pm

@SunilJB

I went through the process of serializing the model and saving the model to avoid this overhead and then performed inference in C++.

However, upon further inspection of the inference outpus, it seems as though the output I get using TensorRT is quite different than the original output I had in TensorFlow, and even the output I got through ONNXRuntime. I have a regression model where it seems that the TensorRT output distribution has been “squeezed in” compared to the TensorFlow output.

I seem to get the same wrong output whether I use TensorRT C++ API, or whether I use OnnxRuntime built with TensorRT.

Could this be due to the casting from int64 to int32, or could this be due to another issue? Are there other build options I can try to see if I can fix this?

SunilJB · June 3, 2020, 8:14pm

Without looking into the model and code, it’s difficult to pin point the reason which might be causing the output mismatch.
Can you try to run ONNX model and compare the output values?

You can also try TF-TRT to build the optimized model, please refer below link for more details:

Thanks

rapidsrik4 · August 28, 2020, 9:35am

@SunilJB
I also see the exact same warning message while inferring ONNX model ( exported from PyTorch based Classification model of HRNet architecture type, Opset_version 11) . Also TRT inference classification output is not matching with output of source PyTorch model inference .

[08/28/2020-15:10:27] [W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.

What is probable impact of this warning message? Is this issue addressed already?

Environment

TensorRT Version : TensorRT-7.1.3.4
CUDA Version : CUDA 11.0
CUDNN Version : cudnn-v8.0.2.39
Operating System + Version : Windows 10 64-bit

suyashhchougule · July 18, 2021, 3:47pm

Tensorrt 7 downcast INT64 to INT32 automatically. But if you have limited memory usage with
config->setMaxWorkspaceSize(1 << X) which is less than what is actually required by model to load, it will take too much time to load, even the script might get killed.
You can just disable setMaxWorkspaceSize, if you have used.

NVES · July 19, 2021, 6:39am

Hi,
Request you to share the ONNX model and the script if not shared already so that we can assist you better.
Alongside you can try few things:

validating your model with the below snippet

check_model.py

import sys
import onnx
filename = yourONNXmodel
model = onnx.load(filename)
onnx.checker.check_model(model).
2) Try running your model with trtexec command.
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
In case you are still facing issue, request you to share the trtexec “”–verbose"" log for further debugging
Thanks!

dilip.patel · September 26, 2021, 3:38am

above check model does not return any thing. tested onxx model and it is working fine. attached log file with verbose log enabled as suggested. Getting below output

/usr/src/tensorrt/bin/trtexec --verbose --onnx=yolox_x.onnx --saveEngine=yolox.trt --explicitBatch

[W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
terminate called after throwing an instance of ‘pwgen::PwgenException’
what(): Driver error:
Aborted (core dumped)

.[yolox-onxx-tensorRT.log|attachment]
(upload://tJWlYYJwSqzblgIBPqxr3EF1bnw.log) (488.5 KB)

carinauc · March 7, 2022, 7:57pm

Any update regarding this? because I’m having the same issue with torch-trt. Thanks!

dilip.patel · March 8, 2022, 4:42am

No

roshan.gopalakrishnan · August 27, 2022, 2:22pm

I have also observed a similar performance degradation by using tensorRT execution provider. But this error does not happen while CUDA execution provider. But overall the GPU performance while running ONNX runtime seems slower compared to CPU on a Jetson Xavier. Any insights on this ? Why is it so ? Is there a solution to scale up the performance ?

roshan.gopalakrishnan · August 27, 2022, 2:24pm

By the way I used trtexec also to convert onnx to engine model and run. But the execution fails on Jetson Xavier.

sanjuhs123 · February 17, 2024, 10:16am

I have a 3060 pc and i am encountering the same error and same thing , where in the CPUExecution provider is faster than CUDA and even Tensort for using the rembg library ,
initally installed rembg gpu thinking it would be faster , but it is indeed slower , any fixes for this , please help

Topic		Replies	Views
Could not infer onnx model for TensorrtExecutionProvider provider TensorRT tensorrt , onnx	1	1151	November 11, 2022
Pytorch -> ONNX -> TensorRT inference with terrible accuracy (int64 clamped to int32) TensorRT cudnn	2	1283	January 23, 2024
Inference result gets worse when converting pytorch model to TensorRT model TensorRT pytorch	6	1112	January 19, 2022
Tensorrt Execution Provider TensorRT tensorrt , cudnn , onnx	1	761	November 27, 2023
ONNX to TRT Engine conversion Error TensorRT tensorrt	8	3667	May 25, 2022
Incorrect inference results after converting from ONNX to TRT with trtexec TensorRT tensorrt , python , onnx	4	1537	December 9, 2022
TensorRT Quick Start Guide Example is not running (JetPack 4.2.2) Jetson AGX Xavier tensorrt , onnx	6	895	January 5, 2022
Problem converting TensorFlow 2-> ONNX model to TensorRT Engine (efficientdet_d0) TensorRT	8	1388	November 17, 2022
TensorRT gives diffent results than ONNX and Pytorch TensorRT	8	1519	September 28, 2023
Can convert to INT32 but not with FP16 TensorRT	3	1025	November 29, 2022

ONNX Model Int64 Weights

Description

Environment

Environment

check_model.py

Related topics