ONNX Model Int64 Weights

Description

I am using ONNX Runtime built with TensorRT backend to run inference on an ONNX model. When running the model, I got the following warning: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.

The cast down then occurs, but the problem is that it takes a significant amount of time. I also notice that the first inference takes a significant amount of time as well.

  • It takes ~35X longer to load the network with TRT compared to not using it
  • It takes ~40X longer to run the first inference with TRT compared to not using it
  • From then on, inference is faster by ~20-25% with TRT compared to not using it

I believe the top two bullet points are related and have to do with the INT32 cast down.

Is there something I can do to mitigate this?

Environment

TensorRT Version: 7
CUDA Version: 10.2
CUDNN Version: 7.6
Operating System + Version: Windows 10 64-bit


Can you share the model and a sample script to reproduce the issue so we can help better?
Meanwhile, can you try the “trtexec” command line tool, e.g. with an invocation like the one below:
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
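A minimal invocation sketch (model.onnx and the 1024 MB workspace are placeholders; the flags are those of the TensorRT 7 build of trtexec):

trtexec --onnx=model.onnx --explicitBatch --workspace=1024 --verbose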

Thanks

@SunilJB

I went through the process of serializing the model and saving it to disk to avoid this overhead, and then performed inference in C++.
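For reference, a rough sketch of the same build-once-then-reload flow with the TensorRT 7 Python API (the file names, the 1 GiB workspace and the explicit-batch flag are assumptions for a generic ONNX model, not my exact code):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_and_save(onnx_path="model.onnx", engine_path="model.engine"):
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError(str(parser.get_error(0)))
    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 30  # 1 GiB; adjust to your GPU
    engine = builder.build_engine(network, config)  # the expensive step we only want to pay once
    with open(engine_path, "wb") as f:
        f.write(engine.serialize())

def load_engine(engine_path="model.engine"):
    # Deserializing the saved engine skips the expensive build on every startup
    runtime = trt.Runtime(TRT_LOGGER)
    with open(engine_path, "rb") as f:
        return runtime.deserialize_cuda_engine(f.read())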

However, upon further inspection of the inference outputs, it seems as though the output I get using TensorRT is quite different from the original output I had in TensorFlow, and even from the output I got through ONNXRuntime. I have a regression model, and it seems that the TensorRT output distribution has been “squeezed in” compared to the TensorFlow output.

I seem to get the same wrong output whether I use TensorRT C++ API, or whether I use OnnxRuntime built with TensorRT.

Could this be due to the casting from int64 to int32, or could this be due to another issue? Are there other build options I can try to see if I can fix this?

Without looking into the model and code, it’s difficult to pinpoint the reason for the output mismatch.
Can you try running the ONNX model and comparing the output values, for example along the lines of the snippet below?
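A sketch of such a comparison (the model path is a placeholder, the random input should be replaced by a real sample, and trt_out stands for whatever your TensorRT pipeline returns on the same input):

import numpy as np
import onnxruntime as ort

# Reference run of the ONNX model with ONNX Runtime
sess = ort.InferenceSession("model.onnx")
inp = sess.get_inputs()[0]
# Build a dummy input matching the declared shape (dynamic dims set to 1)
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
x = np.random.randn(*shape).astype(np.float32)
ref_out = sess.run(None, {inp.name: x})[0]

# trt_out = output of your TensorRT engine (or the TRT execution provider) on the same x
# np.testing.assert_allclose(ref_out, trt_out, rtol=1e-3, atol=1e-4)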

You can also try TF-TRT to build the optimized model; please refer to the link below for more details:

Thanks

@SunilJB
I also see the exact same warning message while running inference on an ONNX model (exported from a PyTorch-based classification model of the HRNet architecture type, opset_version 11). The TRT inference classification output also does not match the output of the source PyTorch model.

[08/28/2020-15:10:27] [W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.

What is the probable impact of this warning message? Has this issue been addressed already?
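In case it helps: one way to check whether the model actually contains INT64 initializers outside the INT32 range (the case where the cast down would clamp values and could change results) is a quick script like this (a sketch; model.onnx is a placeholder path):

import numpy as np
import onnx
from onnx import numpy_helper

model = onnx.load("model.onnx")
int32_info = np.iinfo(np.int32)
for init in model.graph.initializer:
    if init.data_type == onnx.TensorProto.INT64:
        arr = numpy_helper.to_array(init)
        out_of_range = (arr > int32_info.max) | (arr < int32_info.min)
        if out_of_range.any():
            print(f"{init.name}: {int(out_of_range.sum())} value(s) outside INT32 range")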

Environment

TensorRT Version : TensorRT-7.1.3.4
CUDA Version : CUDA 11.0
CUDNN Version : cudnn-v8.0.2.39
Operating System + Version : Windows 10 64-bit


TensorRT 7 downcasts INT64 to INT32 automatically. But if you limit the workspace memory with
config->setMaxWorkspaceSize(1 << X) to less than what the model actually requires to load, it will take too much time to load, and the script might even get killed.
You can just remove the setMaxWorkspaceSize call (or raise the limit), if you have used it.
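For reference, the equivalent knob in the TensorRT 7 Python API is the builder config’s max_workspace_size; the 1 GiB value below is only an assumption, so size it to what your GPU can actually spare:

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()
config.max_workspace_size = 1 << 30  # 1 GiB; too small a value can stall or kill the engine build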

Hi,
Request you to share the ONNX model and the script if not shared already so that we can assist you better.
Alongside, you can try a few things:

  1. Validate your model with the snippet below:

check_model.py

import sys
import onnx

filename = sys.argv[1]  # path to your ONNX model
model = onnx.load(filename)
onnx.checker.check_model(model)

  2. Try running your model with the trtexec command:
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
In case you are still facing the issue, request you to share the trtexec “--verbose” log for further debugging.
Thanks!

The check_model script above does not return anything, and the ONNX model tested fine on its own. I attached the log file with verbose logging enabled, as suggested. I am getting the output below:

/usr/src/tensorrt/bin/trtexec --verbose --onnx=yolox_x.onnx --saveEngine=yolox.trt --explicitBatch

[W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
terminate called after throwing an instance of 'pwgen::PwgenException'
what(): Driver error:
Aborted (core dumped)

(Attached: yolox-onxx-tensorRT.log, 488.5 KB)

Any update regarding this? I’m having the same issue with Torch-TRT. Thanks!

No

I have also observed a similar performance degradation when using the TensorRT execution provider. The error does not happen with the CUDA execution provider, but overall the GPU performance when running ONNX Runtime seems slower than the CPU on a Jetson Xavier. Any insights on this? Why is it so? Is there a solution to scale up the performance?

By the way, I also used trtexec to convert the ONNX model to an engine and run it, but the execution fails on the Jetson Xavier.

I have a PC with a 3060 and I am encountering the same error and the same behaviour, where the CPUExecutionProvider is faster than CUDA and even TensorRT when using the rembg library.
I initially installed the GPU build of rembg thinking it would be faster, but it is indeed slower. Are there any fixes for this? Please help.
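For anyone debugging this, a quick way to see which execution providers your onnxruntime build actually exposes, and to time each one separately, is a sketch like the following (u2net.onnx and the 1x3x320x320 input are assumptions based on rembg’s default model; only benchmark providers that appear in the printed list):

import time
import numpy as np
import onnxruntime as ort

print(ort.get_available_providers())

def bench(providers, n=20):
    sess = ort.InferenceSession("u2net.onnx", providers=providers)
    inp = sess.get_inputs()[0]
    x = np.random.randn(1, 3, 320, 320).astype(np.float32)
    sess.run(None, {inp.name: x})  # warm-up; for the TRT provider the engine build happens here
    t0 = time.perf_counter()
    for _ in range(n):
        sess.run(None, {inp.name: x})
    return (time.perf_counter() - t0) / n

print("CPU :", bench(["CPUExecutionProvider"]))
print("CUDA:", bench(["CUDAExecutionProvider", "CPUExecutionProvider"]))
print("TRT :", bench(["TensorrtExecutionProvider", "CPUExecutionProvider"]))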