ONNX Model Int64 Weights

Description

I am using ONNX Runtime built with the TensorRT backend to run inference on an ONNX model. When running the model, I get the following warning: "Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32."

The cast-down does happen, but it takes a significant amount of time. The first inference is also very slow.

  • It takes ~35X longer to load the network with TRT compared to not using it
  • It takes ~40X longer to run the first inference with TRT compared to not using it
  • From then on, inference is ~20-25% faster with TRT compared to not using it

I believe the top two bullet points are related and have to do with the INT64-to-INT32 cast-down.

Is there something I can do to mitigate this?
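
For reference, this is roughly how I am loading and timing the model. It is only a minimal sketch: the model path, input name, and input shape are placeholders, and depending on the onnxruntime version the providers may need to be set with sess.set_providers(...) instead of the constructor argument.

```python
import time
import numpy as np
import onnxruntime as ort

MODEL_PATH = "model.onnx"  # placeholder path

# Loading the session is where the ONNX parsing and the INT64 -> INT32
# cast-down happen, so it is timed separately from the first run.
t0 = time.time()
sess = ort.InferenceSession(
    MODEL_PATH,
    providers=["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(f"session load: {time.time() - t0:.2f} s")

# Placeholder input: a random tensor with a hard-coded shape; replace with real data.
inp = sess.get_inputs()[0]
feed = {inp.name: np.random.rand(1, 3, 224, 224).astype(np.float32)}

# The first run is where (most of) the TensorRT engine build appears to happen.
t0 = time.time()
sess.run(None, feed)
print(f"first inference: {time.time() - t0:.2f} s")

# Subsequent runs reuse the already-built engine.
t0 = time.time()
sess.run(None, feed)
print(f"steady-state inference: {time.time() - t0:.4f} s")
```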

Environment

TensorRT Version: 7
CUDA Version: 10.2
CUDNN Version: 7.6
Operating System + Version: Windows 10 64-bit

Can you share the model and a sample script to reproduce the issue so we can help better?
Meanwhile, can you try the “trtexec” command-line tool?

Thanks

@SunilJB

To avoid this overhead, I went through the process of serializing the model and saving it to disk, and then performed inference in C++.
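
For anyone hitting the same issue, the build-once-and-serialize flow looks roughly like the sketch below. My actual code uses the C++ API, but the steps map one-to-one; this Python version assumes the TensorRT 7 Python bindings are available (they may not be on Windows) and uses placeholder file names.

```python
import tensorrt as trt

ONNX_PATH = "model.onnx"      # placeholder
ENGINE_PATH = "model.engine"  # placeholder

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
# Explicit-batch network definition is required for ONNX models in TensorRT 7.
flags = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
network = builder.create_network(flags)
parser = trt.OnnxParser(network, logger)

with open(ONNX_PATH, "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("Failed to parse ONNX model")

config = builder.create_builder_config()
config.max_workspace_size = 1 << 30  # 1 GiB; adjust as needed

# This is the expensive step (including the INT64 -> INT32 cast-down);
# doing it once and serializing the result avoids paying it on every startup.
engine = builder.build_engine(network, config)

with open(ENGINE_PATH, "wb") as f:
    f.write(engine.serialize())
```

At load time the saved engine can then be deserialized with trt.Runtime(logger).deserialize_cuda_engine(...) (or the equivalent C++ call) instead of re-parsing the ONNX file.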

However, upon further inspection of the inference outputs, it seems the output I get using TensorRT is quite different from the original output I had in TensorFlow, and even from the output I got through ONNX Runtime. I have a regression model, and the TensorRT output distribution appears to be “squeezed in” compared to the TensorFlow output.

I seem to get the same wrong output whether I use the TensorRT C++ API or ONNX Runtime built with TensorRT.

Could this be due to the INT64-to-INT32 cast, or is it another issue? Are there other build options I can try to fix this?
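
To rule out the cast itself, one check I can do is to look at whether any INT64 initializer in the ONNX model actually holds values outside the INT32 range, since the cast-down is only lossy in that case. A rough sketch (the model path is a placeholder):

```python
import numpy as np
import onnx
from onnx import numpy_helper

model = onnx.load("model.onnx")  # placeholder path

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

for init in model.graph.initializer:
    if init.data_type == onnx.TensorProto.INT64:
        values = numpy_helper.to_array(init)
        out_of_range = (values < INT32_MIN) | (values > INT32_MAX)
        if np.any(out_of_range):
            print(f"{init.name}: {int(out_of_range.sum())} value(s) exceed the INT32 range")
        else:
            print(f"{init.name}: INT64, but all values fit in INT32 (cast is lossless)")
```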

Without looking into the model and code, it’s difficult to pinpoint the reason for the output mismatch.
Can you try running the ONNX model directly (e.g., with ONNX Runtime) and comparing the output values against the TensorRT results?
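
For example, something along these lines (a rough sketch only; the model path and the placeholder random input should be replaced with your real model and data):

```python
import numpy as np
import onnxruntime as ort

MODEL_PATH = "model.onnx"  # placeholder

# Reference run with the default CPU execution provider.
cpu_sess = ort.InferenceSession(MODEL_PATH, providers=["CPUExecutionProvider"])
# Placeholder input shape; use a representative sample for a meaningful comparison.
feed = {cpu_sess.get_inputs()[0].name: np.random.rand(1, 10).astype(np.float32)}
cpu_out = cpu_sess.run(None, feed)

# Same model and input through the TensorRT execution provider.
trt_sess = ort.InferenceSession(
    MODEL_PATH, providers=["TensorrtExecutionProvider", "CUDAExecutionProvider"]
)
trt_out = trt_sess.run(None, feed)

for ref, test in zip(cpu_out, trt_out):
    diff = np.abs(np.asarray(ref, dtype=np.float32) - np.asarray(test, dtype=np.float32))
    print("max abs diff:", diff.max(), "mean abs diff:", diff.mean())
```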

You can also try TF-TRT to build the optimized model; please refer to the link below for more details:
https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#usingtftrt
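
For example, with TF 2.x the conversion is roughly as below (a sketch only; the SavedModel directories are placeholders, and the API is different on TF 1.x):

```python
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Placeholder paths for the original and the TF-TRT-optimized SavedModels.
converter = trt.TrtGraphConverterV2(input_saved_model_dir="saved_model")
converter.convert()
converter.save("saved_model_trt")
```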

Thanks

@SunilJB
I also see the exact same warning message while running inference on an ONNX model (exported from a PyTorch-based classification model with an HRNet architecture, opset_version 11). The TensorRT classification output also does not match the output of the source PyTorch model.

[08/28/2020-15:10:27] [W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.

What is the probable impact of this warning message? Has this issue been addressed already?
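
To narrow down where the mismatch comes in, I am comparing the PyTorch output against plain ONNX Runtime on CPU before involving TensorRT. A rough sketch with a stand-in model (my real network is HRNet, exported with opset 11; the tiny model here is just a placeholder so the sketch runs end to end):

```python
import numpy as np
import onnxruntime as ort
import torch

# Stand-in for the actual HRNet classification model (placeholder only).
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 3),
    torch.nn.AdaptiveAvgPool2d(1),
    torch.nn.Flatten(),
    torch.nn.Linear(8, 10),
).eval()
dummy = torch.randn(1, 3, 224, 224)

# Export the same way the real model was exported (opset 11).
torch.onnx.export(model, dummy, "model_opset11.onnx", opset_version=11,
                  input_names=["input"], output_names=["output"])

with torch.no_grad():
    torch_out = model(dummy).numpy()

# Run the exported model with plain ONNX Runtime (no TensorRT) first.
sess = ort.InferenceSession("model_opset11.onnx", providers=["CPUExecutionProvider"])
ort_out = sess.run(None, {"input": dummy.numpy()})[0]

# If this already disagrees, the problem is in the export, not in TensorRT.
print("max abs diff (PyTorch vs ONNX Runtime CPU):", np.abs(torch_out - ort_out).max())
```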

Environment

TensorRT Version: TensorRT-7.1.3.4
CUDA Version: CUDA 11.0
CUDNN Version: cudnn-v8.0.2.39
Operating System + Version: Windows 10 64-bit