Using Int8 input gives assertion error in engine creation

Description

Setting the input format to int8 causes an assertion error during engine construction when running trtexec --onnx="whereTest.onnx" --int8 --inputIOFormats=int8:chw --verbose. The full error log and the ONNX file used are attached.

Screenshot from 2021-03-11 00-30-05

The model is exported from a simple TensorFlow model using tf2onnx. If I don't use tf.where, I don't get the error.

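import tensorflow as tf

# Excerpt: a method of the exported model class.
@tf.function
def test(self, X):
    # Random mask with the same shape as the input.
    rand = tf.random.uniform(X.shape)
    # Keep X where the mask exceeds 0.5, otherwise output 0.
    return tf.where(rand > 0.5, X, 0)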

Environment

TensorRT Version : 7.1.3
GPU Type :
Nvidia Driver Version : 450.102.04
CUDA Version : 11.0.3
CUDNN Version : 8.0.4
Operating System + Version :
Python Version (if applicable) :
TensorFlow Version (if applicable) :
PyTorch Version (if applicable) :
Baremetal or Container (if container which image + tag) : nvcr.io/nvidia/tensorrt:20.09-py3

Relevant Files

whereTest.onnx (1.4 KB)
log.txt (22.6 KB)

Steps To Reproduce

  1. Start the nvcr.io/nvidia/tensorrt:20.09-py3 container, mounting a folder that contains the attached whereTest.onnx file.
  2. Go to the tensorrt/bin folder and run trtexec --onnx="mounted_folder/whereTest.onnx" --int8 --inputIOFormats=int8:chw --verbose
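
For scripting the reproduction, the command from step 2 could be assembled roughly as follows. This is only a sketch: build_trtexec_command is a hypothetical helper name, the flags are exactly the ones listed above, and it assumes trtexec is reachable from the current directory or PATH inside the container.

```python
import shlex


def build_trtexec_command(onnx_path, trtexec="trtexec"):
    """Return the argv list for the trtexec call from step 2.

    build_trtexec_command is a hypothetical helper; only the flags
    already shown in this report are used.
    """
    return [
        trtexec,
        f"--onnx={onnx_path}",
        "--int8",
        "--inputIOFormats=int8:chw",
        "--verbose",
    ]


if __name__ == "__main__":
    cmd = build_trtexec_command("mounted_folder/whereTest.onnx")
    # Print the command in a copy-pasteable, shell-quoted form.
    print(shlex.join(cmd))
    # Inside the container it could then be run with subprocess.run(cmd).
```

Keeping the arguments as a list avoids shell-quoting problems if the mounted path contains spaces.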

Hi, please refer to the link below to perform inference in INT8:
https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/sampleINT8/README.md

Thanks!

Hi,

I am not looking to do int8 inference, only to pass the input data as int8. I have a larger model where I got this working, but the issue seems to be the ONNX code that comes from tf.where. If I remove the tf.where in this example and return the random result directly, it runs without errors. The network I provided is a minimal sample; I also have a larger network that includes some other operations. There I observe the same thing: if I remove the tf.where and generate a model, everything runs.

Hi @kristof1,

We recommend trying the latest TRT 7.2 version and checking whether you still face this issue.

Thank you.

The same error occurs on TRT 7.2, specifically on the latest nvcr.io/nvidia/tensorrt:21.02-py3 container release. I think the problem is related to casting to int8 before the output. I got a similar but different error when trying out another simple program that casts to int8 after a matrix multiplication. I've added this sample program here: whereTestInt.onnx (655 Bytes).

In this case I get the following error, again in cudnnBuilder2.cpp:

[03/14/2021-19:50:53] [E] [TRT] ../builder/cudnnBuilder2.cpp (2025) - Assertion Error in getSupportedFormats: 0 (!formats.empty())
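
To pull lines like this out of a long verbose log, something along these lines might help. It is a rough sketch: the pattern is inferred only from the single error line quoted above, so other TensorRT messages may not match it, and find_assertions is a hypothetical helper name.

```python
import re

# Pattern inferred from the one error line quoted above (an assumption,
# not a documented log format).
ASSERTION_RE = re.compile(
    r"\[E\] \[TRT\] (?P<file>\S+) \((?P<line>\d+)\) - "
    r"Assertion Error in (?P<function>\w+): (?P<code>\d+) \((?P<condition>.*)\)$"
)


def find_assertions(lines):
    """Return one dict per log line that looks like a TRT assertion error."""
    hits = []
    for raw in lines:
        match = ASSERTION_RE.search(raw.strip())
        if match:
            hits.append(match.groupdict())
    return hits


if __name__ == "__main__":
    sample = [
        "[03/14/2021-19:50:53] [E] [TRT] ../builder/cudnnBuilder2.cpp (2025) - "
        "Assertion Error in getSupportedFormats: 0 (!formats.empty())",
    ]
    for hit in find_assertions(sample):
        print(hit["file"], hit["line"], hit["function"], hit["condition"])
```

For the attached log, the lines could be read with open("log.txt") and passed to the same function.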

I don’t really want to do anything like quantization. All I want is to pass my input and output as compactly as possible, in the same format that is used in the CPU application, to avoid any additional copying.

Is casting to int8 not supported? If so it would be nice to get a more helpful error message rather than an assertion failure.

Hi @kristof1,

This issue will be fixed in a future release. Please stay tuned for updates.

Thank you.

Great news, thanks for following this up on such short notice.

Perhaps one last question: what do you consider the best workaround at this time? Is an ArgMax that directly outputs int8 supported, so that I can avoid the cast?

At the moment my workaround is to work in int32 and do the additional copy on the CPU.
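
For reference, the extra CPU-side copy amounts to something like the sketch below. It uses only the Python standard library to stay self-contained; narrow_to_int8 is a hypothetical name, and a real application would more likely do this on NumPy or native buffers.

```python
from array import array

INT8_MIN, INT8_MAX = -128, 127


def narrow_to_int8(values):
    """Copy integer values into a 1-byte-per-element signed buffer.

    Raises ValueError instead of silently wrapping when a value does not
    fit into int8.
    """
    out = array("b")  # "b" = signed char, one byte per element
    for v in values:
        if not INT8_MIN <= v <= INT8_MAX:
            raise ValueError(f"value {v} does not fit into int8")
        out.append(v)
    return out


if __name__ == "__main__":
    # Stand-in for the int32 network output copied back to the host.
    int32_output = array("i", [0, 5, -3, 127])
    int8_output = narrow_to_int8(int32_output)
    print(int8_output.tolist(), int8_output.itemsize)
```

The range check is optional, but it makes an out-of-range result visible instead of producing a wrapped value.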

Hi @kristof1,

For ArgMax, the output is a tensor of indices of type INT32; we are not sure it can be converted to int8.
Please keep using your current workaround until this issue is fixed in a future release.

Thank you.