Setting the input to int8 results in engine construction errors when using `trtexec --onnx="whereTest.onnx" --int8 --inputIOFormats=int8:chw --verbose`. Full error logs and the ONNX file used have been attached.
The model is constructed from a simple TensorFlow model using tf2onnx. If I don't use tf.where, I don't get the error.
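For reference, the tf.where in the minimal sample just selects elementwise between two tensors. A rough NumPy sketch of that graph logic (the values and shapes here are made up for illustration, not taken from the actual model):

```python
import numpy as np

# Illustrative stand-in for the tf.where node in the minimal model:
# select elementwise from two branches based on a boolean condition.
cond = np.array([True, False, True, False])
a = np.array([1, 2, 3, 4], dtype=np.int32)
b = np.array([10, 20, 30, 40], dtype=np.int32)

out = np.where(cond, a, b)  # -> [1, 20, 3, 40]
```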
TensorRT Version: 7.1.3
GPU Type:
Nvidia Driver Version: 450.102.04
CUDA Version: 11.0.3
CUDNN Version: 8.0.4
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/tensorrt:20.09-py3
I am not looking to do int8 inference, only to pass the input data as int8. I have a larger model where I got this working, but the issue seems to be the ONNX code coming from tf.where. If I remove the tf.where in this example and directly return the random result, it runs without errors. The network I provided is a minimal sample; I also have a larger network that includes some other operations, and I observe the same thing there: if I remove the tf.where and generate a model, everything runs.
The same error occurs on TRT 7.2, i.e. on the latest nvcr.io/nvidia/tensorrt:21.02-py3 container release. I think the problem is related to casting to int8 before the output. I've gotten a similar but different error when trying out another simple program, by casting to int8 after doing a matrix multiplication. I've attached this sample program here: whereTestInt.onnx (655 Bytes).
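The failing pattern is just a matmul followed by a cast to int8. A hedged NumPy sketch of what that second graph roughly computes (the shapes and values here are made up for illustration, not read from whereTestInt.onnx):

```python
import numpy as np

# Sketch of the failing graph: MatMul followed by a Cast to int8.
a = np.array([[1, 2], [3, 4]], dtype=np.float32)
b = np.array([[5, 6], [7, 8]], dtype=np.float32)

result = a @ b                    # float32 matrix multiplication
as_int8 = result.astype(np.int8)  # the final Cast that the builder rejects
```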
In this case I get the following error, again in cudnnBuilder2.cpp:
I don't really want to do anything like quantization. All I want is to pass my input and output as compactly as possible, in the same format used by the CPU application, to avoid any additional copying.
Is casting to int8 not supported? If so, it would be nice to get a more helpful error message rather than an assertion failure.
One last question: what do you consider the best workaround at this time? Is an ArgMax that directly outputs int8 supported, so that I can avoid the cast?
At the moment my workaround is to work in int32 and do the additional copy on the CPU.
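A minimal sketch of that workaround, assuming the engine output is an int32 buffer and the CPU application wants int8 (the buffer contents here are illustrative):

```python
import numpy as np

# The engine output stays int32, which TensorRT supports...
engine_output = np.array([0, 1, 2, 1], dtype=np.int32)

# ...and the narrowing to int8 happens afterwards on the CPU,
# at the cost of one extra copy of the output buffer.
cpu_buffer = engine_output.astype(np.int8)
```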
For ArgMax, the output indices have type INT32; I'm not sure they can be converted to int8.
Please keep using the workaround you're following until this issue is fixed in a future release.