Model does not get Int8 layers

Description

My network does not get Int8 layers.

Environment

TensorRT Version: 8.4

GPU Type: RTX3090
Nvidia Driver Version: 510
CUDA Version: 11.6
CUDNN Version:
Operating System + Version: Ubuntu 20.04
Baremetal or Container (if container which image + tag):

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)

Steps To Reproduce

Please run trtexec with the following command line on the ONNX model in the attached ZIP (ToyModel.zip):
ToyModel.zip (248.7 KB)

&&&& RUNNING TensorRT.trtexec [TensorRT v8400] # /usr/src/tensorrt/bin/trtexec --onnx=model_op_11.onnx --best --saveEngine=model_op_11.engine --optShapes=Input:1x5x1080x1920x1 --iterations=100 --warmUp=1000 --workspace=6000 --verbose=True

The TensorRT log shows my layers get assigned “Half” precision and not “Int8” - I would like to know why.

Most operations in the model are Squeeze / Unsqueeze / Concat, etc., but it also has Conv2D / Relu. I would still expect those operations to run faster when their inputs and outputs are quantized to 8 bits.

Hi, Please refer to the below links to perform inference in INT8

Thanks!

Thanks @NVES ,

Those resources don’t seem relevant to my issue.

I have a model; the ONNX file was provided in this thread.
I expect that when I ask trtexec to generate an engine file with the “--best” setting, I will get some Int8 layers. But for my model, trtexec produces only layers with Half precision.

My question is why doesn’t my model get Int8 layers? Could you please check my model?
I couldn’t understand from the log file why it rejects setting Int8 layers.

Even if I provide a calibration file for the model, it still does not generate Int8 layers.

Hi,

Sorry for the delayed response.
When we pass --best, TensorRT enables all three precisions (FP32, FP16, INT8) and chooses among them based on measured runtime for the network, so INT8 is used only where it is actually faster.

Also, the network consists of many convs with tiny input channel (C) and output channel (K) counts, something like 1 or 5.
INT8 Tensor Cores require us to pad C and K to 32, so most of the computation would be spent on padding, and INT8 brings no benefit compared to FP16.
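
To make the padding cost concrete, here is a rough back-of-the-envelope sketch in Python. It is not TensorRT code: the helper names are hypothetical, and it only assumes what is stated above, namely that C and K are each rounded up to a multiple of 32. It reports what fraction of the multiply-accumulates would be real work.

```python
def pad_to_multiple(n, multiple):
    """Round n up to the next multiple of `multiple` (integer arithmetic)."""
    return -(-n // multiple) * multiple


def useful_fraction(c, k, alignment=32):
    """Fraction of multiply-accumulates that do real work once the
    input channels (C) and output channels (K) are padded to `alignment`."""
    padded_c = pad_to_multiple(c, alignment)
    padded_k = pad_to_multiple(k, alignment)
    return (c * k) / (padded_c * padded_k)


if __name__ == "__main__":
    # Channel counts like 1 or 5 are the ones mentioned in this thread;
    # 32 and 64/128 are included only for contrast.
    for c, k in [(1, 1), (5, 5), (32, 32), (64, 128)]:
        print(f"C={c:>3} K={k:>3} -> useful work: {useful_fraction(c, k):.2%}")
```

For C = K = 5, only 25 of the 32 x 32 = 1024 padded multiply-accumulates are useful, which is roughly 2.4%. This ignores everything else that affects real kernel timing, so treat it as an illustration of the argument rather than a performance model.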

Also, we are not sure why the model requires so many Split ops and tiny Conv ops. If we want to use grouped conv or depthwise conv, we should use a single Conv op with the group attribute set to the number of groups, instead of splitting each group into standalone Conv ops.
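
As an illustration of why the two forms are interchangeable, here is a small pure-Python sketch (no TensorRT or ONNX involved; all function names are hypothetical). It assumes stride 1, no padding, and the ONNX Conv weight layout [K][C/group][R][S], and checks that Split + per-group Conv + Concat gives the same result as one grouped conv whose weights are the per-group weights stacked along the output-channel axis.

```python
def conv2d(x, w):
    """Naive stride-1, no-padding convolution.
    x: [C][H][W], w: [K][C][R][S] -> [K][H-R+1][W-S+1]"""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    K, R, S = len(w), len(w[0][0]), len(w[0][0][0])
    out = []
    for k in range(K):
        plane = []
        for i in range(H - R + 1):
            row = []
            for j in range(W - S + 1):
                acc = 0
                for c in range(C):
                    for r in range(R):
                        for s in range(S):
                            acc += x[c][i + r][j + s] * w[k][c][r][s]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def split_conv_concat(x, weights_per_group):
    """Split the channels, run one standalone conv per group, concat."""
    cg = len(x) // len(weights_per_group)  # input channels per group
    out = []
    for g, w in enumerate(weights_per_group):
        out.extend(conv2d(x[g * cg:(g + 1) * cg], w))
    return out


def grouped_conv2d(x, w, group):
    """Single conv with a `group` attribute; w: [K][C/group][R][S].
    Output channel k only reads the input channels of its own group."""
    cg = len(x) // group   # input channels per group
    kg = len(w) // group   # output channels per group
    out = []
    for k in range(len(w)):
        g = k // kg
        out.extend(conv2d(x[g * cg:(g + 1) * cg], [w[k]]))
    return out


# Toy data: 4 input channels of size 3x3, 2 groups, one 2x2 filter per group.
x = [[[c * 9 + i * 3 + j for j in range(3)] for i in range(3)] for c in range(4)]
w_g0 = [[[[1, 0], [0, 1]], [[2, 1], [1, 0]]]]
w_g1 = [[[[0, 1], [1, 0]], [[1, 1], [1, 1]]]]
merged = w_g0 + w_g1  # stack per-group weights along the output-channel axis

split_result = split_conv_concat(x, [w_g0, w_g1])
grouped_result = grouped_conv2d(x, merged, group=2)
assert split_result == grouped_result
```

In an exported model, the practical consequence would be to concatenate the standalone Conv weights into one weight tensor and emit a single Conv node with its group attribute set, so the builder sees one op instead of a Split, several tiny Convs, and a Concat.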

Hope this helps.
Thank you.