The TensorRT log shows my layers get assigned “Half” precision and not “Int8” - I would like to know why.
Most operations in the model are Squeeze / Unsqueeze / Concat, etc., but it also has Conv2D / Relu. Still, those operations should run faster when the input/output is quantized to 8 bits.
I have a model; the ONNX file was provided in this thread.
I expect that when I ask trtexec to generate an engine file with the “--best” flag, I will get some Int8 layers. But for my model, trtexec produces layers with Half precision.
My question is why doesn’t my model get Int8 layers? Could you please check my model?
I couldn’t understand from the log file why it rejects setting Int8 layers.
Even if I provide a calibration file for the model, it still does not generate Int8 layers.
Sorry for the delayed response.
When we give --best, TensorRT chooses, per layer, whichever of the three precisions (FP32 / FP16 / INT8) runs fastest for that network.
Also, the network consists of many Convs with tiny input channel (C) and output channel (K) counts, something like 1 or 5.
INT8 Tensor Cores require C and K to be padded up to multiples of 32, so most of the computation is wasted and there is no benefit compared to FP16.
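A quick back-of-the-envelope illustration of the padding point above (a sketch, not TensorRT's actual scheduling logic): if C and K are padded to multiples of 32 for INT8 Tensor Cores, the fraction of padded multiply-accumulates doing useful work collapses for channel counts like 1 or 5.

```python
# Illustrative only: how much of the padded INT8 work is useful
# when C and K must be rounded up to a multiple of 32.

def padded(n, multiple=32):
    """Round n up to the next multiple (Tensor Core channel alignment)."""
    return ((n + multiple - 1) // multiple) * multiple

def int8_utilization(c, k, multiple=32):
    """Fraction of the padded MACs that compute real (non-padding) values
    for a conv with C input channels and K output channels."""
    return (c * k) / (padded(c, multiple) * padded(k, multiple))

for c, k in [(1, 1), (5, 5), (32, 32)]:
    print(f"C={c:2d}, K={k:2d} -> utilization {int8_utilization(c, k):.1%}")
# C= 1, K= 1 -> utilization 0.1%
# C= 5, K= 5 -> utilization 2.4%
# C=32, K=32 -> utilization 100.0%
```

At ~0.1% to ~2.4% utilization, any INT8 throughput advantage is swamped by the padding overhead, which is why FP16 kernels win the timing comparison.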
Also, we are not sure why the model requires so many Split ops and tiny Conv ops. If the intent is a grouped or depthwise convolution, a single Conv op with the group attribute set to the number of groups should be used, instead of splitting each group into standalone Conv ops.
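The two patterns compute identical numbers. A minimal pure-Python sketch (framework-free, using a 1x1 conv and made-up shapes for illustration) shows that one grouped conv equals the Split → per-group Conv → Concat chain it replaces:

```python
# Sketch: a grouped 1x1 conv vs. the Split -> Conv -> Concat pattern.
# Tensors are lists of channels, each channel a list of N values.

def conv1x1(x, w):
    """Plain 1x1 conv: x is [C][N], w is [K][C], output is [K][N]."""
    return [[sum(w[k][c] * x[c][n] for c in range(len(x)))
             for n in range(len(x[0]))] for k in range(len(w))]

def grouped_conv1x1(x, w, groups):
    """Single grouped conv (what a Conv with the group attribute computes):
    output channel k reads only the input channels of its own group.
    w is [K][C // groups]."""
    cg, kg = len(x) // groups, len(w) // groups
    out = []
    for k in range(len(w)):
        g = k // kg  # which group this output channel belongs to
        out.append([sum(w[k][c] * x[g * cg + c][n] for c in range(cg))
                    for n in range(len(x[0]))])
    return out

def split_conv_concat(x, w, groups):
    """The pattern seen in the model: split the input into `groups`
    chunks, run a standalone conv per chunk, concatenate the results."""
    cg, kg = len(x) // groups, len(w) // groups
    out = []
    for g in range(groups):
        xg = x[g * cg:(g + 1) * cg]
        wg = w[g * kg:(g + 1) * kg]
        out.extend(conv1x1(xg, wg))
    return out
```

Because both functions produce the same output, fusing the split pattern into one grouped Conv changes nothing numerically, but it lets TensorRT pick one efficient kernel instead of scheduling many tiny Convs with C and K of 1 or 5.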