Why are some layers not quantized during ONNX-TensorRT conversion with INT8 quantization?

Description

I am trying to convert the RAFT model (GitHub - princeton-vl/RAFT) from PyTorch (1.9) to TensorRT (7) with INT8 quantization through ONNX (opset 11).
I am using the “base” (not “small”) version of RAFT with the ordinary (not “alternate”) correlation block and 10 iterations.
The model is slightly modified to remove sources of quantization problems (Shape layers, for example), but the modifications do not change the model structure, so the weights from the original model can be loaded into the modified one. Also, GridSamplePlugin (GitHub - TrojanXu/onnxparser-trt-plugin-sample: A sample for onnxparser working with trt user defined plugins for TRT7.0) is used to handle the PyTorch grid_sample operation.
I am using post-training INT8 quantization to create the TRT INT8 model.
The problem is that some layers are not quantized to INT8, and these are not special layers like GridSample (which has no INT8 implementation) but general layers such as Conv, Reshape, Transpose, Mul, Add, etc.
The conversion log contains a number of warnings for these layers similar to this one:
Rejecting some int8 implementation of layer Conv_3 + Relu_4 due to missing int8 scales for one or more layer i/o tensors.
An additional problem appeared when I tried to change the SepConvGRU in the update block to a ConvGRU (which is used in the “small” version of RAFT). This change does not affect anything outside the update block, and time profiling of the Torch network confirms it.
But after converting the ConvGRU variant of RAFT to TRT, I found that some backbone layers that were quantized successfully in the SepConvGRU variant are now not quantized, with the same warning.

So the questions are:

  1. What in the quantization process leads to these warnings for some layers?
  2. How can changing some layers in the middle of the network affect the quantization of the backbone layers?

Environment

TensorRT Version: 7
GPU Type: NVidia GeForce GTX 1050 Ti
Nvidia Driver Version: 510.73.05
CUDA Version: 11.1.74
CUDNN Version: 8.1.1
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.8
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.9
Baremetal or Container (if container which image + tag):

Steps To Reproduce

  1. Build TensorRT OSS with the custom plugin according to GitHub - TrojanXu/onnxparser-trt-plugin-sample: A sample for onnxparser working with trt user defined plugins for TRT7.0 (alternatively, custom conversion code that includes this plugin may be used).
  2. Create the ONNX model according to the code example at GitHub - TrojanXu/onnxparser-trt-plugin-sample: A sample for onnxparser working with trt user defined plugins for TRT7.0.
  3. Create the TensorRT engine with INT8 quantization and look through the conversion log to find these warnings.

Conversion log example: conversion_log_raft.txt

Hi,
Please refer to the links below related to custom plugin implementation and samples:

While the IPluginV2 and IPluginV2Ext interfaces are still supported for backward compatibility with TensorRT 5.1 and 6.0.x respectively, we recommend that you write new plugins or refactor existing ones to target the IPluginV2DynamicExt or IPluginV2IOExt interfaces instead.

Thanks!

@NVES Thank you for the reply.
Unfortunately, your reply concerns custom plugins and custom ONNX model modifications. We had an even more complicated problem with the grid_sample layer and solved it thanks to the reference I mentioned in the reproduction section.
Our problem is related to INT8 quantization, not just the Torch-ONNX-TRT conversion, and it concerns the quantization of ordinary layers (Conv, Add, Mul, Tanh, etc.). For the custom GridSamplePlugin there is no such problem, as it contains no INT8 implementation and thus cannot be quantized.
So my questions were:

  • Why do we get the “Rejecting some int8 implementation” warning for some of these ordinary layers?
  • Why does changing the model structure change the quantization of layers not affected by the change (the “Rejecting some int8 implementation” warning appears or disappears)?

Hi,

We will get back to you on the above queries,
Could you please share the ONNX model with us for better debugging?

Thank you.

Here is the ONNX model.
RAFT_640x360_minchange.onnx (20.2 MB)
It was created from the original RAFT model with the changes required for ONNX-TRT conversion with INT8 precision. GridSamplePlugin is required for the ONNX-TRT conversion.

Hi,

We couldn’t run trtexec on the model successfully. Are you using the grid sample plugin?

[07/19/2022-14:44:45] [E] [TRT] ModelImporter.cpp:774: — Begin node —
[07/19/2022-14:44:45] [E] [TRT] ModelImporter.cpp:775: input: “311”
input: “491”
output: “492”
name: “GridSampler_185”
op_type: “TRT_PluginV2”
attribute {
name: “name”
s: “GridSampler”
type: STRING
}
attribute {
name: “version”
s: “1”
type: STRING
}
attribute {
name: “namespace”
s: “”
type: STRING
}
attribute {
name: “data”
s: “\004\000\000\000\000\000\000\000-\000\000\000\000\000\000\000P\000\000\000\000\000\000\000\t\000\000\000\000\000\000\000\t\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\001\001\000\000\000”
type: STRING
}

[07/19/2022-14:44:45] [E] [TRT] ModelImporter.cpp:776: — End node —
[07/19/2022-14:44:45] [E] [TRT] ModelImporter.cpp:779: ERROR: builtin_op_importers.cpp:5211 In function importTRT_PluginV2:
[6] Assertion failed: creator && “Plugin not found, are the plugin name, version, and namespace correct?”

Regarding your original query,
Tensors that are missing scale information can’t be quantized; that is the meaning of the log message (“Rejecting some int8 implementation of layer Conv_3 + Relu_4 due to missing int8 scales for one or more layer i/o tensors.”).
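To make the scale requirement concrete: for INT8, TensorRT needs a symmetric per-tensor dynamic range for every input/output tensor of a layer, and a tensor with no recorded range cannot be quantized. A minimal numpy sketch of what such a scale represents (the activation values are made up for illustration):

```python
import numpy as np

def int8_scale(activations: np.ndarray) -> float:
    # Symmetric per-tensor scale: map the observed amplitude onto [-127, 127].
    return float(np.abs(activations).max()) / 127.0

def fake_quantize(x: np.ndarray, scale: float) -> np.ndarray:
    # What INT8 execution effectively does to a tensor: round, clip, rescale.
    q = np.clip(np.round(x / scale), -127, 127)
    return (q * scale).astype(np.float32)

# Activations a calibrator might have observed for one tensor (made up).
acts = np.array([-2.0, -0.5, 0.0, 1.0, 2.54], dtype=np.float32)
scale = int8_scale(acts)          # ≈ 0.02
print(fake_quantize(acts, scale))
```

If calibration never records such a range for even one i/o tensor of a fused layer, the INT8 implementation of that layer is rejected, which matches the warning text.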

Could you please let us know the calibration process you’re following, and please make sure you’re using a calibration algorithm that runs before the fusions occur (this way every tensor should have a scale attached to it).
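One way to check this on your side is to dump the calibration cache and compare the tensor names that received scales against the tensors the engine needs. A sketch of such a check, assuming the usual text layout of a TensorRT calibration cache (a header line followed by `name: <hex-encoded big-endian float32>` entries; the tensor names below are hypothetical, not taken from the actual model):

```python
import struct

def read_calibration_cache(text: str) -> dict:
    """Parse a TensorRT calibration cache into {tensor_name: scale}."""
    scales = {}
    for line in text.splitlines()[1:]:      # skip the header line
        if ":" not in line:
            continue
        name, hexval = line.rsplit(":", 1)
        scales[name.strip()] = struct.unpack(">f", bytes.fromhex(hexval.strip()))[0]
    return scales

# Hypothetical cache content for illustration.
cache = ("TRT-7000-EntropyCalibration2\n"
         "input: 3c010a14\n"
         "317: 3c8086d8\n")
scales = read_calibration_cache(cache)
expected = {"input", "317", "492"}          # tensors the engine needs (made up)
missing = expected - scales.keys()
print("tensors without scales:", sorted(missing))
```

A tensor that is present in the engine but absent from the cache is exactly the “missing int8 scales” case from the warning.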

Thank you.

Also, it looks like you’re using an old version of TensorRT.
We recommend that you try the latest TensorRT version, 8.4 GA.
https://developer.nvidia.com/nvidia-tensorrt-8x-download

Yes, as I said in the initial problem description, GridSamplePlugin from https://github.com/TrojanXu/onnxparser-trt-plugin-sample is used to handle the PyTorch grid_sample operation. I followed the author’s instructions to create the ONNX file and convert it to TRT.
The conversion can be done either with trtexec or with custom C++ code. For INT8 calibration I used a custom calibrator written according to the instructions at https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#reduced-precision
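A self-contained sketch of such a calibrator’s structure (the method names mirror trt.IInt8EntropyCalibrator2, but the TensorRT base class and the host-to-device memory copies are omitted here, so this is an outline rather than the actual code):

```python
import os
import numpy as np

class EntropyCalibratorSketch:
    """Outline of an INT8 calibrator; in real code this would subclass
    trt.IInt8EntropyCalibrator2 and return device pointers from get_batch."""

    def __init__(self, batches, cache_file="calibration.cache"):
        self.batches = iter(batches)      # preprocessed np.float32 batches
        self.cache_file = cache_file

    def get_batch_size(self):
        return 1

    def get_batch(self, names):
        # One entry per input tensor name; returning None ends calibration.
        try:
            return [next(self.batches)]
        except StopIteration:
            return None

    def read_calibration_cache(self):
        # Reusing a cache skips recalibration on subsequent builds.
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()
        return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)

# TensorRT would call get_batch repeatedly during engine build:
calib = EntropyCalibratorSketch([np.zeros((1, 3, 8, 8), np.float32)] * 2,
                                cache_file="sketch.cache")
n = 0
while calib.get_batch(["input"]) is not None:
    n += 1
print("batches fed:", n)
```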
The problem is that, using the same code to convert slightly different models, I may get no such warnings, some warnings, or warnings at almost all model layers. And the layers where the warnings appear are not related to the changes in the model.
The use of an old TRT version is caused by plans to port the model to Jetson Xavier.

Could you please confirm whether you are facing this on TensorRT version 8.4 GA as well?

Thank you.

I tried TensorRT 8.4.2.4. The rest of the environment remained the same (510.73.05 driver, CUDA 11.1.74, cuDNN 8.1.1). The problem remained, but it may occur for different layers (for example, a model that previously converted with no “Rejecting some int8 implementation of layer” warnings may now convert with them, and vice versa).
I even tried to simplify the model for test purposes and created a test model that includes only the RAFT BasicUpdateBlock in the iteration loop (such a model is easily converted to INT8 with no special modifications; according to the update block structure, it has 4 inputs and 2 outputs). When I run the conversion for 1 iteration, the model is created with no problems, but if I set 2 iterations, the warnings appear, and not only for the “new” layers.

Could you please share with us a minimal repro model/steps and the output logs for better debugging?