TensorRT Version: 8.0.1.6
GPU Type: GTX 1060
Nvidia Driver Version: 461.33
CUDA Version: 11.2
CUDNN Version: 8.1
Operating System + Version: Windows 10 / Ubuntu 18.04
Python Version (if applicable): 3.7
TensorFlow Version (if applicable): 2.6 / 2.7-nightly
Description
I am training QAT models in TensorFlow 2.6.0 and later versions. I only turn on the QAT nodes during the final 5 epochs (fine-tuning with QAT). As recommended by TensorRT, I have been using symmetric quantization for both the inputs and the weights of quantizable ops (Conv2D); inputs are per-tensor quantized and weights are per-channel quantized, as recommended. As you can see in Figure (1) below, after converting my TensorFlow-trained model to ONNX (opset 13), the required Q/DQ ops are added properly. I am also able to successfully parse and benchmark this with trtexec (PASSED TensorRT.trtexec). I have attached the full verbose log below.
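A simplified sketch of this kind of quantizer setup with tensorflow_model_optimization (not my exact training code; the ResNet50 backbone, layer choices, and names below are placeholders):

import tensorflow as tf
import tensorflow_model_optimization as tfmot

quant = tfmot.quantization.keras

class SymmetricConvQuantizeConfig(quant.QuantizeConfig):
    # Per-channel symmetric weights, per-tensor symmetric activations (assumed settings).
    def get_weights_and_quantizers(self, layer):
        return [(layer.kernel, quant.quantizers.LastValueQuantizer(
            num_bits=8, per_axis=True, symmetric=True, narrow_range=True))]

    def get_activations_and_quantizers(self, layer):
        return []

    def set_quantize_weights(self, layer, quantize_weights):
        layer.kernel = quantize_weights[0]

    def set_quantize_activations(self, layer, quantize_activations):
        pass

    def get_output_quantizers(self, layer):
        return [quant.quantizers.MovingAverageQuantizer(
            num_bits=8, per_axis=False, symmetric=True, narrow_range=False)]

    def get_config(self):
        return {}

def annotate(layer):
    # Annotate only the Conv2D layers for quantization-aware training.
    if isinstance(layer, tf.keras.layers.Conv2D):
        return quant.quantize_annotate_layer(layer, quantize_config=SymmetricConvQuantizeConfig())
    return layer

base_model = tf.keras.applications.ResNet50(weights=None, input_shape=(224, 224, 3))
annotated = tf.keras.models.clone_model(base_model, clone_function=annotate)
with quant.quantize_scope({"SymmetricConvQuantizeConfig": SymmetricConvQuantizeConfig}):
    qat_model = quant.quantize_apply(annotated)  # then fine-tune for the final 5 epochs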
But during engine building, I see multiple such messages. For example, it says quantization data is missing for QuantLinearNode__574_quantize_scale_node, but as you can see in the ONNX model visualization in Figure (1), QuantLinearNode__574 has all the required quantization data, i.e. scale=0.01611565239727497 and zero_point=0.
This is happening to pretty much all the QuantLinearNodes that were added to the inputs of the Conv2D ops.
Is this expected behaviour, and are these messages safe to ignore?
For reproduction, I have created a simple QAT-trained ResNet model and uploaded it in ONNX format.
[09/17/2021-13:54:53] [V] [TRT] Setting a default quantization params because quantization data is missing for QuantLinearNode__574_quantize_scale_node
[09/17/2021-13:54:53] [V] [TRT] Setting a default quantization params because quantization data is missing for QuantLinearNode__582_quantize_scale_node
[09/17/2021-13:54:54] [V] [TRT] Setting a default quantization params because quantization data is missing for QuantLinearNode__598_quantize_scale_node
[09/17/2021-13:54:54] [V] [TRT] Setting a default quantization params because quantization data is missing for QuantLinearNode__614_quantize_scale_node
...
...
Hi,
Request you to share the ONNX model and the script if not shared already so that we can assist you better.
Alongside, you can try a few things:
1) Validate your model with the below snippet:
check_model.py
import onnx

filename = "yourONNXmodel.onnx"  # path to your ONNX model
model = onnx.load(filename)
onnx.checker.check_model(model)
2) Try running your model with the trtexec command (example below): https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
In case you are still facing the issue, request you to share the trtexec "--verbose" log for further debugging.
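For example (flags are illustrative; --int8 allows INT8 tactics for the explicitly quantized network):

trtexec --onnx=yourONNXmodel.onnx --int8 --verbose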
Thanks!
Verbose log of trtexec
I can successfully run the model with trtexec, but I am concerned about the messages about "missing quantization data" while running trtexec when there IS quantization data present, see Figure (1).
These logs do not indicate a definite problem. The log is produced after the graph is optimized. We do a pass over all nodes, and nodes whose output tensors don't have quantization scales are assigned a default scale of 1 (sort of like a no-op scale). That's what the "default quantization" in the log means.
In the case above, the log tells us that there are Q nodes (574, 582) whose outputs are being assigned the default scale (1). This is fine even though it may sound wrong. In other words, Q nodes are left in the graph after the graph-fusion optimization. One Q node is usually expected (to quantize the network input), but more than one Q node usually indicates that Q/DQ nodes are placed in positions that prevent fusion.
One thing we've observed is that your model graph doesn't follow the full recommendations from TensorRT: the residual branch in the model has no additional Q/DQ pair. For example, a residual branch in the ONNX model graph looks like this:
Basically, the identity branch should have a Q/DQ pair. Otherwise, this may lead to suboptimal fusions, which can leave dangling Q/DQ nodes that are expensive, as explained in the documentation: Developer Guide :: NVIDIA Deep Learning TensorRT Documentation
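One way to add such a pair directly in the exported ONNX graph is sketched below with onnx-graphsurgeon (illustrative only; the tensor name of the identity branch and the scale/zero-point values are placeholders, and in practice the scale should come from the learned activation range):

import numpy as np
import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("model.onnx"))

# Placeholder: name of the tensor carried by the identity (residual) branch into the Add.
residual = graph.tensors()["identity_branch_tensor"]

# Illustrative scale/zero-point; reuse the scale already learned for this activation.
scale = gs.Constant("residual_scale", np.array(0.0161, dtype=np.float32))
zero_point = gs.Constant("residual_zero_point", np.array(0, dtype=np.int8))

q_out = gs.Variable("residual_quantized", dtype=np.int8)
dq_out = gs.Variable("residual_dequantized", dtype=np.float32)
q = gs.Node(op="QuantizeLinear", inputs=[residual, scale, zero_point], outputs=[q_out])
dq = gs.Node(op="DequantizeLinear", inputs=[q_out, scale, zero_point], outputs=[dq_out])

# Rewire the residual Add to consume the dequantized tensor instead of the raw branch output.
for consumer in list(residual.outputs):
    if consumer.op == "Add":
        consumer.inputs = [dq_out if t is residual else t for t in consumer.inputs]

graph.nodes.extend([q, dq])
graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "model_with_residual_qdq.onnx")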
[10/02/2021-15:41:30] [V] [TRT] Setting a default quantization params because quantization data is missing for QuantLinearNode__628_quantize_scale_node
[10/02/2021-15:41:30] [V] [TRT] Setting a default quantization params because quantization data is missing for DequantLinearNode__1173_quantize_scale_node
[10/02/2021-15:41:30] [V] [TRT] Setting a default quantization params because quantization data is missing for StatefulPartitionedCall/model/global_average_pooling2d/Mean
[10/02/2021-15:41:30] [V] [TRT] Setting a default quantization params because quantization data is missing for StatefulPartitionedCall/model/dense/MatMul + StatefulPartitionedCall/model/dense/BiasAdd/ReadVariableOp__623 + (Unnamed Layer* 1115) [Shuffle] + unsqueeze_node_after_StatefulPartitionedCall/model/dense/BiasAdd/ReadVariableOp__623 + (Unnamed Layer* 1115) [Shuffle] + StatefulPartitionedCall/model/dense/BiasAdd
I managed to run the resulting ONNX model with trtexec, and this time I noticed that all the Q/DQ ops are fused and there are no dangling Q/DQ ops except for these four. IIUC, QuantLinearNode__628 and DequantLinearNode__1173 are expected, since they quantize the network input and dequantize the output respectively. The last op is for the dense layer, which I intentionally did not quantize. But I am not very sure about the global_average_pooling2d warning.
My follow-up question is: do I need to explicitly add Q/DQ ops to the final dense layer to trigger TensorRT to run the global_average_pooling2d node in INT8 mode? (as shown below)
I tried running trtexec in verbose mode on my host (GTX 1060) and on my AGX Xavier (JP 4.6)
On 1060
[10/04/2021-21:34:43] [V] [TRT] Setting a default quantization params because quantization data is missing for QuantLinearNode__628_quantize_scale_node
[10/04/2021-21:34:43] [V] [TRT] Setting a default quantization params because quantization data is missing for DequantLinearNode__1173_quantize_scale_node
[10/04/2021-21:34:43] [V] [TRT] Setting a default quantization params because quantization data is missing for StatefulPartitionedCall/model/global_average_pooling2d/Mean
[10/04/2021-21:34:43] [V] [TRT] Setting a default quantization params because quantization data is missing for StatefulPartitionedCall/model/dense/MatMul + StatefulPartitionedCall/model/dense/BiasAdd/ReadVariableOp__623 + (Unnamed Layer* 1115) [Shuffle] + unsqueeze_node_after_StatefulPartitionedCall/model/dense/BiasAdd/ReadVariableOp__623 + (Unnamed Layer* 1115) [Shuffle] + StatefulPartitionedCall/model/dense/BiasAdd
On AGX Xavier
[10/04/2021-21:26:39] [V] [TRT] Setting a default quantization params because quantization data is missing for QuantLinearNode__628_quantize_scale_node
[10/04/2021-21:26:39] [V] [TRT] Setting a default quantization params because quantization data is missing for DequantLinearNode__1173_quantize_scale_node
[10/04/2021-21:26:39] [V] [TRT] Setting a default quantization params because quantization data is missing for Reformatting CopyNode for Input Tensor 1 to StatefulPartitionedCall/model/quant_conv2d_56/transpose__614 + QuantLinearNode__1188_quantize_scale_node + StatefulPartitionedCall/model/quant_conv2d_56/BiasAdd + StatefulPartitionedCall/model/add_26/add + StatefulPartitionedCall/model/activation_54/Relu
[10/04/2021-21:26:39] [V] [TRT] Setting a default quantization params because quantization data is missing for Reformatting CopyNode for Input Tensor 0 to StatefulPartitionedCall/model/global_average_pooling2d/Mean
[10/04/2021-21:26:39] [V] [TRT] Setting a default quantization params because quantization data is missing for StatefulPartitionedCall/model/global_average_pooling2d/Mean
[10/04/2021-21:26:39] [V] [TRT] Setting a default quantization params because quantization data is missing for StatefulPartitionedCall/model/dense/MatMul + StatefulPartitionedCall/model/dense/BiasAdd
In this graph the MatMul operation is performed in float. Add Q/DQ in front of the MatMul ( StatefulPartitionedCall/model/dense/MatMul ) input and weights and you get an INT8 MatMul.
However, the GAP operates in float, so the output of the Conv ( StatefulPartitionedCall/model/quant_conv2d_56/BiasAdd ) is also in float and you have a dangling DQ ( DequantLinearNode__1173 ).
Add another Q/DQ pair before the GAP node. The Q node from this new pair propagates backward over the ReLU and fuses with the Conv ( StatefulPartitionedCall/model/quant_conv2d_56/BiasAdd ). Now its output is INT8.
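On the TensorFlow side, this roughly corresponds to also annotating the GAP and the final Dense layers when building the QAT model, for example (a sketch using tensorflow_model_optimization's default 8-bit scheme; the layer sizes and model structure are placeholders, and support for these layers in the default scheme should be verified):

import tensorflow as tf
import tensorflow_model_optimization as tfmot

quant = tfmot.quantization.keras

# Tiny stand-in body; in practice this is the QAT-annotated ResNet backbone.
inputs = tf.keras.Input(shape=(224, 224, 3))
x = tf.keras.layers.Conv2D(32, 3, activation="relu")(inputs)
# Annotate the GAP and the final Dense so Q/DQ pairs are exported around them as well.
x = quant.quantize_annotate_layer(tf.keras.layers.GlobalAveragePooling2D())(x)
outputs = quant.quantize_annotate_layer(tf.keras.layers.Dense(10))(x)

qat_model = quant.quantize_apply(tf.keras.Model(inputs, outputs))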
It should look like this (from our reference model):
For some reason, tf2onnx keeps adding a Squeeze layer between the Q and DQ ops for the final dense layer. Does this still result in optimal fusion? I could use graph surgery to move it before the Q op; is that required?
I can see a single Quantize layer present; it is right after the input:
[10/06/2021-00:53:39] [V] [TRT] Setting a default quantization params because quantization data is missing for QuantLinearNode__634_quantize_scale_node
I haven't tried doing it yet. Also, apart from QuantLinearNode__634_quantize_scale_node, I can still see that there are a few layers that have no names but still pop up in the logs; here is an example.
[10/06/2021-00:53:35] [V] [TRT] Setting a default quantization params because quantization data is missing for
[10/06/2021-00:53:35] [V] [TRT] Tactic: 1 Time: 0.005836
[10/06/2021-00:53:35] [V] [TRT] Fastest Tactic: 1 Time: 0.005836
How do I find the nodes with missing names?
By reading this, I assumed that QuantLinearNode__634_quantize_scale_node is supposed to be present to quantize the network input.
We looked into your ONNX model and have not identified any problem. As mentioned in previous messages, these warnings are not harmful. Also, this will be fixed for Q/DQ layers so that the "missing" log does not appear for them (since they always have "missing quantization data").