Description
When TensorRT processes an explicitly quantized ResNet50 ONNX graph, it does not perform all of the layer fusions that it performs under implicit quantization. In particular, implicit quantization fuses the first convolution layer with the maxpool layer that follows it, and this fusion does not occur for the explicitly quantized model. As a result, the implicitly quantized model achieves about 15% higher throughput.
The TensorRT documentation does not mention the conditions needed for fusing convolution and maxpool layers. I experimented with multiple settings but was not able to force the fusion.
Environment
TensorRT Version: 8.2.3-1+cuda11.4
GPU Type: A100-SXM4-40GB
Nvidia Driver Version: 460.32.03
CUDA Version: 11.6
CUDNN Version: 8.3
Operating System + Version: Ubuntu 20.04.2 LTS
Python Version (if applicable): 3.8.10
TensorFlow Version (if applicable): not applicable
PyTorch Version (if applicable): not applicable
Baremetal or Container (if container which image + tag): tensorrt:22.02-py3 (NGC catalog)
Relevant Files
ONNX graphs: resnet50.onnx (FP32) and resnet50_fake_ptq.onnx (explicit quantization)
Layer profiles (generated by trtexec): resnet50_profile.json (implicit quantization) and resnet50_fake_ptq_profile.json (explicit quantization)
resnet50_fake_ptq_profile.json (10.8 KB)
resnet50_fake_ptq.onnx (97.8 MB)
resnet50_profile.json (7.6 KB)
resnet50.onnx (97.7 MB)
Steps To Reproduce
Using the Docker container listed above, I benchmark performance with trtexec:
Implicit quantization
trtexec --onnx=resnet50.onnx --int8 --shapes=input:128x3x224x224
Explicit quantization
trtexec --onnx=resnet50_fake_ptq.onnx --int8 --shapes=input:128x3x224x224
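To keep the two runs identical apart from the model file, the command lines can be generated from a single helper. This is only a minimal sketch: trtexec_cmd is a hypothetical helper name, the flags are the ones used in this post, and the profile file names in the loop are derived from the model names purely for illustration.

```python
import shlex
from typing import List, Optional


def trtexec_cmd(onnx_path: str,
                profile_path: Optional[str] = None,
                shapes: str = "input:128x3x224x224") -> List[str]:
    """Build a trtexec argument list from the flags used in this post."""
    cmd = ["trtexec", f"--onnx={onnx_path}", "--int8", f"--shapes={shapes}"]
    if profile_path is not None:
        # Export a per-layer profile, collected in a separate run from the
        # benchmark run.
        cmd += [f"--exportProfile={profile_path}", "--separateProfileRun"]
    return cmd


if __name__ == "__main__":
    for model in ("resnet50.onnx", "resnet50_fake_ptq.onnx"):
        # Illustrative naming only: <model>_profile.json
        profile = model.replace(".onnx", "_profile.json")
        print(shlex.join(trtexec_cmd(model, profile)))
```

The printed lines can be pasted into a shell inside the container, or the argument list can be passed to subprocess.run directly.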
I can inspect the layer fusions by enabling layer profiling with the flags --exportProfile and --separateProfileRun:
Implicit quantization
trtexec --onnx=resnet50.onnx --int8 --shapes=input:128x3x224x224 --exportProfile=resnet50_profile.json --separateProfileRun
Explicit quantization
trtexec --onnx=resnet50_fake_ptq.onnx --int8 --shapes=input:128x3x224x224 --exportProfile=resnet50_fake_quant_profile.json --separateProfileRun
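The exported profiles can then be compared to see where the maxpool ends up in each engine. The sketch below rests on assumptions about the file layout rather than a documented schema: it assumes each profile is a JSON list of records in which layer entries carry "name" and "averageMs" keys, and that any record without a "name" (for example a leading count record) can be skipped. The function names and the "pool" keyword are hypothetical choices; layer names depend on the node names in the ONNX graph.

```python
import json
import sys
from typing import Dict, List, Tuple


def load_layer_times(path: str) -> Dict[str, float]:
    """Map layer name -> average time in ms from an exported profile."""
    with open(path) as f:
        entries = json.load(f)
    # Assumption: layer records are dicts with "name" and "averageMs";
    # records without a "name" key are ignored.
    return {e["name"]: float(e.get("averageMs", 0.0))
            for e in entries if isinstance(e, dict) and "name" in e}


def summarize(path: str, keyword: str = "pool") -> dict:
    """Count layers, sum their average times, and list keyword matches."""
    times = load_layer_times(path)
    matches: List[Tuple[str, float]] = [
        (name, ms) for name, ms in times.items()
        if keyword.lower() in name.lower()
    ]
    return {"layers": len(times),
            "total_ms": sum(times.values()),
            "matches": matches}


if __name__ == "__main__":
    # Usage: python compare_profiles.py resnet50_profile.json <other>.json
    for path in sys.argv[1:]:
        s = summarize(path)
        print(f"{path}: {s['layers']} layers, "
              f"{s['total_ms']:.3f} ms summed average time")
        for name, ms in s["matches"]:
            print(f"  {ms:8.3f} ms  {name}")
```

If the fusion happened, I would expect the maxpool to appear inside a combined layer name together with the first convolution; if it did not, it should show up as a separate entry with its own time.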