How to apply INT8 quantization to a Transformer on Xavier

When using trtexec to convert an ONNX model to a TensorRT engine with --int8, ops such as Einsum and MatMul fall back to FP32, whereas with --fp16 they run fine. Is there any other way to speed up Transformer inference?

Hi,

We recommend trying mixed precision for better performance, e.g. trtexec --best, which lets TensorRT choose among FP32, FP16, and INT8 per layer:

$ /usr/src/tensorrt/bin/trtexec --onnx=/usr/src/tensorrt/data/resnet50/ResNet50.onnx --best
&&&& RUNNING TensorRT.trtexec [TensorRT v8201] # /usr/src/tensorrt/bin/trtexec --onnx=/usr/src/tensorrt/data/resnet50/ResNet50.onnx --best
[08/12/2022-10:31:36] [I] === Model Options ===
[08/12/2022-10:31:36] [I] Format: ONNX
[08/12/2022-10:31:36] [I] Model: /usr/src/tensorrt/data/resnet50/ResNet50.onnx
[08/12/2022-10:31:36] [I] Output:
[08/12/2022-10:31:36] [I] === Build Options ===
[08/12/2022-10:31:36] [I] Max batch: explicit batch
[08/12/2022-10:31:36] [I] Workspace: 16 MiB
[08/12/2022-10:31:36] [I] minTiming: 1
[08/12/2022-10:31:36] [I] avgTiming: 8
[08/12/2022-10:31:36] [I] Precision: FP32+FP16+INT8
[08/12/2022-10:31:36] [I] Calibration: Dynamic
[08/12/2022-10:31:36] [I] Refit: Disabled
[08/12/2022-10:31:36] [I] Sparsity: Disabled
[08/12/2022-10:31:36] [I] Safe mode: Disabled
[08/12/2022-10:31:36] [I] DirectIO mode: Disabled
[08/12/2022-10:31:36] [I] Restricted mode: Disabled
[08/12/2022-10:31:36] [I] Save engine:
[08/12/2022-10:31:36] [I] Load engine:
[08/12/2022-10:31:36] [I] Profiling verbosity: 0
[08/12/2022-10:31:36] [I] Tactic sources: Using default tactic sources
[08/12/2022-10:31:36] [I] timingCacheMode: local
...
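
If you prefer to build the engine programmatically rather than through trtexec, below is a minimal sketch using the TensorRT 8.x Python API that enables the same FP16+INT8 mixed precision. The file names model.onnx and model_mixed.engine, as well as the calibrator, are placeholders and not from this thread; for meaningful INT8 accuracy you would attach your own IInt8Calibrator.

import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# "model.onnx" is a hypothetical path; use your own Transformer ONNX file.
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("Failed to parse the ONNX model")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)   # allow FP16 tactics
config.set_flag(trt.BuilderFlag.INT8)   # allow INT8 tactics
# For accurate INT8, supply calibration data, e.g.:
# config.int8_calibrator = my_calibrator  # user-provided IInt8Calibrator

serialized_engine = builder.build_serialized_network(network, config)
with open("model_mixed.engine", "wb") as f:  # hypothetical output name
    f.write(serialized_engine)

As with --best, TensorRT remains free to keep layers such as Einsum or MatMul in FP16 or FP32 when no faster INT8 tactic is available.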

Thanks.
