YOLOX - Quantize int8 and convert to TensorRT engine

I have been trying to quantize YOLOX from float32 to int8. After that, I want to convert the resulting ONNX model into a TensorRT engine.

The quantization process seems to work, however I get several different exceptions while trying to convert the result to TRT.

Below is the code that I use for quantization:

import numpy as np
from onnxruntime.quantization import quantize_static, CalibrationMethod, CalibrationDataReader, QuantType, QuantFormat

# paths to the float32 ONNX model and the quantized output
onnx_model_input_path = "yolox_l.onnx"
onnx_model_output_path = "output.onnx"

# calibration dataset (dummy data for calibration)
class DummyDataReader(CalibrationDataReader):
    def __init__(self, num_samples):
        self.num_samples = num_samples
        self.current_sample = 0

    def get_next(self):
        if self.current_sample < self.num_samples:
            input_data = self.generate_random_input()
            self.current_sample += 1
            return {'images': input_data}
        else:
            return None

    def generate_random_input(self):
        input_data = np.random.uniform(-1, 1, size=input_shape).astype(np.float32)
        return input_data

num_calibration_samples = 100
input_shape = (1, 3, 640, 640)

calibration_data_reader = DummyDataReader(num_samples=num_calibration_samples)


# quantize the model to int8 (the quantized model is written to onnx_model_output_path)
quantize_static(
    model_input=onnx_model_input_path,
    model_output=onnx_model_output_path,
    calibration_data_reader=calibration_data_reader,
    activation_type=QuantType.QInt8,
    weight_type=QuantType.QInt8,
    quant_format=QuantFormat.QDQ,
    per_channel=False,
    calibrate_method=CalibrationMethod.MinMax
)

This outputs a ~55 MB ONNX file, whereas the original YOLOX-Large model is ~450 MB.
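As a side note, the random inputs are only a stand-in; for a proper calibration set I would swap in a reader that feeds real preprocessed frames, roughly along these lines (the image folder path is a placeholder, and the preprocessing has to match whatever the exported model expects):

import glob
import cv2
import numpy as np
from onnxruntime.quantization import CalibrationDataReader

class ImageFolderDataReader(CalibrationDataReader):
    def __init__(self, image_dir, input_shape=(1, 3, 640, 640)):
        self.input_shape = input_shape
        self.image_paths = iter(glob.glob(image_dir + '/*.jpg'))

    def get_next(self):
        # return one calibration batch per call, None when exhausted
        path = next(self.image_paths, None)
        if path is None:
            return None
        h, w = self.input_shape[2], self.input_shape[3]
        img = cv2.imread(path)
        img = cv2.resize(img, (w, h)).astype(np.float32)
        img = img.transpose(2, 0, 1)[np.newaxis, ...]  # HWC -> NCHW, add batch dim
        return {'images': img}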

Now for the errors. Below is the code that I use to convert the quantized ONNX model to a TRT engine:

import pycuda.driver as cuda
import pycuda.autoinit

from typing import List
import tensorrt as trt
import numpy as np
import time
import cv2
import os

.
.
.
.

TRT_LOGGER = trt.Logger(trt.Logger.ERROR)
EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

def build_engine(self):
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network(EXPLICIT_BATCH) as network, builder.create_builder_config() as config, trt.OnnxParser(network, TRT_LOGGER) as parser, trt.Runtime(TRT_LOGGER) as runtime:
        config.max_workspace_size = 1 << self.max_workspace_size
        builder.max_batch_size = self.max_batch_size

        assert os.path.exists(self.onnx_file_path), 'ONNX file {} not found, please generate it first.'.format(self.onnx_file_path)
        self.logger.info('Loading ONNX file from path {}...'.format(self.onnx_file_path))

        with open(self.onnx_file_path, 'rb') as model:
            self.logger.info('Beginning ONNX file parsing')
            if not parser.parse(model.read()):
                self.logger.error('ERROR: Failed to parse the ONNX file.')
                for error in range(parser.num_errors):
                    self.logger.error(parser.get_error(error))
                return None

        network.get_input(0).shape = self.input_shape
        plan = builder.build_serialized_network(network, config)
        if plan is None:
            self.logger.error('ERROR: Failed to build the serialized network.')
            return None
        engine = runtime.deserialize_cuda_engine(plan)
        with open(self.engine_file_path, "wb") as f:
            f.write(plan)
        return engine

I can successfully convert the original yolox_l.onnx to a TRT engine with the method above. However, with the quantized model it returns None, because the parser cannot parse the ONNX file and reports the error below:

[09/04/2023-10:46:19] [TRT] [E] head.cls_preds.0.bias_DequantizeLinear_dequantize_scale_node: only activation types allowed as input to this layer.
ERROR:root:ERROR: Failed to parse the ONNX file.
ERROR:root:In node 0 (parseGraph): INVALID_NODE: Invalid Node - head.cls_preds.0.bias_DequantizeLinear
head.cls_preds.0.bias_DequantizeLinear_dequantize_scale_node: only activation types allowed as input to this layer.
Traceback (most recent call last):
  File "./web_server/app.py", line 45, in <module>
    detect.initialize()
  File "src/detector/detector.py", line 81, in initialize
    self.context = self.engine.create_execution_context()
AttributeError: 'NoneType' object has no attribute 'create_execution_context'

I also tried to quantize the model with quantize_dynamic(), but that raises a different set of exceptions.
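For reference, that attempt was roughly the following (the output filename is just a placeholder):

from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic(
    model_input="yolox_l.onnx",
    model_output="output_dynamic.onnx",  # placeholder output path
    weight_type=QuantType.QInt8,
)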

I have also experimented with the parameters of quantize_static(). When I set the weight and activation types to QUInt8, TRT gave an "asymmetric quantization is not supported" error. When I used QuantFormat.QOperator instead of QDQ, it gave yet another error at the TRT conversion phase.

In short: I need to quantize the YOLOX-Large model to int8 and then convert the quantized model to a TensorRT engine. Any help will be appreciated. Thank you in advance.

Hi, please refer to the links below to perform inference in INT8.

Thanks!

@AakankshaS Answering problems with random, unhelpful documentation links doesn't help at all. I've already read them all, and I know how to quantize a model from float32 to int8. The problem here is converting the quantized ONNX model to a TensorRT engine.

Any help will be appreciated. Thanks.


Hi,

Could you please try building a TensorRT engine for the quantized ONNX model using the trtexec command and check whether it runs successfully?
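For example, something along these lines (the file names are placeholders for your model and engine):

trtexec --onnx=output.onnx --int8 --saveEngine=yolox_l_int8.engine --verbose

The --verbose flag gives a more detailed log of the parsing and building steps.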
Also, please share with us the ONNX model for better debugging.

Thank you.