YOLOX - Quantize int8 and convert to TensorRT engine

I have been trying to quantize YOLOX from float32 to int8. After that, I want that onnx output to be converted into TensorRT engine.

Quantization process seems OK, however I get several different exceptions while trying to convert it into TRT.

Below is the code that I use for quantization:

import numpy as np
from onnxruntime.quantization import quantize_static, CalibrationMethod, CalibrationDataReader, QuantType, QuantFormat

# loading the float32 ONNX model
onnx_model_input_path = "yolox_l.onnx"
onnx_model_output_path = "output.onnx"

# calibration dataset (dummy data for calibration)
class DummyDataReader(CalibrationDataReader):
    def __init__(self, num_samples):
        self.num_samples = num_samples
        self.current_sample = 0

    def get_next(self):
        if self.current_sample < self.num_samples:
            input_data = self.generate_random_input()
            self.current_sample += 1
            return {'images': input_data}
            return None

    def generate_random_input(self):
        input_data = np.random.uniform(-1, 1, size=input_shape).astype(np.float32)
        return input_data

num_calibration_samples = 100
input_shape = (1, 3, 640, 640)

calibration_data_reader = DummyDataReader(num_samples=num_calibration_samples)

# Quantize the model to int8
quantized_model = quantize_static(

This outputs a ~55 MB onnx file where the original YOLOX-Large model is ~450MB.

Here comes the errors now, below is the code that I use to convert onnx output model to TRT engine:

import pycuda.driver as cuda
import pycuda.autoinit

from typing import List
import tensorrt as trt
import numpy as np
import time
import cv2
import os


TRT_LOGGER = trt.Logger(trt.Logger.ERROR)
EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

def build_engine(self,):
        with trt.Builder(TRT_LOGGER) as builder, builder.create_network(EXPLICIT_BATCH) as network, builder.create_builder_config() as config, trt.OnnxParser(network, TRT_LOGGER) as parser, trt.Runtime(TRT_LOGGER) as runtime:
            config.max_workspace_size = 1 << self.max_workspace_size
            builder.max_batch_size = self.max_batch_size
            assert os.path.exists(self.onnx_file_path), print('ONNX file {} not found, please first to generate it.'.format(self.onnx_file_path))
            self.logger.info('Loading ONNX file from path {}...'.format(self.onnx_file_path))

            with open(self.onnx_file_path, 'rb') as model:
                self.logger.info('Beginning ONNX file parsing')
                if not parser.parse(model.read()):
                    self.logger.error ('ERROR: Failed to parse the ONNX file.')
                    for error in range(parser.num_errors):
                    return None

            network.get_input(0).shape = self.input_shape
            plan = builder.build_serialized_network(network, config)
            engine = runtime.deserialize_cuda_engine(plan)
            with open(self.engine_file_path, "wb") as f:
            return engine

I can successfully convert original yolox_l.onnx to TRT engine with above method. However, it returns None with the quantized model because it cannot parse the onnx and gives the error below:

[09/04/2023-10:46:19] [TRT] [E] head.cls_preds.0.bias_DequantizeLinear_dequantize_scale_node: only activation types allowed as input to this layer.
ERROR:root:ERROR: Failed to parse the ONNX file.
ERROR:root:In node 0 (parseGraph): INVALID_NODE: Invalid Node - head.cls_preds.0.bias_DequantizeLinear
head.cls_preds.0.bias_DequantizeLinear_dequantize_scale_node: only activation types allowed as input to this layer.
Traceback (most recent call last):
File "./web_server/app.py", line 45, in <module>
File "src/detector/detector.py", line 81, in initialize
self.context = self.engine.create_execution_context()
AttributeError: 'NoneType' object has no attribute 'create_execution_context'

I tried to quantize the model with quantize_dynamic(), but it gives another lines of exception.

I have also changed the parameters of quantize_static(). When I set the weight and activation types to QUINT8, it gave “asymmetric quantization is not supported” error. Also, I have tried it with Quant type QOperator instead of QDQ, it then gave another error at TRT conversion phase.

Simply, I need to quantize YOLOX large model to int8 and then I need to convert the quantized model to TensorRT engine. Any help will be appreciated. Thank you in advance

Hi, Please refer to the below links to perform inference in INT8


@ AakankshaS Answering problems with random useless documentations doesn’t help at all. I’ve already read them all and I know how to quantize a model from float32 to int8. The problem here is to convert the quantized onnx model to TensorRT engine.

Any help will be appreciated. Thanks.

1 Like


Could you please try building a TensorRT engine for a quantized ONNX model using the trtexec command and check if it runs successfully?
Also, please share with us the ONNX model for better debugging.

Thank you.