Build yolov8 QAT model int8 engine failed

I am training a QAT yolov8-seg model with the Ultralytics framework.
After exporting it to deploy_model.onnx plus a dynamic_range.json, I use the TensorRT Python API to build the engine. The code is:


import json
import os

import onnx
import pycuda.autoinit  # noqa: F401  (creates the CUDA context)
import tensorrt as trt

def onnx2trt(onnx_model,
             trt_path,
             batch_size=1,
             cali_batch=10,
             log_level=trt.Logger.ERROR,
             max_workspace_size=1 << 30,
             device_id=0,
             mode='fp32',
             is_explicit=False,
             dynamic_range_file=None):
    if os.path.exists(trt_path):
        print(f'The "{trt_path}" exists. Remove it and continue.')
        os.remove(trt_path)


    # create builder and network
    logger = trt.Logger(log_level)
    builder = trt.Builder(logger)
    EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    network = builder.create_network(EXPLICIT_BATCH)

    parser = trt.OnnxParser(network, logger)

    if isinstance(onnx_model, str):
        onnx_model = onnx.load(onnx_model)

    if not parser.parse(onnx_model.SerializeToString()):
        error_msgs = ''
        for error in range(parser.num_errors):
            error_msgs += f'{parser.get_error(error)}\n'
        raise RuntimeError(f'parse onnx failed:\n{error_msgs}')

    config = builder.create_builder_config()
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, max_workspace_size)

    if mode == 'int8':
        config.set_flag(trt.BuilderFlag.INT8)
        if dynamic_range_file:
            with open(dynamic_range_file, 'r') as f:
                dynamic_range = json.load(f)['tensorrt']['blob_range']

            for input_index in range(network.num_inputs):
                input_tensor = network.get_input(input_index)
                if input_tensor.name in dynamic_range:
                    print("input_tensor.name", input_tensor.name)
                    amax = dynamic_range[input_tensor.name]
                    input_tensor.dynamic_range = (-amax, amax)
                    print(f'Set dynamic range of {input_tensor.name} as [{-amax}, {amax}]')

            for layer_index in range(network.num_layers):
                layer = network.get_layer(layer_index)
                # a layer may have more than one output tensor
                for out_index in range(layer.num_outputs):
                    output_tensor = layer.get_output(out_index)
                    if output_tensor.name in dynamic_range:
                        amax = dynamic_range[output_tensor.name]
                        output_tensor.dynamic_range = (-amax, amax)
                        print(f'Set dynamic range of {output_tensor.name} as [{-amax}, {amax}]')

    # build and serialize the engine; build_engine() is deprecated in TRT 8.x
    # and returns an ICudaEngine, which cannot be written to disk directly
    serialized_engine = builder.build_serialized_network(network, config)
    if serialized_engine is None:
        raise RuntimeError('engine build failed')

    with open(trt_path, "wb") as f:
        f.write(serialized_engine)

    # deserialize once to verify the plan file is loadable
    runtime = trt.Runtime(logger)
    engine = runtime.deserialize_cuda_engine(serialized_engine)
    return engine
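For context, here is a minimal sketch of the dynamic_range.json layout that the script above expects (per-tensor amax values under `['tensorrt']['blob_range']`); the tensor names and amax values below are hypothetical, made up for illustration only:

```python
import json

# Hypothetical layout matching what onnx2trt() reads:
# json.load(f)['tensorrt']['blob_range'] -> {tensor_name: amax}
sample = {
    "tensorrt": {
        "blob_range": {
            "images": 1.0,
            "/model.0/conv/Conv_output_0": 5.4,
            "/Slice_8_output_0": 3.2,
        }
    }
}

with open("dynamic_range.json", "w") as f:
    json.dump(sample, f, indent=2)

# Re-read it the same way onnx2trt() does and sanity-check the values:
# every amax must be a positive finite number.
with open("dynamic_range.json") as f:
    blob_range = json.load(f)["tensorrt"]["blob_range"]

assert all(amax > 0 for amax in blob_range.values())
print(len(blob_range))
```

A tensor whose name is missing from this mapping gets no dynamic range, which can make an int8 build fail or fall back in unexpected ways, so checking coverage of the file against the network's tensor names is a cheap first diagnostic.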

My machine environment is TensorRT 8.6 with CUDA 11.4.

The error happens while building the engine:


[07/24/2025-17:28:53] [TRT] [V] *************** Autotuning Reformat: Float(1:4,1600,40,1) -> Int8(9600,1600:32,40,1) ***************
[07/24/2025-17:28:53] [TRT] [V] --------------- Timing Runner: /Slice_8_output_0 copy (Reformat[0x80000006])
[07/24/2025-17:28:53] [TRT] [V] Skipping tactic 0x00000000000003e8 due to exception an illegal memory access was encountered
[07/24/2025-17:28:53] [TRT] [V] /Slice_8_output_0 copy (Reformat[0x80000006]) profiling completed in 0.00831625 seconds. Fastest Tactic: 0xd15ea5edd15ea5ed Time: inf
[07/24/2025-17:28:53] [TRT] [V] --------------- Timing Runner: /Slice_8_output_0 copy (MyelinReformat[0x80000035])
[07/24/2025-17:28:53] [TRT] [V] MyelinReformat has no valid tactics for this config, skipping
[07/24/2025-17:28:53] [TRT] [V] Deleting timing cache: 3616 entries, served 4867 hits since creation.
[07/24/2025-17:28:53] [TRT] [E] 2: Impossible to reformat.
[07/24/2025-17:28:53] [TRT] [E] 1: [cudaDriverHelpers.cpp::operator()::94] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[07/24/2025-17:28:53] [TRT] [E] 1: [cudaDriverHelpers.cpp::operator()::94] Error Code 1: Cuda Driver (an illegal memory access was encountered)

[07/24/2025-17:28:54] [TRT] [E] 1: [cudaDriverHelpers.cpp::operator()::94] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[07/24/2025-17:28:54] [TRT] [E] 1: [cudaDriverHelpers.cpp::operator()::94] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[07/24/2025-17:28:54] [TRT] [E] 1: [cudaDriverHelpers.cpp::operator()::94] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[07/24/2025-17:28:54] [TRT] [E] 1: [cudaResources.cpp::~ScopedCudaStream::47] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[07/24/2025-17:28:54] [TRT] [E] 2: [optimizer.cpp::computeCosts::4194] Error Code 2: Internal Error (Impossible to reformat.)

Environment

TensorRT Version: 8.6
GPU Type: 3080TI
Nvidia Driver Version:
CUDA Version: 11.4
CUDNN Version:
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.10
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 2.7.1
Baremetal or Container (if container which image + tag):

Hi @294116967 ,
Possible Causes and Solutions

  1. Quantization Flags: Make sure you’re using the correct quantization flags when building the engine. You can try setting the CALIBRATE_BEFORE_FUSION flag to True using the tensorrt.QuantizationFlag component. This flag is disabled by default, but it might be necessary for your specific use case.
  2. Device Type: Verify that you’re using the correct device type when building the engine. You can try setting the device_type parameter to GPU using the tensorrt.DeviceType component.
  3. Profiling Verbosity: Adjust the profiling verbosity to get more detailed information about the engine building process. You can try setting the profiling_verbosity parameter to DETAILED using the tensorrt.ProfilingVerbosity component.
  4. Model Compatibility: Ensure that your QAT model is compatible with TensorRT 8.6. You can try checking the TensorRT documentation for any specific requirements or restrictions on QAT models.
  5. ONNX and Dynamic Range JSON Files: Verify that the ONNX and dynamic range JSON files are correctly exported and formatted. You can try checking the files for any errors or inconsistencies.
Thanks

Hi, please consider the pointers above. However, if the issue persists, I would request you to please share your model and repro script with us.

Thanks