Is there any way to build a model with int8 weights in TensorRT?


I’m trying to build an engine with int8 weights:

    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    config = builder.create_builder_config()
    network = builder.create_network()

    input_tensor = network.add_input(name='input', dtype=trt.int8, shape=(1,3,720,1280))

    # conv1_w & conv1_b are numpy array with dtype=int8
    conv1_w = weights[0]
    conv1_b = weights[1]
    conv1 = network.add_convolution(input=input_tensor, num_output_maps=64, kernel_shape=(5, 5), kernel=conv1_w, bias=conv1_b)
    conv1.stride = (1, 1)
    conv1.padding = (2, 2)

    # and so on..

    plan = builder.build_serialized_network(network, config)

The build fails with this error:

    [TensorRT] WARNING: Calibrator is not being used. Users must provide dynamic range for all tensors that are not Int32.
    [TensorRT] ERROR: 4: [standardEngineBuilder.cpp::initCalibrationParams::2050] Error Code 4: Internal Error (Calibration failure occurred with no scaling factors detected. This could be due to no int8 calibrator or insufficient custom scales for network layers. Please see int8 sample to setup calibration correctly.)
    [TensorRT] ERROR: 2: [builder.cpp::buildSerializedNetwork::417] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed.)

As far as I understand, building a serialized network in INT8 mode requires an FP32 model plus either a calibrator or QAT, so building an engine without calibration is impossible.
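
The warning in the log mentions a third route besides a calibrator or QAT: providing a dynamic range for every non-Int32 tensor by hand. Below is a minimal sketch of that idea, not a confirmed fix for this network. It assumes the per-tensor maximum absolute values are already known from the original quantization. The `ranges` dict, the `default` fallback, and both helper names are hypothetical. The only TensorRT calls used are `get_input`, `get_layer`, `get_output`, and `ITensor.set_dynamic_range`, and the network object is duck-typed so the helper itself does not import TensorRT.

```python
def set_dynamic_ranges(network, ranges, default=None):
    """Attach a symmetric dynamic range to network inputs and layer outputs.

    network: a tensorrt.INetworkDefinition (duck-typed here).
    ranges:  hypothetical dict mapping tensor name -> max absolute value.
    default: optional fallback max absolute value for unlisted tensors.
    Returns the names of tensors that were left without a range.
    """
    # Collect every network input and every layer output tensor.
    tensors = [network.get_input(i) for i in range(network.num_inputs)]
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        for j in range(layer.num_outputs):
            tensor = layer.get_output(j)
            if tensor is not None:
                tensors.append(tensor)

    missing = []
    for tensor in tensors:
        amax = ranges.get(tensor.name, default)
        if amax is None:
            missing.append(tensor.name)
            continue
        # Symmetric range [-amax, amax] around zero.
        tensor.set_dynamic_range(-amax, amax)
    return missing


def dequantize_weights(int8_values, scale):
    """Map int8 weight values back to floats using a known per-tensor scale."""
    return [float(q) * scale for q in int8_values]
```

In use, this would be called on the finished network before `build_serialized_network`, with `config.set_flag(trt.BuilderFlag.INT8)` set on the builder config. As far as I know, the convolution layer expects FP32 (or FP16) kernel and bias arrays, so the existing int8 weights would first be converted back to floats with their scale, as `dequantize_weights` does, and TensorRT would then requantize them itself. Whether that round trip reproduces the original integers exactly depends on how the weights were quantized.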

Is there any way to work around this? Or is it possible to implement a custom calibration scheme (nonlinear, nonuniform, etc.) with the TensorRT Python API?
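
To make the calibration question concrete: my understanding is that TensorRT's INT8 path uses symmetric, uniform quantization with one scale per tensor (per channel for weights), so a custom scheme could choose where each tensor is clipped but probably not nonuniform levels. Under that assumption, a calibration can be done entirely outside TensorRT, and only the resulting range needs to be handed over. The sketch below picks a clipping value from recorded activation samples using a nearest-rank percentile. The function names and the percentile rule are illustrative choices, not anything TensorRT prescribes.

```python
import math


def amax_from_samples(samples, percentile=100.0):
    """Pick a symmetric clipping value from observed activation values.

    percentile=100 keeps the full observed range (max-abs calibration);
    a lower value clips outliers using the nearest-rank percentile.
    """
    if not 0.0 < percentile <= 100.0:
        raise ValueError("percentile must be in (0, 100]")
    magnitudes = sorted(abs(float(v)) for v in samples)
    if not magnitudes:
        raise ValueError("need at least one sample")
    rank = max(1, math.ceil(percentile / 100.0 * len(magnitudes)))
    return magnitudes[rank - 1]


def int8_scale(amax):
    """Scale of a symmetric int8 quantizer whose range is [-amax, amax]."""
    return amax / 127.0
```

The value returned by `amax_from_samples` is what would be passed as `-amax, amax` when setting a tensor's dynamic range.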



TensorRT Version:
GPU Type: NVIDIA GeForce RTX 2080 Ti
Nvidia Driver Version: 465.19.01
CUDA Version: 11.3
CUDNN Version: 8.2.1
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.8
Baremetal or Container (if container which image + tag): baremetal

Hi, please refer to the links below to perform inference in INT8