How to use IScaleLayer as a quantization node?

Hi, I found this description of nvinfer1::IScaleLayer in the docs:

A scale layer may be used as an INT8 quantization node in a graph, if the output is constrained to INT8 and the input to FP32. Quantization rounds ties to even, and clamps to [-128, 127].
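To make the quoted behavior concrete, here is a minimal standalone sketch (plain C++, not TensorRT code; the function name `quantize` is made up for illustration) of what "rounds ties to even, and clamps to [-128, 127]" means for a single FP32 value:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Illustration only (not the TensorRT implementation): quantize an FP32
// value to INT8 with round-half-to-even and clamping to [-128, 127].
int8_t quantize(float x, float scale) {
    float scaled = x / scale;                 // apply the quantization scale
    // std::nearbyint rounds ties to even under the default FE_TONEAREST mode.
    float rounded = std::nearbyint(scaled);
    // Clamp to the INT8 range described in the docs.
    float clamped = std::min(127.0f, std::max(-128.0f, rounded));
    return static_cast<int8_t>(clamped);
}
```

For example, with scale = 1, both 1.5 and 2.5 round to 2 (ties to even), and out-of-range values saturate at -128 or 127.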

To test this feature, I compiled the code below, but TensorRT still asks for dynamic ranges. Is something wrong with my code?

#include <iostream>
#include <vector>

#include "NvInfer.h"

class Logger : public nvinfer1::ILogger {
    // Print every TensorRT message regardless of severity.
    void log(Severity severity, nvinfer1::AsciiChar const* msg) noexcept override { std::cout << msg << std::endl; }
};

int main() {
    Logger logger;
    auto build = nvinfer1::createInferBuilder(logger);
    const auto explicitBatch = 1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
    auto net = build->createNetworkV2(explicitBatch);
    auto cfg = build->createBuilderConfig();
    cfg->setMemoryPoolLimit(nvinfer1::MemoryPoolType::kWORKSPACE, 1 << 30);
    cfg->setFlag(nvinfer1::BuilderFlag::kFP16);
    cfg->setFlag(nvinfer1::BuilderFlag::kINT8);

    // Identity scale parameters: shift = 0, scale = 1, power = 1.
    std::vector<float> v_one(1, 1);
    std::vector<float> v_zero(1, 0);
    nvinfer1::Weights w_one{nvinfer1::DataType::kFLOAT, v_one.data(), 1};
    nvinfer1::Weights w_zero{nvinfer1::DataType::kFLOAT, v_zero.data(), 1};

    // FP32 input followed by a scale layer whose output is constrained to
    // INT8: the "quantization node" pattern described in the docs.
    auto f_in = net->addInput("input", nvinfer1::DataType::kFLOAT, {4, {1, 3, 32, 32}});
    auto f2i = net->addScale(*f_in, nvinfer1::ScaleMode::kUNIFORM, w_zero, w_one, w_one);
    f2i->setOutputType(0, nvinfer1::DataType::kINT8);
    auto i8_in = f2i->getOutput(0);

    // INT8 elementwise product as the quantized compute layer.
    auto l = net->addElementWise(*i8_in, *i8_in, nvinfer1::ElementWiseOperation::kPROD);
    auto i8_out = l->getOutput(0);
    
    // Scale layer constrained back to FP32: the dequantization node.
    auto i2f = net->addScale(*i8_out, nvinfer1::ScaleMode::kUNIFORM, w_zero, w_one, w_one);
    i2f->setOutputType(0, nvinfer1::DataType::kFLOAT);
    auto f_out = i2f->getOutput(0);
    net->markOutput(*f_out);

    // buildSerializedNetwork returns nullptr on failure.
    auto ret = build->buildSerializedNetwork(*net, *cfg);
    if (!ret) {
        std::cout << "build failed" << std::endl;
        return 1;
    }

    return 0;
}

After running it, the output is:

[MemUsageChange] Init CUDA: CPU +323, GPU +0, now: CPU 342, GPU 4298 (MiB)
Calibrator is not being used. Users must provide dynamic range for all tensors that are not Int32 or Bool.
4: [standardEngineBuilder.cpp::initCalibrationParams::1450] Error Code 4: Internal Error (Calibration failure occurred with no scaling factors detected. This could be due to no int8 calibrator or insufficient custom scales for network layers. Please see int8 sample to setup calibration correctly.)
2: [builder.cpp::buildSerializedNetwork::607] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed. )

My env:

TensorRT: 8.3.0
System: QNX

Hi,

We recommend trying the latest TensorRT version, 8.4 GA.
Could you also try trtexec --int8, and if you still face the issue, please share the verbose logs with us.
Please also share an ONNX model that reproduces the issue for easier debugging.

Thank you.

Hi,

Thanks for your reply. I am writing a parser for a different intermediate format, so I don't have an ONNX example. Since Q/DQ layers are unavailable in safety mode and ITensor::setDynamicRange is not suitable for QAT, does IScaleLayer work as a quantization node?
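For context on the setDynamicRange point: TensorRT's developer guide describes the INT8 scale as derived from a tensor's dynamic range as scale = max(|min|, |max|) / 127, so a per-tensor QAT scale s would correspond to a dynamic range of 127 * s. A small sketch of that conversion (helper names are made up for illustration, not TensorRT API):

```cpp
#include <cmath>

// Illustration only: the relationship between a tensor's symmetric dynamic
// range (max absolute value) and its INT8 quantization scale, assuming the
// scale = dynamicRange / 127 convention from the TensorRT developer guide.
float scaleFromDynamicRange(float maxAbs) { return maxAbs / 127.0f; }
float dynamicRangeFromScale(float scale) { return 127.0f * scale; }
```

Under that assumption, feeding a QAT scale through dynamicRangeFromScale would give the value to pass to setDynamicRange, which is why its awkwardness for QAT is mostly about plumbing the learned scales through, not about the math.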