How to use IScaleLayer as a quantization node?

Hi, I found this description of nvinfer1::IScaleLayer in the docs:

A scale layer may be used as an INT8 quantization node in a graph, if the output is constrained to INT8 and the input to FP32. Quantization rounds ties to even, and clamps to [-128, 127].
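To make the quoted behavior concrete, here is a minimal standalone sketch (plain C++, not TensorRT code; the function name `quantize` is made up for illustration) of what "rounds ties to even, and clamps to [-128, 127]" means for a single FP32 value:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Illustration only (not the TensorRT implementation): quantize an FP32
// value to INT8 with round-half-to-even and clamping to [-128, 127].
int8_t quantize(float x, float scale) {
    float scaled = x / scale;                 // apply the quantization scale
    // std::nearbyint rounds ties to even under the default FE_TONEAREST mode.
    float rounded = std::nearbyint(scaled);
    // Clamp to the INT8 range described in the docs.
    float clamped = std::min(127.0f, std::max(-128.0f, rounded));
    return static_cast<int8_t>(clamped);
}
```

For example, with scale = 1, both 1.5 and 2.5 round to 2 (ties to even), and out-of-range values saturate at -128 or 127.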

To test this feature, I compiled the code below, but TensorRT still asks for dynamic ranges. Is something wrong with my code?

#include <iostream>
#include <vector>

#include "NvInfer.h"

class Logger : public nvinfer1::ILogger {
    // Print every TensorRT message regardless of severity.
    void log(Severity severity, nvinfer1::AsciiChar const* msg) noexcept override { std::cout << msg << std::endl; }
};

int main() {
    Logger logger;
    auto build = nvinfer1::createInferBuilder(logger);
    const auto explicitBatch = 1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
    auto net = build->createNetworkV2(explicitBatch);
    auto cfg = build->createBuilderConfig();
    cfg->setMemoryPoolLimit(nvinfer1::MemoryPoolType::kWORKSPACE, 1 << 30);
    cfg->setFlag(nvinfer1::BuilderFlag::kFP16);
    cfg->setFlag(nvinfer1::BuilderFlag::kINT8);

    // Identity scale parameters: shift = 0, scale = 1, power = 1.
    std::vector<float> v_one(1, 1);
    std::vector<float> v_zero(1, 0);
    nvinfer1::Weights w_one{nvinfer1::DataType::kFLOAT, v_one.data(), 1};
    nvinfer1::Weights w_zero{nvinfer1::DataType::kFLOAT, v_zero.data(), 1};

    // FP32 input followed by a scale layer whose output is constrained to
    // INT8: the "quantization node" pattern described in the docs.
    auto f_in = net->addInput("input", nvinfer1::DataType::kFLOAT, {4, {1, 3, 32, 32}});
    auto f2i = net->addScale(*f_in, nvinfer1::ScaleMode::kUNIFORM, w_zero, w_one, w_one);
    f2i->setOutputType(0, nvinfer1::DataType::kINT8);
    auto i8_in = f2i->getOutput(0);

    // INT8 elementwise product as the quantized compute layer.
    auto l = net->addElementWise(*i8_in, *i8_in, nvinfer1::ElementWiseOperation::kPROD);
    auto i8_out = l->getOutput(0);
    
    // Scale layer constrained back to FP32: the dequantization node.
    auto i2f = net->addScale(*i8_out, nvinfer1::ScaleMode::kUNIFORM, w_zero, w_one, w_one);
    i2f->setOutputType(0, nvinfer1::DataType::kFLOAT);
    auto f_out = i2f->getOutput(0);
    net->markOutput(*f_out);

    // buildSerializedNetwork returns nullptr on failure.
    auto ret = build->buildSerializedNetwork(*net, *cfg);
    if (!ret) {
        std::cout << "build failed" << std::endl;
        return 1;
    }

    return 0;
}

After running it, the output is:

[MemUsageChange] Init CUDA: CPU +323, GPU +0, now: CPU 342, GPU 4298 (MiB)
Calibrator is not being used. Users must provide dynamic range for all tensors that are not Int32 or Bool.
4: [standardEngineBuilder.cpp::initCalibrationParams::1450] Error Code 4: Internal Error (Calibration failure occurred with no scaling factors detected. This could be due to no int8 calibrator or insufficient custom scales for network layers. Please see int8 sample to setup calibration correctly.)
2: [builder.cpp::buildSerializedNetwork::607] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed. )

My env:

TensorRT: 8.3.0
System: QNX

Hi,

We recommend trying the latest TensorRT version, 8.4 GA.
Could you also try trtexec --int8, and if you still face the issue, please share the verbose logs with us.
Please also share an ONNX model that reproduces the issue for easier debugging.

Thank you.

Hi,

Thanks for your reply. I am writing a parser for a different intermediate format, so I don't have an ONNX example. Since Q/DQ layers are unavailable in safety mode and ITensor::setDynamicRange is not suitable for QAT, does IScaleLayer work as a quantization node?
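For context on the setDynamicRange point: TensorRT's developer guide describes the INT8 scale as derived from a tensor's dynamic range as scale = max(|min|, |max|) / 127, so a per-tensor QAT scale s would correspond to a dynamic range of 127 * s. A small sketch of that conversion (helper names are made up for illustration, not TensorRT API):

```cpp
#include <cmath>

// Illustration only: the relationship between a tensor's symmetric dynamic
// range (max absolute value) and its INT8 quantization scale, assuming the
// scale = dynamicRange / 127 convention from the TensorRT developer guide.
float scaleFromDynamicRange(float maxAbs) { return maxAbs / 127.0f; }
float dynamicRangeFromScale(float scale) { return 127.0f * scale; }
```

Under that assumption, feeding a QAT scale through dynamicRangeFromScale would give the value to pass to setDynamicRange, which is why its awkwardness for QAT is mostly about plumbing the learned scales through, not about the math.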