UFF parser int8 problem

Does the latest version of TensorRT support loading an FP32-trained TensorFlow model for INT8 inference?

When trying to load the UFF model with

parser->parse(uff_file_name.c_str(), *network, nvinfer1::DataType::kINT8)

I get an error like this:

UFFParser: Parser error: dense_7/kernel: Invalid weights types when converted. Trying to convert from FP32 To INT8

I thought this might be a problem with my model, but I have tried many TensorFlow frozen graphs (*.pb), converted them with convert-to-uff, and encountered the same issue in all of them. However, FLOAT and HALF precision inference work without a problem.

My TensorRT version is 5.0.0.10.

hello,

You’ll need to calibrate the network for INT8 first. INT8 engines are built from 32-bit network definitions and require significantly more investment than building a 32-bit or 16-bit engine. In particular, the TensorRT builder must perform a process called calibration to determine how best to represent the weights and activations as 8-bit integers.

Please see this example: https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#int8_sample

Yes, I understand that, but the problem occurs while loading the UFF model, not during inference.
I am actually following the sampleInt8API example, which avoids the calibration step.
Parsing the UFF model fails before the code that sets layer precision, etc. I also tried moving that part of the code before the network is parsed, to no avail.

IBuilder* builder = createInferBuilder(logger);
INetworkDefinition* network = builder->createNetwork();
if (!parser->parse(uff_file_name.c_str(), *network, floatMode)) {  // problem occurs here
    std::cout << "Failed to parse network " << uff_file_name << std::endl;
    return;
}
if (floatMode == nvinfer1::DataType::kHALF) {
    builder->setHalf2Mode(true);
} else if (floatMode == nvinfer1::DataType::kINT8) {
    builder->setInt8Mode(true);
    builder->setInt8Calibrator(nullptr);
    builder->setStrictTypeConstraints(true);

    setLayerPrecision(network);
    setDynamicRange(network);
}
builder->setMaxBatchSize(BATCH_SIZE);
builder->setMaxWorkspaceSize(1 << 30);
engine = builder->buildCudaEngine(*network);
if (!engine) {
    std::cout << "Unable to create engine" << std::endl;
    return;
}

A similar issue reported here https://devtalk.nvidia.com/default/topic/1029476/tensorrt/tensorrt-python-interface-uff-int8-calibration-issue/
concluded that the problem lies in the TensorFlow INT8 path, which is not well tested. I was wondering whether that has changed in newer versions of TensorRT.

Edit

It looks like I have to parse with the precision that the network was trained in.
Please confirm this.

I did this before:

parser->parse(uff_file_name.c_str(), *network, nvinfer1::DataType::kINT8)

but it seems I should use FLOAT to load the model even when I am going to run INT8 inference:

parser->parse(uff_file_name.c_str(), *network, nvinfer1::DataType::kFLOAT)

Hi, I ran into the same problem, but I am using the Python API. Does TensorRT 6 support converting a UFF model to INT8 mode?