INT8 Calibration with PReLU Plugin Layer

I am trying to perform INT8 calibration of a Caffe model. I'm also using the Plugin API to provide my own implementation of PReLU. My approach is built on the sampleINT8 sample provided by NVIDIA.

I create batch data according to the specifications in the documentation (i.e., I save a file in the format [N, C, H, W, Image1, Image2, …, ImageN, Label1, Label2, …, LabelN]). In my batch data, I have set all labels to 1.
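For concreteness, each batch file is written roughly like the sketch below. The writeBatchFile helper is my own placeholder; it assumes the sampleINT8 convention of a four-int dimension header followed by float image data and float labels.

#include <cstdio>
#include <vector>

// Hypothetical helper: writes one calibration batch in the layout described
// above, i.e. [N, C, H, W] as ints, then N*C*H*W floats of image data,
// then N float labels (all set to 1 in my case).
void writeBatchFile(const char* path, int N, int C, int H, int W,
                    const std::vector<float>& images, const std::vector<float>& labels)
{
    FILE* f = fopen(path, "wb");
    if (!f)
        return;
    int dims[4] = { N, C, H, W };
    fwrite(dims, sizeof(int), 4, f);
    fwrite(images.data(), sizeof(float), images.size(), f);   // N*C*H*W values
    fwrite(labels.data(), sizeof(float), labels.size(), f);   // N values
    fclose(f);
}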

My main function looks like this:

int main(int argc, char** argv)
{
    gNetworkName = "net";
    int batchSize = N;
    int firstBatch = 1;
    int nbScoreBatches = 20;

    BatchStream calibrationStream(CAL_BATCH_SIZE, NB_CAL_BATCHES);
    Int8EntropyCalibrator calibrator(calibrationStream, FIRST_CAL_BATCH);

    std::cout << "calibration declared" << std::endl;
    scoreModel(BATCH_SIZE,
               firstBatch,
               nbScoreBatches,
               DataType::kINT8,
               &calibrator,
               false);
}
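For context, the Int8EntropyCalibrator constructed above follows the sampleINT8 pattern: its getBatch() pulls the next batch from the BatchStream, copies it to a preallocated device buffer, and hands that pointer back to the builder under the network's input blob name. A rough sketch of that method, from memory of the sample (mStream, mDeviceInput, mInputCount, and the CHECK error-checking macro are the sample's own members and helpers, so details may differ between releases):

bool Int8EntropyCalibrator::getBatch(void* bindings[], const char* names[], int nbBindings)
{
    // Advance the stream; returning false tells the builder calibration data is exhausted.
    if (!mStream.next())
        return false;

    // Copy the host batch into the preallocated device buffer.
    CHECK(cudaMemcpy(mDeviceInput, mStream.getBatch(),
                     mInputCount * sizeof(float), cudaMemcpyHostToDevice));

    // Bind the device buffer to the network's input blob ("data" in the sample).
    assert(!strcmp(names[0], "data"));
    bindings[0] = mDeviceInput;
    return true;
}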

My BatchStream class is identical to the one provided in the samples. scoreModel (called from above; its body is below) calls my builder method TensorNet::caffeToGIEModel (also below), which assigns the calibrator to the builder and, among other things, attempts to build the engine. My code never gets past the call to builder->buildCudaEngine(), after printing std::cout << "Building TensorRT Engine" << std::endl.

std::pair<float, float> scoreModel(int batchSize, int firstBatch, int nbScoreBatches, DataType datatype, IInt8Calibrator* calibrator, bool quiet = false)
{
    IHostMemory* gieModelStream{ nullptr };

    TensorNet tensornet;
    bool valid = tensornet.caffeToGIEModel("pnet_full_0.pt", "pnet_full_0.caffemodel",
                                           std::vector<std::string>{ OUTPUT_BLOB_CLS29,
                                                                     OUTPUT_BLOB_REG29,
                                                                     OUTPUT_BLOB_CLS41,
                                                                     OUTPUT_BLOB_REG41 },
                                           batchSize,
                                           calibrator,
                                           gieModelStream);
}
bool TensorNet::caffeToGIEModel(const std::string& deployFile,
                                const std::string& modelFile,
                                const std::vector<std::string>& outputs,
                                unsigned int maxBatchSize,
                                IInt8Calibrator* calibrator,
                                nvinfer1::IHostMemory *&gieModelStream)
{
    IBuilder* builder = createInferBuilder(gLogger);

    INetworkDefinition* network = builder->createNetwork();
    ICaffeParser* parser = createCaffeParser();

    parser->setPluginFactory(&pluginFactory);

    DataType modelDataType = DataType::kINT8;
    if(!builder->platformHasFastInt8())
        return false;

    const IBlobNameToTensor *blobNameToTensor = parser->parse(deployFile.c_str(),
                                                              modelFile.c_str(),
                                                              *network,
                                                              modelDataType);

    assert(blobNameToTensor != nullptr);
    for (auto& s : outputs)
        network->markOutput(*blobNameToTensor->find(s.c_str()));

    // Build the engine
    std::cout << maxBatchSize << std::endl;
    builder->setMaxBatchSize(maxBatchSize);
    builder->setMaxWorkspaceSize(1 << 30);
    builder->setAverageFindIterations(1);
    builder->setMinFindIterations(1);
    builder->setDebugSync(true);
    builder->setInt8Mode(true); 
    builder->setInt8Calibrator(calibrator); 
    
    std::cout << "Building TensorRT Engine" << std::endl;
    ICudaEngine* engine = builder->buildCudaEngine(*network);
    std::cout << "Engine built" << std::endl;
    assert(engine);

    network->destroy();
    parser->destroy();

    gieModelStream = engine->serialize();
    engine->destroy();
    builder->destroy();
    pluginFactory.destroyPlugin();
    // shutdownProtobufLibrary();
    return true;
}

From my debugging with cuda-gdb, the likely culprit appears to be the PReLU layer. I receive the following error:

CUDA Exception: Warp Illegal Address
The exception was triggered at PC 0xd64ce8

Thread 1 "prelu_plugin_IN" received signal CUDA_EXCEPTION_14, Warp Illegal Address.
[Switching focus to CUDA kernel 0, grid 121, block (168,0,0), thread (0,0,0), device 0, sm 0, warp 48, lane 0]
0x0000000000d64cf0 in void PReLU<float>(int, int, int, float const*, float*, float const*)<<<(24160,1,1),(256,1,1)>>> ()

This is confusing because, without calibration, my engine builds fine in FP32. My understanding from @ChrisGottbrath https://devtalk.nvidia.com/default/topic/1015387/tensorrt/tensorrt-fails-to-build-fasterrcnn-gie-model-with-using-int8/post/5174436/#5174436 is that the plugin layer should simply be left running in FP32.
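In other words, my expectation is that the plugin should only ever see FP32 buffers. If the TensorRT version in use provides IPluginExt (TensorRT 4 and later), a plugin can state this explicitly so the builder inserts reformat layers around it; the class below is only a minimal sketch of that idea, not my actual plugin code.

#include "NvInfer.h"
#include <cassert>

// Hypothetical sketch: restrict a plugin to FP32 NCHW I/O via IPluginExt, so the
// rest of the network can run INT8 while the plugin stays in FP32.
class PReluPluginSketch : public nvinfer1::IPluginExt
{
public:
    bool supportsFormat(nvinfer1::DataType type, nvinfer1::PluginFormat format) const override
    {
        // Accept only FP32 linear (NCHW) buffers.
        return type == nvinfer1::DataType::kFLOAT && format == nvinfer1::PluginFormat::kNCHW;
    }

    void configureWithFormat(const nvinfer1::Dims* inputDims, int nbInputs,
                             const nvinfer1::Dims* outputDims, int nbOutputs,
                             nvinfer1::DataType type, nvinfer1::PluginFormat format,
                             int maxBatchSize) override
    {
        assert(type == nvinfer1::DataType::kFLOAT && format == nvinfer1::PluginFormat::kNCHW);
        // ... record dimensions for enqueue() ...
    }

    // ... remaining IPlugin/IPluginExt methods (getNbOutputs, getOutputDimensions,
    //     enqueue, serialize, etc.) omitted for brevity ...
};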

Does anyone know why my PReLU layer is causing this to occur?

I also face a similar issue. As you probably already know, the NVIDIA sample plugin source code contains this line:
"assert(mBiasWeights.type == DataType::kFLOAT || mBiasWeights.type == DataType::kHALF);"
My guess is that a network which includes a plugin layer can't run in INT8 mode; if it could, the assert above would not be necessary.

Although it is not a plugin issue, I saw a similar problem with another layer.
I tried the "power" layer (which is in official Caffe) in INT8 mode, and it printed the log below.

caffe/caffeParser.cpp:809: nvinfer1::ILayer* parsePower(nvinfer1::INetworkDefinition&, const ditcaffe::LayerParameter&, CaffeWeightFactory&, BlobNameToTensor&): Assertion `dataType == DataType::kFLOAT || dataType == DataType::kHALF' failed.

My impression is that INT8 mode only runs in a limited set of environments. If you have other opinions, please add a comment.
I also strongly hope the NVIDIA engineers clear up this confusion.

Bumping this up: I believe this is a genuine issue with the TensorRT plugin API's compatibility with INT8 conversion. I have been testing INT8 conversions of other nets I'm interested in and found no issues whatsoever. However, nets with PReLU plugin layers will not convert, even though the PReLU plugin layer works correctly when the engine is built in FP32.

I would REALLY appreciate ANY input from the NVIDIA dev team.

Is the TensorRT plugin API slower than TensorRT's natively supported layers? I found that my net does not speed up when using the PReLU plugin layer, compared with using the supported ReLU layer.