Data type for NvDsInferCudaEngineGetFromTltModel

I am implementing a custom inference plugin and have reached the TLT model parsing part. The function used to generate the engine has this declaration:

extern "C"
bool NvDsInferCudaEngineGetFromTltModel(nvinfer1::IBuilder * const builder,
        nvinfer1::IBuilderConfig * const builderConfig,
        const NvDsInferContextInitParams * const initParams,
        nvinfer1::DataType dataType,
        nvinfer1::ICudaEngine *& cudaEngine);

I would expect to pass a dataType that corresponds to the precision used by my network, but the DeepStream code does this for INT8 (nvdsinfer_model_builder.cpp:654):

                /* modelDataType should be FLOAT for INT8 */
                modelDataType = nvinfer1::DataType::kFLOAT;

Can someone explain why?

If you read the surrounding code, you'll see that different branches are selected depending on the configured network mode. You should pass the dataType that corresponds to the precision of your network:

if (networkMode == NvDsInferNetworkMode_INT8)
    ...
if (networkMode == NvDsInferNetworkMode_FP16)
    ...
if (networkMode == NvDsInferNetworkMode_FP32)
    ...
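
Condensed, the dataType that ends up being passed per mode is the following (a paraphrase of nvdsinfer_model_builder.cpp, not a verbatim quote; the fallback branches are omitted):

    nvinfer1::DataType modelDataType = nvinfer1::DataType::kFLOAT;
    if (networkMode == NvDsInferNetworkMode_INT8)
        modelDataType = nvinfer1::DataType::kFLOAT;  /* see below */
    else if (networkMode == NvDsInferNetworkMode_FP16)
        modelDataType = nvinfer1::DataType::kHALF;
    else /* NvDsInferNetworkMode_FP32 */
        modelDataType = nvinfer1::DataType::kFLOAT;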

Alright, but why does the INT8 branch set the dataType to kFLOAT instead of kINT8?

    if (networkMode == NvDsInferNetworkMode_INT8)
    {
        /* Check if platform supports INT8 else use FP16 */
        if (m_Builder->platformHasFastInt8())
        {
            if (m_Int8Calibrator != nullptr)
            {
                /* Set INT8 mode and set the INT8 Calibrator */
                m_BuilderConfig->setFlag(nvinfer1::BuilderFlag::kINT8);
                m_BuilderConfig->setInt8Calibrator(m_Int8Calibrator.get());
                /* modelDataType should be FLOAT for INT8 */
                modelDataType = nvinfer1::DataType::kFLOAT;
            }
            else if (cudaEngineGetFcn != nullptr || cudaEngineGetDeprecatedFcn != nullptr)
            {
                dsInferWarning("INT8 calibration file not specified/accessible. "
                        "INT8 calibration can be done through setDynamicRange "
                        "API in 'NvDsInferCreateNetwork' implementation");
            }
            else
            {
                dsInferWarning("INT8 calibration file not specified. Trying FP16 mode.");
                networkMode = NvDsInferNetworkMode_FP16;
            }
        }
        else
        {
            dsInferWarning("INT8 not supported by platform. Trying FP16 mode.");
            networkMode = NvDsInferNetworkMode_FP16;
        }
    }

This is how TensorRT works: an INT8 engine is still built from full-precision weights, so the model is parsed as FLOAT, and the builder performs the quantization afterwards using the calibrator and the BuilderFlag::kINT8 flag set on the builder config. You don't need to follow the implementation details; you just need to pass the value accordingly. Thanks
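
Note also the warning in that code about the setDynamicRange path: if no calibration file is given but a custom engine-create function exists, you are expected to supply the INT8 ranges yourself. A sketch of that alternative (the range value is illustrative only; real ranges must come from your own analysis of the network):

    /* Assign per-tensor dynamic ranges while constructing the network,
     * instead of relying on a calibration cache. */
    for (int i = 0; i < network->getNbLayers(); ++i)
    {
        nvinfer1::ILayer *layer = network->getLayer(i);
        for (int j = 0; j < layer->getNbOutputs(); ++j)
            layer->getOutput(j)->setDynamicRange(-127.0f, 127.0f);
    }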

Alright, but since I am implementing my own plugin, should I set kFLOAT when using INT8 too?

Yes, you can follow this code and use kFLOAT when using INT8. If you encounter any problems when you run the pipeline, feel free to open a new topic on the forum.
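
For completeness, a minimal sketch of what that looks like on the implementing side. parseMyTltModel is a hypothetical placeholder for your own parsing logic, and the build call assumes TensorRT 7/8-era APIs; note that DeepStream has already set the INT8/FP16 flags and the calibrator on builderConfig before your function is called, as shown in the code above:

    #include <NvInfer.h>
    #include "nvdsinfer_custom_impl.h"

    /* Hypothetical helper: decode the TLT model and populate `network`,
     * importing weights at the precision given by dataType (kFLOAT for
     * INT8, per the discussion above). */
    static bool parseMyTltModel(nvinfer1::IBuilder *builder,
            const NvDsInferContextInitParams *initParams,
            nvinfer1::DataType dataType,
            nvinfer1::INetworkDefinition **network)
    {
        /* ... your TLT decode and network construction go here ... */
        return false; /* stub */
    }

    extern "C"
    bool NvDsInferCudaEngineGetFromTltModel(nvinfer1::IBuilder *const builder,
            nvinfer1::IBuilderConfig *const builderConfig,
            const NvDsInferContextInitParams *const initParams,
            nvinfer1::DataType dataType,
            nvinfer1::ICudaEngine *&cudaEngine)
    {
        nvinfer1::INetworkDefinition *network = nullptr;
        if (!parseMyTltModel(builder, initParams, dataType, &network))
            return false;

        /* No need to set kINT8/kFP16 here: builderConfig already carries
         * the flags and calibrator chosen by DeepStream. */
        cudaEngine = builder->buildEngineWithConfig(*network, *builderConfig);
        delete network; /* TensorRT 8 style; use destroy() on older releases */
        return cudaEngine != nullptr;
    }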
