TENSORRT Model using FP16 Plugins and Kernels

sohaib.arshid101 · March 7, 2019, 2:30pm

I am trying to convert my network into FP16 mode where the network can also run cuda kernel for IPluginExt in FP16 mode too for which I am using __half datatype but context.enqeue doesnt give me the right values after inference, it seems like it gives garbage values but when I use float instead of __half, the model works fine and Inference gives me correct output. My Inference Code is given below:

void doInference(IExecutionContext& context, __half* input, __half* output, int batch_size)
{
    const ICudaEngine& engine = context.getEngine();

    assert(engine.getNbBindings() == 2);
    void* buffers[2];

    const int input_bind = engine.getBindingIndex(INPUT_NAME);
    const int output_bind = engine.getBindingIndex(OUTPUT_NAME);

    cout << "Size of __half data type is : " << sizeof(__half) << endl;
    CudaSafeCall(cudaMalloc(&buffers[input_bind], INPUT_SIZE*sizeof(__half)));
    CudaSafeCall(cudaMalloc(&buffers[output_bind], OUTPUT_SIZE*sizeof(__half)));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    cout << "Address of buffer[input_bind] is " << buffers[input_bind] << endl;
    cout << "Address of buffer[output_bind] is " << buffers[output_bind] << endl;
    CudaSafeCall(cudaMemcpyAsync(buffers[input_bind], input, INPUT_SIZE*sizeof(__half), cudaMemcpyHostToDevice, stream));
    bool result = context.enqueue(batch_size, buffers, stream, nullptr);
    if (result)
        cout << "Enqeue was successful" << endl;
    
    cout << "Address of buffer[output_bind] after enqeue is " << buffers[output_bind] << endl;
    cout << "Size of buffers[output_bind] is : " << sizeof(buffers[output_bind]) << endl;
    CudaSafeCall(cudaMemcpyAsync(output, buffers[output_bind], OUTPUT_SIZE*sizeof(__half), cudaMemcpyDeviceToHost, stream));
    
    for (int i=0; i<OUTPUT_SIZE; i++)
        cout <<  fp16::__half2float(output[i]) << endl;
    
    CudaCheckError();
    cudaStreamSynchronize(stream);

    // Release stream and buffers
    cudaStreamDestroy(stream);
    cudaFree(buffers[input_bind]);
    cudaFree(buffers[output_bind]);
}

And my enqueue of IPluginExt is like this:

virtual int enqueue(int batchSize, const void* const* inputs, void** outputs, void* workspace, cudaStream_t stream) override
    {
        cout << "Address inside kernel for INPUT: " << inputs[0] << endl;
        cout << "Address inside kernel for OUTPUT: " << outputs[0] << endl;
        interp_gpu( (const __half*)inputs[0], mInputDims.d[3], mInputDims.d[2], mInputDims.d[1], batchSize, (__half *)outputs[0], b_x, b_y, b_z, stream );       
        return 0;
    }

where inter_gpu is my custom kernel which takes in __half input

NVES · March 7, 2019, 3:41pm

What version of trt are you using?

sohaib.arshid101 · March 8, 2019, 5:55am

TRT 4.0
I can not use TRT 5 because TX2 does not support it

NVES · April 22, 2019, 2:47pm

Hello,

To help us debug, can you please share a repro that demonstrate the symptoms you are seeing?

Also, before a complete repro, Engineering suggests the to check the format and other input parameters of your plugin.
you can check the parameter of “configureWithFormat”, and check the “supportsFormat” of your plugin code.
To see if there are format/datatype inconsistency.

hl2997 · April 26, 2019, 11:46pm

Hi Sohaib.arshid101, do you explicitly convert data type to __half in iplugin? I thought tensorrt would do it automatically before and after iplugin layer.

Topic		Replies	Views
TensorRT fp16 plugin GPU-Accelerated Libraries	4	2907	August 23, 2017
Plugin to convert to and from half precision within the network TensorRT	1	834	May 13, 2020
TensorRT stuck on tuning plugin in FP16 mode TensorRT	1	457	October 22, 2022
TensorRT 5.1.6 Custom plugin with fp16 issue TensorRT	6	1927	November 19, 2019
Implement Plugin Layer with support of FP16 mode TensorRT	0	1048	April 26, 2019
Tensorrt 7 - Best Practice for implementing plugin that supports both FP16 and Fp32 TensorRT	1	529	August 16, 2022
Data type for TensorRT engine created from UFF model with DataType.HALF TensorRT	2	1335	May 2, 2018
How does TensorRT handle plugin layer with FP16 mode TensorRT	0	668	April 29, 2019
TensorRT, result error in fp16 TensorRT	1	779	October 19, 2021
Different FP16 inference with tensorrt and pytorch TensorRT	5	4711	October 25, 2021

TENSORRT Model using FP16 Plugins and Kernels

Related topics