TensorRT 5.1.6 Custom plugin with fp16 issue

I am having issues getting a custom plugin to use FP16. The UFF parser hits a segmentation fault when

builder->setFp16Mode(true);

is set, as I can't get the correct data type (e.g. nvinfer1::DataType::kHALF) in the configurePlugin function.

Is there any guide / article that I can look at to solve this problem?

There is some discussion of the issue here, but no answer to it:
https://devtalk.nvidia.com/default/topic/1044156/tensorrt/does-tensor-rt-5-automatically-enable-tensor-core-for-int8-and-fp16-mode-/post/5333563/#5333563

To be clear, we have successfully implemented a custom TRT plugin in FP32, but we are stuck on the FP16 implementation because TRT crashes when building the engine from the .uff file.

Hi,
It is recommended to always provide an FP32 implementation of the plugin in order to allow the plugin to properly operate with any network.

Thanks

I'm afraid you're not answering the question at all. Here it is in simple terms:

Is FP16 supported for TensorRT 5.1.6.0 custom plugins or not?

If so, then what’s the proper way to implement FP16 for custom plugins?

Again, we’ve already implemented an FP32 version of the plugin. We are now looking to implement an FP16 version.

Hi,
Yes, FP16 is supported for custom plugins in TRT 5.X.
Implementation:
With IPluginV2/IPluginV2Ext in TRT 5.X, implement the supportsFormat API and return true when the DataType is kHALF:
https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/samplePlugin/fcPlugin.h
Something like:
bool supportsFormat(DataType type, PluginFormat format) const override
{
    return (type == DataType::kFLOAT || type == DataType::kHALF)
        && format == PluginFormat::kNCHW;
}
Or, with IPluginV2DynamicExt/IPluginV2IOExt, use the supportsFormatCombination API, as shown here for TRT 6.0:
https://github.com/NVIDIA/TensorRT/blob/master/plugin/instanceNormalizationPlugin/instanceNormalizationPlugin.cpp
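For reference, a minimal sketch of that style of check, assuming the TRT 6.0 IPluginV2IOExt signature (this interface is not available in TRT 5.x): accept linear FP32/FP16 tensors and require all inputs/outputs to share the same type.

bool supportsFormatCombination(int pos, const nvinfer1::PluginTensorDesc* inOut,
    int nbInputs, int nbOutputs) const override
{
    const nvinfer1::PluginTensorDesc& desc = inOut[pos];
    const bool okType   = (desc.type == nvinfer1::DataType::kFLOAT
                        || desc.type == nvinfer1::DataType::kHALF);
    const bool okFormat = (desc.format == nvinfer1::TensorFormat::kLINEAR);
    // Require every input/output to use the same type as the first tensor.
    const bool sameType = (desc.type == inOut[0].type);
    return okType && okFormat && sameType;
}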

Support matrix (hardware) for TRT 5.x:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-archived/tensorrt-515/tensorrt-support-matrix/index.html#hardware-precision-matrix

But I would recommend using the latest TRT version in order to get more flexibility in custom plugin creation.

Thanks

Hi,

We have modified the code as per your guide, but we have noticed that the inputTypes passed to configurePlugin always seem to be nvinfer1::DataType::kFLOAT.

bool CustomPlugin::supportsFormat(nvinfer1::DataType type, PluginFormat format) const {
    return ((type == nvinfer1::DataType::kFLOAT || type == nvinfer1::DataType::kHALF)
        && (format == PluginFormat::kNCHW));
}

void CustomPlugin::configurePlugin(const nvinfer1::Dims* inputDims, int nbInputs, const nvinfer1::Dims* outputDims, int nbOutputs,
    const nvinfer1::DataType* inputTypes, const nvinfer1::DataType* outputTypes, const bool* inputIsBroadcast,
    const bool* outputIsBroadcast, PluginFormat floatFormat, int maxBatchSize) {
    ASSERT(*inputTypes == nvinfer1::DataType::kFLOAT || *inputTypes == nvinfer1::DataType::kHALF);
    ASSERT(*inputTypes == *outputTypes);
    ASSERT(floatFormat == PluginFormat::kNCHW);
    ...
    data_type = *inputTypes;
    ...

}

While converting the .uff file with FP16 mode enabled, it crashes somewhere inside the TensorRT library. The call stack is given below:

(cuda-gdb) info stack
#0  0x000000005f351e70 in ?? ()
#1  0x00007fffef41b6d3 in nvinfer1::cudnn::selectTactic(nvinfer1::rt::EngineBuildContext const&, nvinfer1::rt::Layer&, nvinfer1::builder::Node*) ()
   from /home/klass/nvidia/tensorrt-5.1.5.0/x86_64-linux-gnu/lib/libnvinfer.so.5
#2  0x00007fffef42f556 in nvinfer1::builder::buildSingleLayer(nvinfer1::rt::EngineBuildContext&, nvinfer1::builder::Node&, std::unordered_map<std::string, std::unique_ptr<nvinfer1::rt::Region, std::default_delete<nvinfer1::rt::Region> >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::unique_ptr<nvinfer1::rt::Region, std::default_delete<nvinfer1::rt::Region> > > > > const&, std::unordered_map<std::string, std::vector<float, std::allocator<float> >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<float, std::allocator<float> > > > >*, bool) () from /home/klass/nvidia/tensorrt-5.1.5.0/x86_64-linux-gnu/lib/libnvinfer.so.5
#3  0x00007fffef43098e in nvinfer1::builder::EngineTacticSupply::getBestTactic(nvinfer1::builder::Node&, nvinfer1::query::Ports<nvinfer1::RegionFormatL> const&, bool) ()
   from /home/klass/nvidia/tensorrt-5.1.5.0/x86_64-linux-gnu/lib/libnvinfer.so.5
#4  0x00007fffef4688d1 in nvinfer1::builder::makeReformattingTensor(nvinfer1::builder::Tensor&, std::string const&, nvinfer1::RegionFormatL const&, std::unordered_map<std::string, std::vector<float, std::allocator<float> >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<float, std::allocator<float> > > > >*, bool) ()
   from /home/klass/nvidia/tensorrt-5.1.5.0/x86_64-linux-gnu/lib/libnvinfer.so.5
#5  0x00007fffef4637fa in nvinfer1::builder::chooseFormatsAndTactics(nvinfer1::builder::Graph&, nvinfer1::builder::TacticSupply&, std::unordered_map<std::string, std::vector<float, std::allocator<float> >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<float, std::allocator<float> > > > >*, bool) ()
   from /home/klass/nvidia/tensorrt-5.1.5.0/x86_64-linux-gnu/lib/libnvinfer.so.5
#6  0x00007fffef4328a6 in nvinfer1::builder::EngineTacticSupply::timeReformat(std::shared_ptr<nvinfer1::builder::Tensor>&, bool, nvinfer1::RegionFormatL const&, nvinfer1::RegionFormatL const&, std::unordered_map<std::string, std::vector<float, std::allocator<float> >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<float, std::allocator<float> > > > >*) () from /home/klass/nvidia/tensorrt-5.1.5.0/x86_64-linux-gnu/lib/libnvinfer.so.5
#7  0x00007fffef437212 in nvinfer1::builder::buildEngine(nvinfer1::CudaEngineBuildConfig&, nvinfer1::rt::HardwareContext const&, nvinfer1::Network const&) ()
   from /home/klass/nvidia/tensorrt-5.1.5.0/x86_64-linux-gnu/lib/libnvinfer.so.5
#8  0x00007fffef4cce5d in nvinfer1::builder::Builder::buildCudaEngine(nvinfer1::INetworkDefinition&) () from /home/klass/nvidia/tensorrt-5.1.5.0/x86_64-linux-gnu/lib/libnvinfer.so.5
#9  0x000000000044fedf in klassfr::trt::Model<klassfr::trt::IMTCNN_PNet>::InitializeFromUff (this=0x1117e60, uff_stream=..., max_batch_size=4)
    at /home/klass/tensor_rt/src/model.hpp:116

Even if we modify the enqueue function to do nothing (i.e. no modification of the inputs/outputs), it still crashes in the same place as described above.

Truncated logs from stdout are shown below:

[INFO] --------------- Timing custom1(34)
[INFO] Tactic 0 time 3.78656
Segmentation fault (core dumped)

OR

[INFO] --------------- Timing custom1(34)
[INFO] Tactic 0 time 3.77376
Bus error (core dumped)

Hi,

Along with the supportsFormat function, the following two functions also need changes to support FP16 (a short sketch follows the sample link below):

  • configureWithFormat(): the builder calls the plugin's configureWithFormat() method with the configuration it selects, giving the plugin a chance to choose an algorithm based on its inputs.
  • enqueue(): the core of the plugin, used to execute the custom layer at runtime.

Please refer below sample for details:
https://github.com/NVIDIA/TensorRT/tree/release/6.0/samples/opensource/samplePlugin
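A minimal sketch of what this can look like for the IPluginV2 path, reusing the CustomPlugin class and data_type member from the snippet above. The launchCustomKernelFloat/launchCustomKernelHalf calls are hypothetical placeholders for the plugin's own CUDA kernels; only the TensorRT 5.x method signatures are taken from the API (assumes <NvInfer.h>, <cuda_fp16.h>, <cassert>).

void CustomPlugin::configureWithFormat(const nvinfer1::Dims* inputDims, int nbInputs,
    const nvinfer1::Dims* outputDims, int nbOutputs,
    nvinfer1::DataType type, nvinfer1::PluginFormat format, int maxBatchSize)
{
    // The builder passes in the type/format it intends to run this layer with.
    assert((type == nvinfer1::DataType::kFLOAT || type == nvinfer1::DataType::kHALF)
        && format == nvinfer1::PluginFormat::kNCHW);
    data_type = type;  // remember the precision selected by the builder
}

int CustomPlugin::enqueue(int batchSize, const void* const* inputs, void** outputs,
    void* workspace, cudaStream_t stream)
{
    // Dispatch to a kernel matching the precision the builder selected.
    if (data_type == nvinfer1::DataType::kHALF)
        launchCustomKernelHalf(batchSize, static_cast<const __half*>(inputs[0]),
                               static_cast<__half*>(outputs[0]), stream);   // hypothetical FP16 kernel
    else
        launchCustomKernelFloat(batchSize, static_cast<const float*>(inputs[0]),
                                static_cast<float*>(outputs[0]), stream);   // hypothetical FP32 kernel
    return 0;
}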

The builder->platformHasFastFp16() API may also help while debugging; it reports whether the platform has fast native FP16 support.
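For example, a small guard at engine-build time (sketch using the TRT 5.x builder API):

// Only enable FP16 mode when the platform reports fast native FP16 support;
// otherwise leave the builder at its FP32 default.
if (builder->platformHasFastFp16())
{
    builder->setFp16Mode(true);
}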

Also, can you provide the following information so we can better help?

  • Software platform details
  • nvidia-smi info
  • Sample code to reproduce the issue.

Support matrix (hardware) for TRT 5.x:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-archived/tensorrt-515/tensorrt-support-matrix/index.html#hardware-precision-matrix

Thanks

Hi,

Following your guidance, we have verified that FP16 works with IPluginV2 after modifying the code. However, we still have no luck when using IPluginV2Ext.

We will stick to IPluginV2 for now.

Thanks