TensorRT 5.1.6 Custom plugin with fp16 issue

I am having issues getting a custom plugin to use FP16. The UFF parser hits a segmentation fault when

builder->setFp16Mode(true);

is set, as I can't get the correct data type (e.g. nvinfer1::DataType::kHALF) in the configurePlugin function.

Is there any guide / article that I can look at to solve this problem?

There is some discussion of the issue here, but no answer to it:
https://devtalk.nvidia.com/default/topic/1044156/tensorrt/does-tensor-rt-5-automatically-enable-tensor-core-for-int8-and-fp16-mode-/post/5333563/#5333563

To be clear, we have successfully implemented a custom TRT plugin in FP32, but we are stuck on the FP16 implementation because TRT crashes when building the engine from the .uff file.

Hi,
It is recommended to always provide an FP32 implementation of the plugin in order to allow the plugin to properly operate with any network.

Thanks

I'm afraid you're not answering the question at all. Here it is in simple terms:

Is FP16 supported for TensorRT 5.1.6.0 custom plugins or not?

If so, then what’s the proper way to implement FP16 for custom plugins?

Again, we’ve already implemented an FP32 version of the plugin. We are now looking to implement an FP16 version.

Hi,
Yes, FP16 is supported for custom plugins in TRT 5.X.
Implementation:
With IPluginV2/IPluginV2Ext in TRT 5.X, implement the supportsFormat API and return true when the DataType is kHALF:
https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/samplePlugin/fcPlugin.h
Something like:
bool supportsFormat(DataType type, PluginFormat format) const override
{
    return (type == DataType::kFLOAT || type == DataType::kHALF)
        && format == PluginFormat::kNCHW;
}
Or, with IPluginV2DynamicExt/IPluginV2IOExt, use the supportsFormatCombination API, as shown here for TRT 6.0:
https://github.com/NVIDIA/TensorRT/blob/master/plugin/instanceNormalizationPlugin/instanceNormalizationPlugin.cpp
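For reference, a minimal sketch of that style of check, assuming the TRT 6.0 IPluginV2IOExt signature (this interface is not available in TRT 5.x): accept linear FP32/FP16 tensors and require all inputs/outputs to share the same type.

bool supportsFormatCombination(int pos, const nvinfer1::PluginTensorDesc* inOut,
    int nbInputs, int nbOutputs) const override
{
    const nvinfer1::PluginTensorDesc& desc = inOut[pos];
    const bool okType   = (desc.type == nvinfer1::DataType::kFLOAT
                        || desc.type == nvinfer1::DataType::kHALF);
    const bool okFormat = (desc.format == nvinfer1::TensorFormat::kLINEAR);
    // Require every input/output to use the same type as the first tensor.
    const bool sameType = (desc.type == inOut[0].type);
    return okType && okFormat && sameType;
}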

Support matrix (hardware) for TRT 5.x:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-archived/tensorrt-515/tensorrt-support-matrix/index.html#hardware-precision-matrix

But I would recommend using the latest TRT version in order to get more flexibility in custom plugin creation.

Thanks

Hi,

We have modified the code as per your guide, but we have noticed that the inputTypes passed to configurePlugin always seem to be nvinfer1::DataType::kFLOAT.

bool CustomPlugin::supportsFormat(nvinfer1::DataType type, PluginFormat format) const {
    return ((type == nvinfer1::DataType::kFLOAT || type == nvinfer1::DataType::kHALF)
        && (format == PluginFormat::kNCHW));
}

void CustomPlugin::configurePlugin(const nvinfer1::Dims* inputDims, int nbInputs, const nvinfer1::Dims* outputDims, int nbOutputs,
    const nvinfer1::DataType* inputTypes, const nvinfer1::DataType* outputTypes, const bool* inputIsBroadcast,
    const bool* outputIsBroadcast, PluginFormat floatFormat, int maxBatchSize) {
    ASSERT(*inputTypes == nvinfer1::DataType::kFLOAT || *inputTypes == nvinfer1::DataType::kHALF);
    ASSERT(*inputTypes == *outputTypes);
    ASSERT(floatFormat == PluginFormat::kNCHW);
    ...
    data_type = *inputTypes;
    ...

}

While converting the .uff file with FP16 mode enabled, it crashes somewhere inside the TensorRT library. The call stack is given below:

(cuda-gdb) info stack
#0  0x000000005f351e70 in ?? ()
#1  0x00007fffef41b6d3 in nvinfer1::cudnn::selectTactic(nvinfer1::rt::EngineBuildContext const&, nvinfer1::rt::Layer&, nvinfer1::builder::Node*) ()
   from /home/klass/nvidia/tensorrt-5.1.5.0/x86_64-linux-gnu/lib/libnvinfer.so.5
#2  0x00007fffef42f556 in nvinfer1::builder::buildSingleLayer(nvinfer1::rt::EngineBuildContext&, nvinfer1::builder::Node&, std::unordered_map<std::string, std::unique_ptr<nvinfer1::rt::Region, std::default_delete<nvinfer1::rt::Region> >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::unique_ptr<nvinfer1::rt::Region, std::default_delete<nvinfer1::rt::Region> > > > > const&, std::unordered_map<std::string, std::vector<float, std::allocator<float> >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<float, std::allocator<float> > > > >*, bool) () from /home/klass/nvidia/tensorrt-5.1.5.0/x86_64-linux-gnu/lib/libnvinfer.so.5
#3  0x00007fffef43098e in nvinfer1::builder::EngineTacticSupply::getBestTactic(nvinfer1::builder::Node&, nvinfer1::query::Ports<nvinfer1::RegionFormatL> const&, bool) ()
   from /home/klass/nvidia/tensorrt-5.1.5.0/x86_64-linux-gnu/lib/libnvinfer.so.5
#4  0x00007fffef4688d1 in nvinfer1::builder::makeReformattingTensor(nvinfer1::builder::Tensor&, std::string const&, nvinfer1::RegionFormatL const&, std::unordered_map<std::string, std::vector<float, std::allocator<float> >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<float, std::allocator<float> > > > >*, bool) ()
   from /home/klass/nvidia/tensorrt-5.1.5.0/x86_64-linux-gnu/lib/libnvinfer.so.5
#5  0x00007fffef4637fa in nvinfer1::builder::chooseFormatsAndTactics(nvinfer1::builder::Graph&, nvinfer1::builder::TacticSupply&, std::unordered_map<std::string, std::vector<float, std::allocator<float> >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<float, std::allocator<float> > > > >*, bool) ()
   from /home/klass/nvidia/tensorrt-5.1.5.0/x86_64-linux-gnu/lib/libnvinfer.so.5
#6  0x00007fffef4328a6 in nvinfer1::builder::EngineTacticSupply::timeReformat(std::shared_ptr<nvinfer1::builder::Tensor>&, bool, nvinfer1::RegionFormatL const&, nvinfer1::RegionFormatL const&, std::unordered_map<std::string, std::vector<float, std::allocator<float> >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<float, std::allocator<float> > > > >*) () from /home/klass/nvidia/tensorrt-5.1.5.0/x86_64-linux-gnu/lib/libnvinfer.so.5
#7  0x00007fffef437212 in nvinfer1::builder::buildEngine(nvinfer1::CudaEngineBuildConfig&, nvinfer1::rt::HardwareContext const&, nvinfer1::Network const&) ()
   from /home/klass/nvidia/tensorrt-5.1.5.0/x86_64-linux-gnu/lib/libnvinfer.so.5
#8  0x00007fffef4cce5d in nvinfer1::builder::Builder::buildCudaEngine(nvinfer1::INetworkDefinition&) () from /home/klass/nvidia/tensorrt-5.1.5.0/x86_64-linux-gnu/lib/libnvinfer.so.5
#9  0x000000000044fedf in klassfr::trt::Model<klassfr::trt::IMTCNN_PNet>::InitializeFromUff (this=0x1117e60, uff_stream=..., max_batch_size=4)
    at /home/klass/tensor_rt/src/model.hpp:116

Even if we modify the enqueue function to do nothing (i.e. no modification of the inputs/outputs), it still crashes in the same place as described above.

Truncated logs from stdout are shown below:

[INFO] --------------- Timing custom1(34)
[INFO] Tactic 0 time 3.78656
Segmentation fault (core dumped)

OR

[INFO] --------------- Timing custom1(34)
[INFO] Tactic 0 time 3.77376
Bus error (core dumped)

Hi,

Along with the supportsFormat function, the following two functions also need changes to support FP16 (a short sketch follows the sample link below):

  • configureWithFormat(): the builder calls the plugin's configureWithFormat() method with the configuration it selects, giving the plugin a chance to choose an algorithm based on its inputs.
  • enqueue(): the core of the plugin, used to execute the custom layer at runtime.

Please refer below sample for details:
https://github.com/NVIDIA/TensorRT/tree/release/6.0/samples/opensource/samplePlugin
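A minimal sketch of what this can look like for the IPluginV2 path, reusing the CustomPlugin class and data_type member from the snippet above. The launchCustomKernelFloat/launchCustomKernelHalf calls are hypothetical placeholders for the plugin's own CUDA kernels; only the TensorRT 5.x method signatures are taken from the API (assumes <NvInfer.h>, <cuda_fp16.h>, <cassert>).

void CustomPlugin::configureWithFormat(const nvinfer1::Dims* inputDims, int nbInputs,
    const nvinfer1::Dims* outputDims, int nbOutputs,
    nvinfer1::DataType type, nvinfer1::PluginFormat format, int maxBatchSize)
{
    // The builder passes in the type/format it intends to run this layer with.
    assert((type == nvinfer1::DataType::kFLOAT || type == nvinfer1::DataType::kHALF)
        && format == nvinfer1::PluginFormat::kNCHW);
    data_type = type;  // remember the precision selected by the builder
}

int CustomPlugin::enqueue(int batchSize, const void* const* inputs, void** outputs,
    void* workspace, cudaStream_t stream)
{
    // Dispatch to a kernel matching the precision the builder selected.
    if (data_type == nvinfer1::DataType::kHALF)
        launchCustomKernelHalf(batchSize, static_cast<const __half*>(inputs[0]),
                               static_cast<__half*>(outputs[0]), stream);   // hypothetical FP16 kernel
    else
        launchCustomKernelFloat(batchSize, static_cast<const float*>(inputs[0]),
                                static_cast<float*>(outputs[0]), stream);   // hypothetical FP32 kernel
    return 0;
}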

The builder->platformHasFastFp16() API may also help while debugging; it reports whether the platform has fast native FP16 support.
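For example, a small guard at engine-build time (sketch using the TRT 5.x builder API):

// Only enable FP16 mode when the platform reports fast native FP16 support;
// otherwise leave the builder at its FP32 default.
if (builder->platformHasFastFp16())
{
    builder->setFp16Mode(true);
}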

Also, can you provide the following information so we can better help?

  • Software platform details
  • nvidia-smi info
  • Sample code to reproduce the issue.

Support matrix (hardware) for TRT 5.x:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-archived/tensorrt-515/tensorrt-support-matrix/index.html#hardware-precision-matrix

Thanks

Hi,

Following your guidance, we have verified that FP16 works with IPluginV2 after modifying the code. However, we still have no luck when using IPluginV2Ext.

We will stick to IPluginV2 for now.

Thanks