Problem deserializing custom plugin on Jetson Nano

fbrughi · May 18, 2020, 3:09pm

Hi,
We implemented a custom plugin of type IPluginV2.
Engine serialization runs smoothly, but when we try to deserialize the engine to run inference we get the following error:

ERROR: [TRT]: INVALID_ARGUMENT: getPluginCreator could not find plugin BoxDecoding_TRT version 1
ERROR: [TRT]: safeDeserializationUtils.cpp (321) - Serialization Error in load: 0 (Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
ERROR: [TRT]: INVALID_STATE: std::exception
ERROR: [TRT]: INVALID_CONFIG: Deserialize the cuda engine failed.

An important detail is that this only happens on the Jetson Nano; on the other platforms we tried (2080ti, T4) we never had any problem.

We tried both TensorRT 6.0.1.5 and TensorRT 7.0.0.11 and the outcome is the same: not working on Jetson Nano, OK on other GPUs.

In detail:

CUDA 10.1 - TensorRT 6.0.1.5 - Jetson Nano: error
CUDA 10.1 - TensorRT 6.0.1.5 - 2080ti: OK
CUDA 10.2 - TensorRT 7.0.0.11 - Jetson Nano: error
CUDA 10.2 - TensorRT 7.0.0.11 - 2080ti: OK

This has been tested with both FP16 and INT8 precision; the result is the same.
Also, when we initialize the plugins with the function initLibNvInferPlugins, it always returns true.

Is this a known behavior?
Is there anything that has to be taken into account when writing custom plugins for Jetson Nano platforms?

Thanks,

f

SunilJB · May 18, 2020, 7:46pm

Hi,

Was the custom plugin “BoxDecoding_TRT” built on the Jetson Nano?
Also, please refer to the issue below in case it helps:

github.com/NVIDIA/TensorRT

Custom Plugin: INVALID_ARGUMENT: getPluginCreator could not find plugin Prelu_TRT version 1

opened 07:32AM - 10 Dec 19 UTC, closed 11:14AM - 10 Dec 19 UTC, by zerollzeng
Labels: Component: Plugins, API: C++, Release: 6.x

## Description
## Environment
**TensorRT Version**: 6.0.1.5
**GPU T…ype**: 2080ti
**Nvidia Driver Version**:418.39
**CUDA Version**: 10.0
**CUDNN Version**: 7.6
**Operating System + Version**:ubuntu 16.04
**Python Version (if applicable)**:
**TensorFlow Version (if applicable)**:
**PyTorch Version (if applicable)**:
**Baremetal or Container (if container which image + tag)**:

## Relevant Files
https://github.com/zerollzeng/tiny-tensorrt/tree/master/plugin/PreluPlugin

## Steps To Reproduce
you can refer to readme of tensorrt-zoo(https://github.com/zerollzeng/tensorrt-zoo), but it might take a long times. so let me describe it simple here: I am testing a openpose demo since it need a custom prelu layer, so I write one, at the end of the PreluPlugin.cu I add REGISTER_TENSORRT_PLUGIN(PreluPluginCreator); then I run sample which create an engine file and the produce correct result, but when I try to deserialize from engine file, it report error:

INVALID_ARGUMENT: getPluginCreator could not find plugin Prelu_TRT version 1
safeDeserializationUtils.cpp (259) - Serialization Error in load: 0 (Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
INVALID_STATE: std::exception
INVALID_CONFIG: Deserialize the cuda engine failed.

can you give some advise or hints about this error? I google and search in developer forum but I didn't find information that is helpful to me. thanks

If the issue persists, could you please share sample code to reproduce it, along with the verbose log, so we can better help?

Thanks

fbrughi · May 19, 2020, 1:51pm

Hi,
thanks for replying.
Yes, we had already tested the solution provided in that thread, without success.

The implementation of the plugin in question is in the code below.
The whole project is pretty large and I’m afraid I cannot disclose the architecture implementation.
As mentioned, it has always worked on the other GPUs, with both TensorRT 6 and 7.

Hope these details help.
Thanks again,

f

#include <string.h>
#include <string>
#include <iostream>
#include <cassert>
#include <vector>
#include <functional>
#include <numeric>
#include <algorithm>
//#include "NvInfer.h"
#include "NvInferPlugin.h"
#include "cuda_runtime_api.h"


void boxDecoderLauncher(const int batch_size, const int *map_size, const float thr,
                        const float *data_in_l, const float *data_in_b, const float *data_in_l_pool, float *data_out_b, float *data_out_s, cudaStream_t stream);

using namespace nvinfer1;


class BoxDecodingLayer : public IPluginV2
{
public:
    BoxDecodingLayer(const float score_threshold, const int map_height, const int map_width, const int map_depth)
    {
        scoreThreshold = score_threshold;
        mapHeight = map_height;
        mapWidth = map_width;
        mapDepth = map_depth;
    }

    BoxDecodingLayer(const void* data, size_t length)
    {
        const char* d = static_cast<const char*>(data);
        scoreThreshold = read<float>(d);
        mapHeight = read<int>(d);
        mapWidth = read<int>(d);
        mapDepth = read<int>(d);
    }

    // It makes no sense to construct BoxDecodingLayer without arguments.
    BoxDecodingLayer() = delete;

    virtual ~BoxDecodingLayer() {}

    int getNbOutputs() const override
    {
        return 2;
    }

    Dims getOutputDimensions(int index, const Dims* inputs, int nbInputDims) override
    {
        assert(nbInputDims == 3);
        assert(inputs[0].nbDims == 3);
        assert(inputs[1].nbDims == 3);
        assert(inputs[2].nbDims == 3);

        if (index == 0) // boxes
            return DimsCHW(mapHeight * mapWidth, mapDepth, 4);

        if (index == 1) // scores
            return Dims2(mapHeight * mapWidth, mapDepth);

        return DimsCHW(mapHeight * mapWidth, mapDepth, 4);
    }

    int initialize() override { return 0; }

    void terminate() override { ; }

    size_t getWorkspaceSize(int maxBatchSize) const override { return 0; }

    int enqueue(int batch_size, const void*const *inputs, void** outputs, void*, cudaStream_t stream) override
    {
        int map_size[] {mapHeight, mapWidth, mapDepth};
        float thr = scoreThreshold;
        float *data_in_l = (float*)inputs[0];
        float *data_in_b = (float*)inputs[1];
        float *data_in_l_pool = (float*)inputs[2];
        float *data_out_b = (float*)outputs[0];
        float *data_out_s = (float*)outputs[1];

        boxDecoderLauncher(batch_size, map_size, thr, data_in_l, data_in_b, data_in_l_pool, data_out_b, data_out_s, stream);

        return 0;
    }

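    // Serialized layout: scoreThreshold (float) followed by mapHeight, mapWidth, mapDepth (int).
    size_t getSerializationSize() const override { return sizeof(float) + 3 * sizeof(int); }

    void serialize(void* buffer) const override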
    {
        char *d = reinterpret_cast<char*>(buffer);
        write(d, scoreThreshold);
        write(d, mapHeight);
        write(d, mapWidth);
        write(d, mapDepth);
    }

    void configureWithFormat(const Dims* inputs, int nbInputs, const Dims* outputDims, int nbOutputs, nvinfer1::DataType type, nvinfer1::PluginFormat format, int maxBatchSize) override
    {
        assert(nbOutputs == 2);
        assert(inputs[0].nbDims == 3);
        assert(inputs[1].nbDims == 3);
        assert(inputs[2].nbDims == 3);
        for (int i = 0; i < nbInputs; ++i)
        {
            assert(inputs[i].d[1] == mapHeight);
            assert(inputs[i].d[2] == mapWidth);
        }
    }

    bool supportsFormat(DataType type, PluginFormat format) const override { return (type == DataType::kFLOAT && format == PluginFormat::kNCHW); }

    const char* getPluginType() const override { return "BoxDecoding_TRT"; }

    const char* getPluginVersion() const override { return "1"; }

    void destroy() override { delete this; }

    IPluginV2* clone() const override { return new BoxDecodingLayer(scoreThreshold, mapHeight, mapWidth, mapDepth); }

    void setPluginNamespace(const char* libNamespace) override { mNamespace = libNamespace; }

    const char* getPluginNamespace() const override { return mNamespace.c_str(); }

private:
    template <typename T>
    void write(char*& buffer, const T& val) const
    {
        *reinterpret_cast<T*>(buffer) = val;
        buffer += sizeof(T);
    }

    template <typename T>
    T read(const char*& buffer)
    {
        T val = *reinterpret_cast<const T*>(buffer);
        buffer += sizeof(T);
        return val;
    }

    float scoreThreshold;
    int mapHeight;
    int mapWidth;
    int mapDepth;
    std::string mNamespace;
};

namespace
{
const char* BOXDECODINGLAYER_PLUGIN_VERSION{"1"};
const char* BOXDECODINGLAYER_PLUGIN_NAME{"BoxDecoding_TRT"};
} // namespace


class BoxDecodingLayerPluginCreator : public IPluginCreator
{
public:

    BoxDecodingLayerPluginCreator()
    {
        mPluginAttributes.emplace_back(PluginField("score_threshold", nullptr, PluginFieldType::kFLOAT32, 1));
        mPluginAttributes.emplace_back(PluginField("map_height", nullptr, PluginFieldType::kINT32, 1));
        mPluginAttributes.emplace_back(PluginField("map_width", nullptr, PluginFieldType::kINT32, 1));
        mPluginAttributes.emplace_back(PluginField("map_depth", nullptr, PluginFieldType::kINT32, 1));

        mFC.nbFields = mPluginAttributes.size();
        mFC.fields = mPluginAttributes.data();
    }

    ~BoxDecodingLayerPluginCreator() {}

    const char* getPluginName() const override { return BOXDECODINGLAYER_PLUGIN_NAME; }

    const char* getPluginVersion() const override { return BOXDECODINGLAYER_PLUGIN_VERSION; }

    const PluginFieldCollection* getFieldNames() override { return &mFC; }

    IPluginV2* createPlugin(const char* name, const PluginFieldCollection* fc) override
    {
        const PluginField* fields = fc->fields;

        for (int i = 0; i < fc->nbFields; ++i)
        {
            const char* attrName = fields[i].name;
            if (!strcmp(attrName, "score_threshold"))
            {
                assert(fields[i].type == PluginFieldType::kFLOAT32);
                scoreThreshold = *(static_cast<const float*>(fields[i].data));
            }
            if (!strcmp(attrName, "map_height"))
            {
                assert(fields[i].type == PluginFieldType::kINT32);
                mapHeight = *(static_cast<const int*>(fields[i].data));
            }
            if (!strcmp(attrName, "map_width"))
            {
                assert(fields[i].type == PluginFieldType::kINT32);
                mapWidth = *(static_cast<const int*>(fields[i].data));
            }
            if (!strcmp(attrName, "map_depth"))
            {
                assert(fields[i].type == PluginFieldType::kINT32);
                mapDepth = *(static_cast<const int*>(fields[i].data));
            }
        }

        return new BoxDecodingLayer(scoreThreshold, mapHeight, mapWidth, mapDepth);
    }

    IPluginV2* deserializePlugin(const char* name, const void* serialData, size_t serialLength) override
    {
        auto plugin = new BoxDecodingLayer(serialData, serialLength);
        mPluginName = name;
        return plugin;
    }

    void setPluginNamespace(const char* libNamespace) override { mNamespace = libNamespace; }

    const char* getPluginNamespace() const override { return mNamespace.c_str(); }

private:
    float scoreThreshold;
    int mapHeight;
    int mapWidth;
    int mapDepth;
    std::string mNamespace;
    std::string mPluginName;
    std::vector<PluginField> mPluginAttributes;
    PluginFieldCollection mFC;
};

REGISTER_TENSORRT_PLUGIN(BoxDecodingLayerPluginCreator);
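
Editorial note: the read and write helpers above cast the buffer pointer with reinterpret_cast, which assumes the buffer is suitably aligned for float and int. Nothing in this thread shows that this is related to the error. As a minimal sketch of the same 16-byte layout without that assumption, the fields can be copied with std::memcpy instead. All names below (BoxDecodingParams, writeField, readField, serializeParams, deserializeParams, serializedSize) are hypothetical and do not come from TensorRT or from the posted plugin.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the four values the plugin serializes.
struct BoxDecodingParams
{
    float scoreThreshold;
    int mapHeight;
    int mapWidth;
    int mapDepth;
};

// Copy one value into the buffer and advance the cursor.
template <typename T>
void writeField(char*& cursor, const T& value)
{
    std::memcpy(cursor, &value, sizeof(T));
    cursor += sizeof(T);
}

// Copy one value out of the buffer and advance the cursor.
template <typename T>
T readField(const char*& cursor)
{
    T value;
    std::memcpy(&value, cursor, sizeof(T));
    cursor += sizeof(T);
    return value;
}

// Same size formula as getSerializationSize() in the posted plugin.
constexpr std::size_t serializedSize()
{
    return sizeof(float) + 3 * sizeof(int);
}

// Write the fields in the same order as the posted serialize().
void serializeParams(const BoxDecodingParams& p, void* buffer)
{
    char* d = static_cast<char*>(buffer);
    writeField(d, p.scoreThreshold);
    writeField(d, p.mapHeight);
    writeField(d, p.mapWidth);
    writeField(d, p.mapDepth);
}

// Read the fields back in the same order; the length check is an addition.
BoxDecodingParams deserializeParams(const void* data, std::size_t length)
{
    assert(length == serializedSize());
    const char* d = static_cast<const char*>(data);
    BoxDecodingParams p{};
    p.scoreThreshold = readField<float>(d);
    p.mapHeight = readField<int>(d);
    p.mapWidth = readField<int>(d);
    p.mapDepth = readField<int>(d);
    return p;
}
```

A round trip through a std::vector<char> of serializedSize() bytes should return the values that were written, which is a cheap check to run on both the desktop and the Jetson build.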

SunilJB · May 22, 2020, 6:15am

Hi,

Based on the error, it seems that your plugin library is not being registered for some reason.
Could you please check whether your plugin is registered properly on the Jetson Nano system?

During runtime, the Plugin Registry can be queried using the extern function getPluginRegistry(). The Plugin Registry stores a pointer to all the registered Plugin Creators and can be used to look up a specific Plugin Creator based on the plugin name and version.
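
Editorial note: the lookup described above can be pictured with a toy model. To my knowledge, REGISTER_TENSORRT_PLUGIN defines a file-scope static registrar object, so registration happens as a side effect of static initialization in whichever binary contains that object. The sketch below imitates that pattern with hypothetical classes (ToyCreator, ToyRegistry, ToyRegistrar); it is not TensorRT's implementation and it ignores plugin namespaces.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Toy stand-in for a plugin creator: only the lookup key is modelled.
struct ToyCreator
{
    std::string name;
    std::string version;
};

// Toy stand-in for the plugin registry, keyed by (name, version).
class ToyRegistry
{
public:
    // Function-local static, so the registry exists before any registrar runs.
    static ToyRegistry& instance()
    {
        static ToyRegistry registry;
        return registry;
    }

    // Returns false if a creator with the same name and version already exists.
    bool registerCreator(const ToyCreator& creator)
    {
        return mCreators.emplace(std::make_pair(creator.name, creator.version), &creator).second;
    }

    // Returns nullptr when no creator matches both name and version.
    const ToyCreator* find(const std::string& name, const std::string& version) const
    {
        const auto it = mCreators.find(std::make_pair(name, version));
        return it == mCreators.end() ? nullptr : it->second;
    }

private:
    std::map<std::pair<std::string, std::string>, const ToyCreator*> mCreators;
};

// Registers a creator from its constructor, imitating a registration macro.
struct ToyRegistrar
{
    explicit ToyRegistrar(const ToyCreator& creator)
    {
        const bool inserted = ToyRegistry::instance().registerCreator(creator);
        assert(inserted);
        (void) inserted; // silence the unused warning when NDEBUG is defined
    }
};

// File-scope objects: the registrar constructor runs during static
// initialization of the binary that contains this translation unit.
static const ToyCreator boxDecodingCreator{"BoxDecoding_TRT", "1"};
static const ToyRegistrar boxDecodingRegistrar{boxDecodingCreator};
```

In this model, find("BoxDecoding_TRT", "1") succeeds only in a process that actually contains and initializes the registrar object, and a lookup with a different version string returns nullptr. Whether either situation applies on the Jetson Nano is not established in this thread.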

Thanks

fbrughi · June 12, 2020, 9:54am

Hi,
at deserialization time, I query the list of creators that have been registered, with the following code:

int numCreators = 0;
nvinfer1::IPluginCreator* const* tmpList = getPluginRegistry()->getPluginCreatorList(&numCreators);
for (int k = 0; k < numCreators; ++k)
{
    if (!tmpList[k])
    {
        std::cout << "Plugin Creator for plugin " << k << " is a nullptr." << std::endl;
        continue;
    }
    std::string pluginName = tmpList[k]->getPluginName();
    std::cout << k << ": " << pluginName << std::endl;
}

the output I get is:

0: RnRes2Br2bBr2c_TRT
1: RnRes2Br2bBr2c_TRT
2: RnRes2Br1Br2c_TRT
3: RnRes2Br1Br2c_TRT
4: CustomSkipLayerNormPluginDynamic
5: CustomEmbLayerNormPluginDynamic
6: CustomGeluPluginDynamic
7: CustomQKVToContextPluginDynamic
8: CustomFCPluginDynamic
9: SingleStepLSTMPlugin
10: PostProcessing_TRT
11: BoxDecoding_TRT
12: Resize_TRT

where number 11, “BoxDecoding_TRT”, is the problematic custom plugin, so it looks like it is registered correctly.
Yet at run time this error is triggered:

#assertionbatchedNMSPlugin.cpp,116
Aborted (core dumped)

Since batchedNMSPlugin is the layer that follows, I guess it breaks because it doesn’t get the output of BoxDecoding_TRT.

As I mentioned, on both an RTX 2080ti and a T4 the plugin is registered and deserialized correctly, and everything works, so it looks like a platform-specific issue related to the Jetson Nano.
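
Editorial note: the listing earlier in this post prints only plugin names, while the original error message mentions both a name and a version. A small extension, sketched below, prints version and namespace as well and returns the first creator matching a given name and version. The helper names findCreator and orNull are hypothetical. The function is a template so that it compiles without the TensorRT headers; it only assumes a creator type with getPluginName(), getPluginVersion() and getPluginNamespace(), which nvinfer1::IPluginCreator provides.

```cpp
#include <cassert>
#include <cstring>
#include <iostream>

// Avoid streaming a null C string.
inline const char* orNull(const char* s)
{
    return s ? s : "(null)";
}

// Print every creator in the list and return the first one whose name and
// version both match, or nullptr if there is none.
template <typename Creator>
Creator* findCreator(Creator* const* list, int count, const char* name, const char* version)
{
    Creator* match = nullptr;
    for (int k = 0; k < count; ++k)
    {
        Creator* c = list[k];
        if (!c)
        {
            std::cout << k << ": nullptr" << std::endl;
            continue;
        }
        const char* n = orNull(c->getPluginName());
        const char* v = orNull(c->getPluginVersion());
        const char* ns = orNull(c->getPluginNamespace());
        std::cout << k << ": " << n << " version " << v
                  << " namespace \"" << ns << "\"" << std::endl;
        if (!match && !std::strcmp(n, name) && !std::strcmp(v, version))
        {
            match = c;
        }
    }
    return match;
}

// Intended use with TensorRT (an assumption, requires NvInferPlugin.h):
//   int numCreators = 0;
//   auto list = getPluginRegistry()->getPluginCreatorList(&numCreators);
//   auto* creator = findCreator(list, numCreators, "BoxDecoding_TRT", "1");
```

If the returned pointer is non-null right before deserialization, the creator is present in that process under the expected name and version.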

Is there any specific step that I have to take to use a custom plugin on a Jetson Nano?
Have you experienced any issues in general with custom plugins on Jetson Nano?

thanks again,

f

SunilJB · June 12, 2020, 10:50am

I don’t think there are any specific/different steps to use a custom plugin on a Jetson Nano.

But you can file your issue in the Jetson Nano forum to get proper support.

Thanks