Problem with custom layers and Python UFF parser in TensorRT 3.0 RC

I am using TensorRT 3.0 RC via the Python API. I tried implementing a Custom Layer, similar to the example in tensorrt.examples.custom_layers. After some work, I realized that it’s not possible to use a Custom Layer with the UFF Parser, since the set_plugin_factory() function is not implemented for the UFF Parser. In particular, I get the following error:

plugin_factory = tensorrtplugins.ECPluginFactory()
trt_engine = trt.utils.uff_to_trt_engine(G_LOGGER, uff_model, parser, 1, 1<<20, plugin_factory=plugin_factory)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/a/.conda/envs/trt2.7/lib/python2.7/site-packages/tensorrt/utils/", line 115, in uff_to_trt_engine
AttributeError: 'nv_uff_parser_bindings.UffParser' object has no attribute 'set_plugin_factory'

Is there some simple workaround for this and could you please fix this in future versions?

Edit: The setPluginFactory() function seems to be missing for the UFF Parser even in the C++ API.


Could you share which layer do you want to implement?
The plugin is the implementation for inferencing, not for parsing.

Please convert the TensorFlow model to UFF first. (Ex. removing the unsupported layer)
Then launch TensorRT with the custom plugin implementation.


I did convert the Tensorflow model to UFF first using

uff_model = uff.from_tensorflow_frozen_model('model.pb', ["fc2/Relu"])

which worked fine, with the unsupported layer being converted as a custom op:

Warning: No conversion function registered for layer: Elu yet.
Converting as custom op Elu _ELU

I then create a UFF Parser as in the User Guide Section

parser = uffparser.create_uff_parser()
parser.register_input("Placeholder", (1,28,28), 0)

which worked too.

The problem is in creating the inference engine from the UFF model. As I show in my first post, I tried supplying the plugin_factory to the tensorrt.utils.uff_to_trt_engine() function, which should be possible according to the documentation, but I get the error instead. This occurs even if I use the FullyConnectedPluginFactory() shipped with TensorRT.

The problem seems to be that the UffParser object’s set_plugin_function() hasn’t been implemented yet in TensorRT 3.0 RC. It seems to be implemented only for the CaffeParser.

Is there perhaps a simple workaround for this? I’d like to implement other custom layers than Elus, but at the moment it seems that it’s impossible with Tensorflow/UFF.


Do you create the plugin from this API?

/usr/local/lib/python2.7/dist-packages/tensorrt/utils/_utils.pyc in uff_to_trt_engine(logger, stream, parser, max_batch_size, max_workspace_size, datatype, <b>plugin_factory</b>, calibrator)


I create it from tensorrt.utils.uff_to_trt_engine and supply the plugin_factory as an argument, as shown in my first post.

Here is a minimal example exhibiting the problem:

import tensorrt
import tensorrtplugins
from tensorrt.parsers import uffparser

plugin_factory = tensorrtplugins.FullyConnectedPluginFactory()
parser = uffparser.create_uff_parser()

This gives the error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'nv_uff_parser_bindings.UffParser' object has no attribute 'set_plugin_factory'

And as shown in the error from my first post, tensorrt.utils.uff_to_trt_engine() expects the Uff Parser to have a set_plugin_factory() function.

It works for the Caffe Parser:

import tensorrt
import tensorrtplugins
from tensorrt.parsers import caffeparser

plugin_factory = tensorrtplugins.FullyConnectedPluginFactory()
parser = caffeparser.create_caffe_parser()

Again, I would be grateful for any workaround for this problem.


We are checking this issue internally.
Will update information with you later.



Sorry for the late.

UFF parser doesn’t support plugin factory currently.
We have feedbacked this request, and it will be prioritized internally.

Currently, please create a TensorRT engine via C++ API, which can support this use case.
Thanks and sorry for the inconvenience.


It looks like NvUffParser does not have IPluginFactory implemented in the C++ API.

Does this mean only the Caffe parser support plugins currently?

All the plugins examples found are using Caffe models. If Uff can be used with plugins, could you please share an example with me?



Plugin API is only available for Caffe parser.
Thanks and sorry for any inconvenience.


Still we do not have Plugin API for UFF Parser?


I just bumped into the very same limitation. Hopefully it’ll get fixed soon.

Also, it would be very useful to be able to pass parameters from the UFF model definition on to the plugin constructor.


UFF parser doesn’t support plugin API currently.

We are collecting user experience of TensorRT.
It’s recommended to take 5-minutes to fill out this survey to help in making TensorRT an amazing product.


(TRT 3)Just to be sure, for a UFF based parser(Tensorflow model), I can remove unsupported layers(non weight-contained layer) and generate a .uff model from Python API. I can then use the C++ API to add the Custom Layer Plugin during inference time ?


Uff parser support custom layer from TensorRT 4.0.

TensorRT 4.0 is available for desktop GPU user.

For Jetson user, please wait for our next JetPack release.


Is there a samplePlugin equivalent for using custom layer plugins with UFF, I want to try it on desktop GPU with TRT 4RC


Similar to Caffe interface, you can set plugin factory to uff parser directly:



Yes, I used the above interface, but I am getting wrong predictions. Also, changing the name of layer has to a non-existent one in the model still produces output

bool isPlugin(const char* name) override
		return !strcmp(name, "fc2/add");

Here is the complete code I used.

#include <assert.h>
#include <chrono>
#include <fstream>
#include <sstream>
#include <iostream>
#include <cmath>
#include <sys/stat.h>
#include <cmath>
#include <time.h>
#include <cuda_runtime_api.h>
#include <cudnn.h>
#include <cublas_v2.h>
#include <memory>
#include <string.h>

#include "NvInfer.h"
#include "NvUffParser.h"
#include "common.h"

using namespace nvuffparser;
using namespace nvinfer1;
// stuff we know about the network and the caffe input/output blobs
static const int INPUT_H = 28;
static const int INPUT_W = 28;
static const int OUTPUT_SIZE = 10;
static Logger gLogger;

#define MAX_WORKSPACE (1 << 30)

const char* INPUT_BLOB_NAME = "Input/input";
const char* OUTPUT_BLOB_NAME = "fc2/Relu";

std::string locateFile(const std::string& input)
    std::vector<std::string> dirs{"data/samples/mnist/", "data/mnist/"};
    return locateFile(input, dirs);

// simple PGM (portable greyscale map) reader
void readPGMFile(const std::string& filename,  uint8_t buffer[INPUT_H*INPUT_W])
    readPGMFile(locateFile(filename), buffer, INPUT_H, INPUT_W);

void UFFtoGIEModel(const char* uffFile, int maxBatchSize, nvuffparser::IPluginFactory* pluginFactory, IHostMemory *&gieModelStream)
    IBuilder* builder = createInferBuilder(gLogger);
    INetworkDefinition* network = builder->createNetwork();
    IUffParser* parser = createUffParser();
    parser->registerInput(INPUT_BLOB_NAME, DimsCHW(1, 28, 28));
    parser->parse(uffFile, *network, nvinfer1::DataType::kFLOAT);

    // specify which tensors are outputs
	/* we create the engine */
    builder->setMaxWorkspaceSize(1 << 20);

    ICudaEngine* engine = builder->buildCudaEngine(*network);

    // serialize the engine, then close everything down
	// serialize the engine, then close everything down
	gieModelStream = engine->serialize();

void doInference(IExecutionContext& context, float* input, float* output, int batchSize)
	const ICudaEngine& engine = context.getEngine();
	// input and output buffer pointers that we pass to the engine - the engine requires exactly IEngine::getNbBindings(),
	// of these, but in this case we know that there is exactly one input and one output.
	assert(engine.getNbBindings() == 2);
	void* buffers[2];

	// In order to bind the buffers, we need to know the names of the input and output tensors.
	// note that indices are guaranteed to be less than IEngine::getNbBindings()
	int inputIndex = engine.getBindingIndex(INPUT_BLOB_NAME),
		outputIndex = engine.getBindingIndex(OUTPUT_BLOB_NAME);

	// create GPU buffers and a stream
	CHECK(cudaMalloc(&buffers[inputIndex], batchSize * INPUT_H * INPUT_W * sizeof(float)));
	CHECK(cudaMalloc(&buffers[outputIndex], batchSize * OUTPUT_SIZE * sizeof(float)));

	cudaStream_t stream;

	// DMA the input to the GPU,  execute the batch asynchronously, and DMA it back:
	CHECK(cudaMemcpyAsync(buffers[inputIndex], input, batchSize * INPUT_H * INPUT_W * sizeof(float), cudaMemcpyHostToDevice, stream));
	context.enqueue(batchSize, buffers, stream, nullptr);
	CHECK(cudaMemcpyAsync(output, buffers[outputIndex], batchSize * OUTPUT_SIZE*sizeof(float), cudaMemcpyDeviceToHost, stream));

	// release the stream and the buffers

class FCPlugin: public IPlugin
	FCPlugin(const Weights *weights, int nbWeights, int nbOutputChannels): mNbOutputChannels(nbOutputChannels)
		// since we want to deal with the case where there is no bias, we can't infer
		// the number of channels from the bias weights.

		assert(nbWeights == 2);
		mKernelWeights = copyToDevice(weights[0].values, weights[0].count);
		mBiasWeights = copyToDevice(weights[1].values, weights[1].count);
		assert(mBiasWeights.count == 0 || mBiasWeights.count == nbOutputChannels);

		mNbInputChannels = int(weights[0].count / nbOutputChannels);

	// create the plugin at runtime from a byte stream
	FCPlugin(const void* data, size_t length)
		const char* d = reinterpret_cast<const char*>(data), *a = d;
		mNbInputChannels = read<int>(d);
		mNbOutputChannels = read<int>(d);
		int biasCount = read<int>(d);

		mKernelWeights = deserializeToDevice(d, mNbInputChannels * mNbOutputChannels);
		mBiasWeights = deserializeToDevice(d, biasCount);
		assert(d == a + length);


	int getNbOutputs() const override
		return 1;

	Dims getOutputDimensions(int index, const Dims* inputs, int nbInputDims) override
		assert(index == 0 && nbInputDims == 1 && inputs[0].nbDims == 3);
		assert(mNbInputChannels == inputs[0].d[0] * inputs[0].d[1] * inputs[0].d[2]);
		return DimsCHW(mNbOutputChannels, 1, 1);

	void configure(const Dims* inputDims, int nbInputs, const Dims* outputDims, int nbOutputs, int maxBatchSize) override

	int initialize() override
		CHECK(cudnnCreate(&mCudnn));							// initialize cudnn and cublas
		CHECK(cudnnCreateTensorDescriptor(&mSrcDescriptor));	// create cudnn tensor descriptors we need for bias addition

		return 0;

	virtual void terminate() override

	virtual size_t getWorkspaceSize(int maxBatchSize) const override
		return 0;

	virtual int enqueue(int batchSize, const void*const * inputs, void** outputs, void* workspace, cudaStream_t stream) override
		float kONE = 1.0f, kZERO = 0.0f;
		cublasSetStream(mCublas, stream);
		cudnnSetStream(mCudnn, stream);
		CHECK(cublasSgemm(mCublas, CUBLAS_OP_T, CUBLAS_OP_N, mNbOutputChannels, batchSize, mNbInputChannels, &kONE,
				reinterpret_cast<const float*>(mKernelWeights.values), mNbInputChannels,
				reinterpret_cast<const float*>(inputs[0]), mNbInputChannels, &kZERO,
				reinterpret_cast<float*>(outputs[0]), mNbOutputChannels));
		if (mBiasWeights.count)
			CHECK(cudnnSetTensor4dDescriptor(mSrcDescriptor, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT, 1, mNbOutputChannels, 1, 1));
			CHECK(cudnnSetTensor4dDescriptor(mDstDescriptor, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT, batchSize, mNbOutputChannels, 1, 1));
			CHECK(cudnnAddTensor(mCudnn, &kONE, mSrcDescriptor, mBiasWeights.values, &kONE, mDstDescriptor, outputs[0]));
		return 0;

	virtual size_t getSerializationSize() override
		// 3 integers (number of input channels, number of output channels, bias size), and then the weights:
		return sizeof(int)*3 + mKernelWeights.count*sizeof(float) + mBiasWeights.count*sizeof(float);

	virtual void serialize(void* buffer) override
		char* d = reinterpret_cast<char*>(buffer), *a = d;

		write(d, mNbInputChannels);
		write(d, mNbOutputChannels);
		write(d, (int)mBiasWeights.count);
		serializeFromDevice(d, mKernelWeights);
		serializeFromDevice(d, mBiasWeights);

		assert(d == a + getSerializationSize());
	template<typename T> void write(char*& buffer, const T& val)
		*reinterpret_cast<T*>(buffer) = val;
		buffer += sizeof(T);

	template<typename T> T read(const char*& buffer)
		T val = *reinterpret_cast<const T*>(buffer);
		buffer += sizeof(T);
		return val;

	Weights copyToDevice(const void* hostData, size_t count)
		void* deviceData;
		CHECK(cudaMalloc(&deviceData, count * sizeof(float)));
		CHECK(cudaMemcpy(deviceData, hostData, count * sizeof(float), cudaMemcpyHostToDevice));
		return Weights{ DataType::kFLOAT, deviceData, int64_t(count) };

	void serializeFromDevice(char*& hostBuffer, Weights deviceWeights)
		cudaMemcpy(hostBuffer, deviceWeights.values, deviceWeights.count * sizeof(float), cudaMemcpyDeviceToHost);
		hostBuffer += deviceWeights.count * sizeof(float);

	Weights deserializeToDevice(const char*& hostBuffer, size_t count)
		Weights w = copyToDevice(hostBuffer, count);
		hostBuffer += count * sizeof(float);
		return w;

	int mNbOutputChannels, mNbInputChannels;
	cudnnHandle_t mCudnn;
	cublasHandle_t mCublas;
	Weights mKernelWeights, mBiasWeights;
	cudnnTensorDescriptor_t mSrcDescriptor, mDstDescriptor;

// integration for serialization
class PluginFactory : public nvinfer1::IPluginFactory, public nvuffparser::IPluginFactory
	// caffe parser plugin implementation
	bool isPlugin(const char* name) override
		return !strcmp(name, "fc2/add");
        std::cout << "Here";

	virtual nvinfer1::IPlugin* createPlugin(const char* layerName, const nvinfer1::Weights* weights, int nbWeights, const FieldCollection fc) override
		// there's no way to pass parameters through from the model definition, so we have to define it here explicitly
		static const int NB_OUTPUT_CHANNELS = 10;
		assert(isPlugin(layerName) && nbWeights == 2 && weights[0].type == DataType::kFLOAT && weights[1].type == DataType::kFLOAT);
		assert(mPlugin.get() == nullptr);
		mPlugin = std::unique_ptr<FCPlugin>(new FCPlugin(weights, nbWeights, NB_OUTPUT_CHANNELS));
		return mPlugin.get();

	// deserialization plugin implementation
	IPlugin* createPlugin(const char* layerName, const void* serialData, size_t serialLength) override
		assert(mPlugin.get() == nullptr);
		mPlugin = std::unique_ptr<FCPlugin>(new FCPlugin(serialData, serialLength));
		return mPlugin.get();

	// the application has to destroy the plugin when it knows it's safe to do so
	void destroyPlugin()

	std::unique_ptr<FCPlugin> mPlugin{ nullptr };

int main(int argc, char** argv)
  auto fileName = locateFile("model.uff");
  std::cout << fileName << std::endl;
  // int maxBatchSize = 1;

	// create a GIE model from the caffe model and serialize it to a stream
	PluginFactory pluginFactory;
	IHostMemory *gieModelStream{ nullptr };
	// caffeToGIEModel("mnist.prototxt", "mnist.caffemodel", std::vector < std::string > { OUTPUT_BLOB_NAME }, 1, &pluginFactory, gieModelStream);

    UFFtoGIEModel(fileName.c_str(),  1,  &pluginFactory,  gieModelStream);

	// read a random digit file
	uint8_t fileData[INPUT_H*INPUT_W];
    int num{rand()%10};
	readPGMFile(std::to_string(num) + ".pgm", fileData);
    std::cout << "Input file num: " << num << std::endl;

	IRuntime* runtime = createInferRuntime(gLogger);
	ICudaEngine* engine = runtime->deserializeCudaEngine(gieModelStream->data(), gieModelStream->size(), &pluginFactory);
	IExecutionContext *context = engine->createExecutionContext();

	// run inference
	float prob[OUTPUT_SIZE];
	doInference(*context, (float*)fileData, prob, 1);
  // destroy the engine

  int maxIdx = 0;
   for (int i = 0; i < 10; ++i)
       if (prob[i] > prob[maxIdx])
           maxIdx = i;
std::cout << std::endl << maxIdx <<std::endl;

  // print a histogram of the output distribution
	std::cout << "\n\n";
    bool pass{false};
	for (int i = 0; i < 10; i++)
        int res = std::floor(prob[i] * 10 + 0.5);
        std::cout << prob[i] <<std::endl;
        // std::cout << "Res: " << res << std::endl;
        std::cout << "i: " <<i << std::endl;
        if (res == 10 && i == num) {pass = true;
		std::cout << i << ": " << std::string(res, '*') << "\n";}
	std::cout << std::endl;



Usually, wrong prediction is caused by incorrect plugin implementation.
Could you verify your plugin layer first?