Custom plugin gives random results

I'm implementing a custom layer with two inputs and one output, wrapped with pybind11.

When I build the engine and run inference, it does not always give the right output. Sometimes the output is wrong (about 1 failure every 7 to 10 builds).

I added some logging at the beginning of enqueue. vin (inputs[0]) always matches what I give the network, but when the output is wrong, voff (inputs[1]) differs from what I set.
PS: voff[0] is always correct, even when the output is wrong.

    // Copy the first few elements of each input back to the host for inspection.
    const int k_copy_size = 5;
    float vin[k_copy_size];
    cudaMemcpy(vin, inputs[0], sizeof(float) * k_copy_size, cudaMemcpyDeviceToHost);
    std::cout << "vin:\t";
    for (int i = 0; i < k_copy_size; ++i) {
        std::cout << vin[i] << "\t|";
    }
    std::cout << std::endl;

    float voff[k_copy_size];
    cudaMemcpy(voff, inputs[1], sizeof(float) * k_copy_size, cudaMemcpyDeviceToHost);
    std::cout << "voff:\t";
    for (int i = 0; i < k_copy_size; ++i) {
        std::cout << voff[i] << "\t|";
    }
    std::cout << std::endl;

Here is how I build the engine and run inference in Python:

with trt.Builder(TRT_LOGGER) as builder, builder.create_builder_config() as config, builder.create_network(EXPLICIT_BATCH) as network:
        input_layer = network.add_input(name="input", dtype=trt.float32, shape=(
            batch_size, in_channels, -1, -1))
        offset_layer = network.add_input(name="offset", dtype=trt.float32, shape=(
            batch_size, offset_channels, -1, -1))

        custom_layer = network.add_plugin_v2(
            inputs=[input_layer, offset_layer], plugin=custom_layer_plugin)

        custom_layer.get_output(0).name = "output"

        print("\n### build engine and inference")
        config.max_workspace_size = max_workspace_size
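        profile = builder.create_optimization_profile()
        # (min, opt, max) shapes for each dynamic input
        profile.set_shape("input",
                          (batch_size, in_channels, 8, 8),
                          (batch_size, in_channels, 16, 16),
                          (batch_size, in_channels, 32, 32))
        profile.set_shape("offset",
                          (batch_size, offset_channels, 4, 4),
                          (batch_size, offset_channels, 8, 8),
                          (batch_size, offset_channels, 16, 16))
        config.add_optimization_profile(profile)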
        with builder.build_engine(network, config) as engine, engine.create_execution_context() as context:
            stream = cuda.Stream()

            binding_input_index = engine.get_binding_index("input")
            binding_offset_index = engine.get_binding_index("offset")
            binding_output_index = engine.get_binding_index("output")
            network_input = [None] * 3

            context.set_binding_shape(binding_input_index, tensor_input.shape)
            context.set_binding_shape(
                binding_offset_index, tensor_offset.shape)

            tensor_output *= 0
            network_input[binding_input_index] = tensor_input.data_ptr()
            network_input[binding_offset_index] = tensor_offset.data_ptr()
            network_input[binding_output_index] = tensor_output.data_ptr()

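            start = time.time()
            context.execute_async_v2(
                network_input, stream_handle=stream.handle)
            # wait for the asynchronous launch to finish before reading the output
            stream.synchronize()
            end = time.time()
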
            predict_output = tensor_output.detach().cpu().numpy()

The input and offset tensors come from PyTorch. I also tried PyCUDA and got the same error.
Why does this happen, and how do I fix it?

1 Linux distro and version – Ubuntu 16.04
2 GPU type — 2080ti
3 Nvidia driver version - 418.56
4 CUDA version - 10.0
5 CUDNN version - 7.4.2
6 python version – 3.7
7 pytorch version – 1.2.0
8 TensorRT version –

Thank you

Update 20/1/15:
I set the log level to VERBOSE and found log output like this:

[TensorRT] VERBOSE: Adding reformat layer: (Unnamed Layer* 0) [PluginV2DynamicExt]
reformatted input 1 (offset)
from Float(1,(# 3 (SHAPE offset)),(* (# 2 (SHAPE offset)) (# 3 (SHAPE offset))),(* 18 (* (# 2 (SHAPE offset)) (# 3 (SHAPE offset)))))
to Float(1,(# 3 (SHAPE offset)),(* (# 2 (SHAPE offset)) (# 3 (SHAPE offset))):32,(* (# 2 (SHAPE offset)) (# 3 (SHAPE offset))))

This log only appears when I get the wrong output.
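
One way to read that log entry, offered as a sketch rather than a confirmed diagnosis: the `:32` in the target format suggests that input 1 is being repacked into a 32-wide channel-vectorized layout (something like TensorRT's `kCHW32`), while the plugin still indexes the buffer as linear NCHW. The helper names `linear_offset` and `packed_offset` below are made up for illustration, and the packed layout (N, ceil(C/32), H, W, 32) is an assumption about what the reformat layer produces.

```python
def linear_offset(n, c, h, w, C, H, W):
    """Flat element offset in a dense row-major NCHW buffer."""
    return ((n * C + c) * H + h) * W + w


def packed_offset(n, c, h, w, C, H, W, vec=32):
    """Flat element offset when channels are packed in groups of `vec`,
    i.e. a buffer shaped (N, ceil(C / vec), H, W, vec). Assumed layout."""
    groups = (C + vec - 1) // vec
    return (((n * groups + c // vec) * H + h) * W + w) * vec + c % vec


# 18 channels as in the log; 8x8 is the opt shape of "offset" in the profile.
C, H, W = 18, 8, 8

# Where the first five elements along W live in each layout.
linear = [linear_offset(0, 0, 0, w, C, H, W) for w in range(5)]
packed = [packed_offset(0, 0, 0, w, C, H, W) for w in range(5)]
print(linear)  # [0, 1, 2, 3, 4]
print(packed)  # [0, 32, 64, 96, 128]
```

Only offset 0 coincides in the two layouts. A kernel that treats the buffer as linear would therefore read voff[0] correctly and every later element from the wrong place, which is consistent with the PS above.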


Could you please share the script and model file so we can help better?


Thanks for the reply.

Here is the link:

Set TENSORRT_ROOT in CMakeLists.txt and build the project.

Run the script: python build/lib/


This is not the same code I described above, but it has the same error.

I removed the pybind11 code and used ctypes here, because I found that pybind11 is more likely to trigger the wrong output.

Hi @TJWindows,

The error comes from the plugin code: your implementation only supports FP32 with linear format, so it should reject every other format combination. The fixed code looks like this:

bool GroupNormPluginDynamic::supportsFormatCombination(int pos, const nvinfer1::PluginTensorDesc *inOut, int nbInputs, int nbOutputs)
{
    assert(0 <= pos && pos < 2);
    const auto *in = inOut;
    const auto *out = inOut + nbInputs;
    switch (pos)
    {
    case 0:
        // Input: FP32 with linear layout only.
        return in[0].type == nvinfer1::DataType::kFLOAT && in[0].format == nvinfer1::TensorFormat::kLINEAR;
    case 1:
        // Output: same type as the input, linear layout only.
        return out[0].type == in[0].type &&
               out[0].format == nvinfer1::TensorFormat::kLINEAR;
    }
    return false;
}
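
The fix above covers a plugin with one input and one output. For the two-input, one-output layer from the original question, the same rule would presumably have to hold at positions 0, 1 and 2. The following is only a model of that decision logic in plain Python so it can be checked without TensorRT; `DataType`, `TensorFormat` and `TensorDesc` are stand-ins for the `nvinfer1` types with arbitrary enum values, and the real method would still be written in C++.

```python
from dataclasses import dataclass
from enum import Enum


class DataType(Enum):        # stand-in for nvinfer1::DataType, values arbitrary
    FLOAT = 0
    HALF = 1


class TensorFormat(Enum):    # stand-in for nvinfer1::TensorFormat, values arbitrary
    LINEAR = 0
    CHW32 = 1


@dataclass
class TensorDesc:            # stand-in for nvinfer1::PluginTensorDesc
    type: DataType
    format: TensorFormat


def supports_format_combination(pos, in_out, nb_inputs, nb_outputs):
    """Accept only FP32 + linear at every input and output position."""
    assert 0 <= pos < nb_inputs + nb_outputs
    desc = in_out[pos]
    return desc.type == DataType.FLOAT and desc.format == TensorFormat.LINEAR


linear = TensorDesc(DataType.FLOAT, TensorFormat.LINEAR)
packed = TensorDesc(DataType.FLOAT, TensorFormat.CHW32)

# Input 1 offered in a packed format: rejected, so no reformat would be chosen.
print(supports_format_combination(1, [linear, packed, linear], 2, 1))  # False
```

Rejecting the packed format at position 1 is what would stop the builder from inserting the reformat of input 1 seen in the verbose log.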

@SunilJB I see, thank you.