TensorRT: FP32 engine doesn't produce the same output as the PyTorch model (MonoFlex model)

Hi Nvidia,

Version info: TensorRT 7.1.0, Xavier AGX.

I'm trying to convert an ONNX model into an FP32 engine using trtexec, but the inference result of the generated engine differs significantly from that of the original model. In addition, when we generate engines repeatedly from exactly the same code and model, we get a different result each time, and the differences look too large to be ordinary floating-point precision error.
Here is a partial result from the PyTorch model, with all inputs set to 0:


And here are the inference results of two different engines, generated with exactly the same code, plugin, and model:

So you can see the differences between the PyTorch result, the engine inference result, and the result from a second generated engine.
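To judge whether a mismatch like this goes beyond normal floating-point error, one option is to compare the two output buffers element by element. The following is only a minimal sketch: the function names `maxAbsDiff` and `closeEnough` and the tolerance are hypothetical, not part of TensorRT or of the attached plugin code.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Largest absolute element-wise difference between two buffers.
// Returns -1.0f when the sizes differ (a convention chosen for this sketch).
float maxAbsDiff(const std::vector<float>& a, const std::vector<float>& b)
{
    if (a.size() != b.size())
        return -1.0f;
    float worst = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i)
        worst = std::max(worst, std::fabs(a[i] - b[i]));
    return worst;
}

// True when both buffers have the same size and every element
// differs by at most tol (tol is an assumed, user-chosen tolerance).
bool closeEnough(const std::vector<float>& a, const std::vector<float>& b, float tol)
{
    const float d = maxAbsDiff(a, b);
    return d >= 0.0f && d <= tol;
}
```

Running this on the PyTorch output and on the outputs of two separately built engines would show whether the engines disagree with each other by more than a small tolerance, which rounding alone would not normally explain.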

How to reproduce the issue?
We use a custom plugin called DCNv2 (deformable convolution), so first build the plugin .so by running cmake and make on the code in the attachment. After compiling, you will get a libTest_Tensorrt.so.
Then you can run trtexec:

./trtexec --onnx=monoflex.onnx --plugins=libTest_TensorRT.so --workspace=3000 --saveEngine=trt_mono_fp32.engine

P.S. Some versions of TensorRT seem to always look up the plugin version as "1" instead of "001". In that case, you can change line 40 of DCNv2_nvi.cpp from:

	static const char* DCNV2_VERSION{"001"};

to:

	static const char* DCNV2_VERSION{"1"};

Could anyone please explain why this happens and how we can make sure our engine produces the correct result in this case?

monoflex onnx: monoflex.onnx - Google Drive
plugin_staff.zip (265.8 KB)


Could you first check whether you get similar results from the PyTorch model and the ONNX model?
Also, do you observe a similar issue with the model when the plugin layer is not used?


  1. Actually no, because ONNX Runtime doesn't support the DCNv2 layer, so we can't check it. But we compared the weights and other parameters of every layer, and they all match.

  2. I tried replacing the whole DCN block (including the split layer, concat layer, sigmoid layer, and DCNv2 layer) with an ordinary conv layer. With that change, the model always generates the same engine.

Additionally, we marked the output of the concat layer as a network output and printed it. It is always the same, and it is correct compared with the PyTorch result. But we also printed the input (i.e. the "offset" of the DCN layer) inside the enqueue function of our plugin, and sometimes it is not the same as the concat output but transposed. It behaves as if the layout switches between "channel first" and "channel last", and it happens randomly!

For example:
concat output:
offset: 0: -9.69022
offset: 1: -16.2194
offset: 2: -16.2194
offset: 3: -16.2194
offset: 4: -16.2194
offset: 5: -16.2194
offset: 6: -16.2194
offset: 7: -16.2194
offset: 8: -16.2194
offset: 9: -16.2194

plugin input:
offset 0 :-9.69022
offset 1 :-5.49843
offset 2 :-2.6923
offset 3 :1.37999
offset 4 :-2.26437
offset 5 :-5.54157
offset 6 :-2.84453
offset 7 :-5.65033
offset 8 :-0.180098
offset 9 :9.86934
offset 10 :-2.38724

Meanwhile, the PyTorch result:
first row:
[ -9.6902, -16.2194, -16.2194, -16.2194, -16.2194, -16.2194, -16.2194, -16.2194, -16.2194, -16.2194 …]

first col:

Could you please tell me why this happens, and how we could solve it?
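The "channel first" versus "channel last" behaviour described above can be illustrated with a small index calculation. This is only a sketch under assumptions: `Shape4`, `offsetNCHW`, `offsetNHWC`, and `nchwToNhwc` are hypothetical helpers, not TensorRT or plugin functions. It shows that the same logical element (n, c, h, w) sits at different linear positions in the two layouts, so printing the first few values of a buffer gives different numbers depending on the layout it was written in.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical 4-D shape helper used only in this sketch.
struct Shape4 { int n, c, h, w; };

// Linear offset of element (n, c, h, w) in a channel-first (NCHW) buffer.
std::size_t offsetNCHW(const Shape4& s, int n, int c, int h, int w)
{
    return ((static_cast<std::size_t>(n) * s.c + c) * s.h + h) * s.w + w;
}

// Linear offset of the same element in a channel-last (NHWC) buffer.
std::size_t offsetNHWC(const Shape4& s, int n, int c, int h, int w)
{
    return ((static_cast<std::size_t>(n) * s.h + h) * s.w + w) * s.c + c;
}

// Re-pack a channel-first buffer into channel-last order.
std::vector<float> nchwToNhwc(const std::vector<float>& src, const Shape4& s)
{
    std::vector<float> dst(src.size());
    for (int n = 0; n < s.n; ++n)
        for (int c = 0; c < s.c; ++c)
            for (int h = 0; h < s.h; ++h)
                for (int w = 0; w < s.w; ++w)
                    dst[offsetNHWC(s, n, c, h, w)] = src[offsetNCHW(s, n, c, h, w)];
    return dst;
}
```

In a channel-first buffer the first values read linearly are the first row of channel 0, while in a channel-last buffer they are the values of all channels at pixel (0, 0). Only element 0 is guaranteed to coincide, which would be consistent with the dump above, where offset 0 matches and the following values do not.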


Thanks for your information.

We are checking your plugin implementation internally.
Will get back to you later.



Since TensorRT v8.0 has been released, would you mind updating your plugin to be compatible with v8.0?
Then please check whether the same issue occurs in TensorRT v8.0.

Also, we cannot find the offset value you mentioned.
It seems to be an internal variable rather than the data passed in through enqueue(...).

int DCNv2PluginDynamic::enqueue(const PluginTensorDesc* inputDesc,
        const PluginTensorDesc* outputDesc, const void* const* inputs, void* const* outputs,
        void* workspace, cudaStream_t stream)
{ ... }

Have you checked the values in inputs first?
Or could you tell us how the offset value is obtained from inputs?


Hi, after trying many ways to debug this, we finally found the cause.
It was a format-setting issue. We modified the function supportsFormatCombination as follows:

bool DCNv2PluginDynamic::supportsFormatCombination(int pos, const PluginTensorDesc *inOut, int nbInputs, int nbOutputs)
{
	// Accept only the linear (row-major) layout for every input and output.
	if (inOut[pos].format != PluginFormat::kLINEAR)
		return false;

	assert(nbInputs == 5);
	assert(nbOutputs == 1);
	assert(0 <= pos && pos <= 5);
	const int posOut = pos - nbInputs;
	// Every tensor must be FP16 or FP32 and share the data type of input 0.
	return (inOut[pos].type == DataType::kHALF || inOut[pos].type == DataType::kFLOAT) && (inOut[0].type == inOut[pos].type);
}

It seems that TensorRT may store the plugin's data in either BCHW or BHWC order, apparently at random, when the plugin does not force the format to kLINEAR. I think this is worth mentioning in the official documentation.

We did not update to 8.0.
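The format check from the fix above can be tried outside TensorRT with stand-in types. This is a sketch only: `MockFormat`, `MockType`, `MockTensorDesc`, and `acceptsCombination` are hypothetical names that imitate the shape of the real plugin interface so the logic compiles without the TensorRT headers. They are not the TensorRT definitions.

```cpp
#include <cstddef>
#include <vector>

// Stand-in types for this sketch only (NOT the TensorRT definitions).
enum class MockFormat { kLINEAR, kHWC8, kCHW32 };
enum class MockType { kFLOAT, kHALF, kINT8 };

struct MockTensorDesc
{
    MockType type;
    MockFormat format;
};

// Mirrors the logic of the fixed supportsFormatCombination:
// reject any non-linear layout, then require FP32 or FP16 with
// the same data type as tensor 0.
bool acceptsCombination(int pos, const std::vector<MockTensorDesc>& inOut)
{
    const MockTensorDesc& d = inOut[static_cast<std::size_t>(pos)];
    if (d.format != MockFormat::kLINEAR)
        return false;
    const bool typeOk = d.type == MockType::kFLOAT || d.type == MockType::kHALF;
    return typeOk && d.type == inOut[0].type;
}
```

With this predicate, a candidate that offers a non-linear layout at any position is rejected, so only the linear layout remains available for the plugin's inputs and output.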