Faster R-CNN TensorRT result differences between TLT 3.0 and TensorRT C++


We trained a Faster R-CNN model with a ResNet-18 backbone in the TLT 3.0 container; training, evaluation, testing, and inference work perfectly with INT8 calibration. Here is the TLT config:

faster_rcnn_config.txt (4.3 KB)

We export the model to an .etlt file, naming our output tensor NMS with the -o option. After this we convert the model with the tlt-converter tool on a Jetson NX with the calibration options and sizes. That seems OK. We are using TensorRT in a C++ environment to run inference. The tensor input/output sizes are:

0 - input_image
     · 0 - 3
     · 1 - 1080
     · 2 - 1920
1 - NMS
     · 0 - 1
     · 1 - 100
     · 2 - 7
2 - NMS_1
     · 0 - 1
     · 1 - 1
     · 2 - 1
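As a quick sanity check (a standalone sketch, not tied to any particular API), the host/device buffers for those bindings should hold exactly these element counts; an undersized or oversized buffer can silently drop detections:

```cpp
#include <cassert>
#include <cstddef>

// Element counts implied by the binding shapes above (CHW layout, batch 1).
constexpr size_t volInput = 3u * 1080u * 1920u;  // input_image: 6,220,800 floats
constexpr size_t volNms   = 1u * 100u * 7u;      // NMS: 100 detections x 7 values
constexpr size_t volNms1  = 1u * 1u * 1u;        // NMS_1: single keep-count value
```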

At this point there seem to be fewer detections than with the TLT container inference; we think it's a problem with image preprocessing before passing data to the input tensor. We have an OpenCV Mat as the RGB source image (image).
As a result we get 20-30% of the detections in comparison with the TLT tests. As you can see, we reverse the RGB order to BGR, subtract the per-channel mean, and divide by 1.0, as the TLT documentation says in the input_image_config parameter specification.
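To sanity-check that transform in isolation, here is a minimal standalone sketch of the intended preprocessing (assuming 8-bit interleaved RGB input and batch 1; the function and variable names are illustrative, not from our code):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Channel means in BGR order, as in the TLT input_image_config.
static const float kPixelMean[3] = {103.939f, 116.779f, 123.68f};

// Interleaved 8-bit RGB (HWC) -> planar float BGR (CHW), mean-subtracted,
// scale 1.0.
std::vector<float> preprocess(const uint8_t* rgb, int H, int W) {
    const int C = 3;
    std::vector<float> chw(static_cast<size_t>(C) * H * W);
    const int volChl = H * W;
    for (int c = 0; c < C; ++c) {           // c: 0=B, 1=G, 2=R (network order)
        for (int j = 0; j < volChl; ++j) {  // j: pixel index
            // Source channel index is (2 - c) because the source is RGB.
            chw[c * volChl + j] =
                static_cast<float>(rgb[j * C + (2 - c)]) - kPixelMean[c];
        }
    }
    return chw;
}
```

If a single known pixel round-trips correctly through this, the remaining suspects are buffer sizing and the actual channel order of the source Mat (cv::imread returns BGR by default, so double-check whether a conversion to RGB really happened upstream).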

float* hostDataBuffer = static_cast<float*>(buffers.getHostBuffer("input_image"));
const float pixelMean[3]{103.939f, 116.779f, 123.68f};  // BGR channel means

// Convert the interleaved RGB Mat "image" to planar BGR, subtracting the mean.
for (int i = 0, volImg = C * H * W; i < 1; ++i) {            // batch of 1
    for (int c = 0; c < C; ++c) {                            // c: 0=B, 1=G, 2=R
        for (int j = 0, volChl = H * W; j < volChl; ++j) {   // pixel index
            // Source channel index is (2 - c) because the Mat is RGB.
            hostDataBuffer[i * volImg + c * volChl + j] =
                (float(image.data[j * C + (2 - c)]) - pixelMean[c]) / 1.0F;
        }
    }
}

bool status = context->execute(1, buffers.getDeviceBindings().data());

const float* nms = static_cast<const float*>(buffers.getHostBuffer("NMS"));
for (int det_id = 0; det_id < 100; det_id++) {
    float x1 = nms[det_id * 7 + 3];
    // ...
}
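If I remember the TensorRT NMS/DetectionOutput plugin layout correctly, each of the keepTopK rows holds 7 floats — [image_id, label, confidence, xmin, ymin, xmax, ymax], with normalized coordinates — and NMS_1 carries the number of valid detections. That would make x1 at offset 3 correct. Under that assumption (hedged — check against your plugin version), decoding would look like this self-contained sketch:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Detection {
    int imageId;
    int label;
    float confidence;
    float x1, y1, x2, y2;  // normalized [0,1]; multiply by W/H for pixels
};

// Assumed row layout: [image_id, label, conf, xmin, ymin, xmax, ymax].
std::vector<Detection> decodeNms(const float* nms, int keepCount,
                                 float confThresh) {
    std::vector<Detection> dets;
    for (int i = 0; i < keepCount; ++i) {
        const float* row = nms + i * 7;
        if (row[2] < confThresh) continue;  // drop low-confidence rows
        dets.push_back({static_cast<int>(row[0]), static_cast<int>(row[1]),
                        row[2], row[3], row[4], row[5], row[6]});
    }
    return dets;
}
```

Note that keepCount should come from NMS_1 rather than a hard-coded 100, since unused rows may contain stale or zeroed data.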

Can anyone help?

TensorRT Version: 7.2
JetPack Version: 4.6 (L4T 32.6)
CUDA: 10.2
cuDNN: 8.2.1
Operating System + Version: Ubuntu 18.04 + JetPack


It looks like you’re using a fairly old version of TensorRT. We recommend you please use the latest version, TensorRT 8.4.3.

Thank you.

OK, thanks for your recommendation! But we have some constraints around the version. Please, can you help me check whether the image preprocessing is correct? Why would an older TensorRT version impact detections if the TLT environment was OK? Why does Mask R-CNN work fine, but Faster R-CNN not?

Sorry, but we are blocked by this; we only need to know whether the inputs are OK, and whether it's a known problem. We tried training a Faster R-CNN with a ResNet-50 backbone, but it seems the same. Mask R-CNN with ResNet-50 on TLT 3.0 works fine.


Can you help me? I'm a little desperate about this. If the network output uses the NMS plugin, there seem to be no results. I tried with YOLO too, with bad results.


Sorry for the delayed response. Could you please provide us with an issue repro script and the ONNX model so we can try from our end for better debugging?
Also, are you able to reproduce this issue on the latest TensorRT version?

Thank you.