How to do inference with a TLT faster rcnn model?

Hello everyone,

I have trained a frcnn_resnet18 model with the Transfer Learning Toolkit (TLT), using the Docker image downloaded from NGC, on my host machine.

I can do inference with the deepstream custom app given in the IVA Getting Started Guide, and it seems to work well on the Nano.

My objective now is to run inference with TensorRT only. For this I use the TensorRT sample, which works well with Faster RCNN models trained with TensorFlow or Caffe (and optimized for TensorRT with a UFF parser).

Inference with an SSD model trained with TLT and converted to a TRT engine runs successfully, but it does not work for a Faster RCNN model: the inference runs, but the outputs of the network are weird, and the positions of the bounding boxes are always between 1 and 3.

Is the post-processing of a Faster RCNN model trained with TLT different?

Deepstream custom app: GitHub - NVIDIA-AI-IOT/deepstream_4.x_apps: deepstream 4.x samples to deploy TLT training models

Used Tensorrt sample:

Hi Steventel,
Is there any log for “The inference is running but the outputs of the network are weird”?
For “the position of the bounding boxes are always between 1 and 3”, what do you mean by “1” and “3”? Do you mean the class ID?

const std::string CLASSES[OUTPUT_CLS_SIZE]{"background", "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"};

Hi Morganh,

The problem is the position of the bounding boxes; the predicted class seems to be right, as does the confidence score. In the output image, I see small bounding boxes in the top-left corner. Even though the positions are wrong, the right edge of each bounding box is always greater than the left, and the bottom is always greater than the top. That is why I think the problem lies in the post-processing of the bounding boxes, but maybe I’m wrong.

I have printed the output of the network after the post-processing (line 349 of the TensorRT sample):

[confidence score] class: [class id]
 [top]  [bottom]
0.999971 classe: 1
0.321713     0     1.50604     1.10493
0.999692 classe: 1
0.807946     0.528997     1.99202     1.64138
0.999317 classe: 1
0.778197     0.560918     1.92445     1.67343
0.999923 classe: 1
0.120525     0.320375     1.2985     1.43041
0.999951 classe: 1
0.340466     0.760247     1.50538     1.87404
0.999848 classe: 1
0.318116     0.635189     1.48855     1.74495
0.999972 classe: 1
0.728438     0.607852     1.88925     1.7232
0.999976 classe: 1
0.624839     0.653931     1.80533     1.76596
0.999318 classe: 1
0.59753     0.0701577     1.78314     1.18366
0.999658 classe: 1
0.664063     0.294367     1.8377     1.406
0.99945 classe: 1
0.874567     0.214179     2     1.33724
0.998457 classe: 1
0.777496     0.441411     1.94824     1.55648


I forgot to mention that I’m not able to get the “im_info” binding from a Faster RCNN model trained with TLT.

Hi steventel,
In TLT the RoI coordinates are (y1, x1, y2, x2), while in Caffe, it is (x1, y1, x2, y2).

Thanks for your answer, but does it make any difference to post-processing?

The main difference between TLT Faster RCNN and Caffe Faster RCNN post-processing is the order of the RoI coordinates, as mentioned above.
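To make the coordinate-order point concrete, here is a minimal sketch of the reordering step, assuming the RoIs arrive as a flat float buffer with four values per box. The `Box` struct and the `tltRoisToXyxy` helper are hypothetical names for illustration; they are not part of the TensorRT sample or of TLT.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical box type for illustration: left, top, right, bottom.
struct Box {
    float x1, y1, x2, y2;
};

// Convert a flat RoI buffer in TLT order (y1, x1, y2, x2 per box)
// into boxes in Caffe order (x1, y1, x2, y2).
// Assumption: the buffer holds exactly 4 floats per RoI.
std::vector<Box> tltRoisToXyxy(const std::vector<float>& rois) {
    assert(rois.size() % 4 == 0 && "expected 4 values per RoI");
    std::vector<Box> boxes;
    boxes.reserve(rois.size() / 4);
    for (std::size_t i = 0; i < rois.size(); i += 4) {
        // Swap each (y, x) pair into (x, y).
        boxes.push_back(Box{rois[i + 1], rois[i], rois[i + 3], rois[i + 2]});
    }
    return boxes;
}
```

If the rest of the post-processing was written for Caffe-style RoIs, applying a swap like this before the box decoding would be one place to start; whether other steps also differ is not covered here.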

Hi Morganh,

Thanks for your answer. We now get the same results with our own C++ application and with the Deepstream sample.

However, we cannot get the same result as with tlt-infer (even with deepstream). We use the same network with the same image size.

Here are some examples of detected boxes (drawn on a black image, since I cannot share the real image):

The good results with tlt-infer:

The results with deepstream or our C++ application:

Is it normal to not get the same results even with the deepstream sample?

Hi steventel,
Sorry for the late reply. We’re investigating the difference.
For the result with deepstream or your C++ application, you were using trt engine to do inference, right?

I have summarized your results below. Please correct me if anything is wrong.

  1. tlt-infer + tlt model (good result)
  2. deepstream + trt fp16 engine (not good)
  3. your C++ application + trt fp16 engine (not good)
  4. deepstream + etlt model (unknown)

Could you check the result of item 4 above? Thanks.

Some more ideas about possible reasons for your results:

1. Make sure the visualization confidence threshold, the NMS parameters, etc. are the same between tlt-infer and trt inference.

2. tlt-infer uses the fp32 data type. What is the data type in your trt inference? Is it fp16? A lower-precision data type can give worse results.

3. What is the rate of detection mismatches? If the rate is high, I’m afraid there is something wrong in the inference code or the deepstream configuration.
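One way to put a number on the mismatch rate from point 3 is to match boxes from the two pipelines by IoU. The sketch below is only an illustration: the `Det` struct, the `iou` and `mismatchRate` helpers, and the 0.5 IoU threshold are all assumptions, not something tlt-infer or deepstream provides. It also ignores class labels and confidence scores.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical detection box for illustration: left, top, right, bottom.
struct Det {
    float x1, y1, x2, y2;
};

// Intersection over union of two axis-aligned boxes; 0 when they do not overlap.
float iou(const Det& a, const Det& b) {
    const float iw = std::min(a.x2, b.x2) - std::max(a.x1, b.x1);
    const float ih = std::min(a.y2, b.y2) - std::max(a.y1, b.y1);
    if (iw <= 0.0f || ih <= 0.0f) return 0.0f;
    const float inter = iw * ih;
    const float areaA = (a.x2 - a.x1) * (a.y2 - a.y1);
    const float areaB = (b.x2 - b.x1) * (b.y2 - b.y1);
    return inter / (areaA + areaB - inter);
}

// Fraction of reference boxes (e.g. from tlt-infer) that have no unused
// candidate box (e.g. from the trt engine) with IoU >= iouThresh.
// Greedy one-to-one matching; returns 0 when there are no reference boxes.
float mismatchRate(const std::vector<Det>& reference,
                   const std::vector<Det>& candidate,
                   float iouThresh = 0.5f) {  // 0.5 is an assumed threshold
    if (reference.empty()) return 0.0f;
    std::vector<bool> used(candidate.size(), false);
    std::size_t missed = 0;
    for (const Det& r : reference) {
        float best = 0.0f;
        std::size_t bestIdx = candidate.size();
        for (std::size_t j = 0; j < candidate.size(); ++j) {
            if (used[j]) continue;
            const float v = iou(r, candidate[j]);
            if (v > best) {
                best = v;
                bestIdx = j;
            }
        }
        if (bestIdx < candidate.size() && best >= iouThresh) {
            used[bestIdx] = true;  // each candidate box can match only once
        } else {
            ++missed;
        }
    }
    return static_cast<float>(missed) / static_cast<float>(reference.size());
}
```

Running this per image, with the tlt-infer boxes as the reference and the trt boxes as the candidates, gives a rough per-image rate that can be compared across fp16 and fp32 engines.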

Hi Morganh,

Here is a summary of my results:

  1. tlt-infer + tlt model → good result

  2. deepstream + trt fp16 or fp32 engine (generated automatically by deepstream from etlt) → not good

  3. my C++ application + trt fp16 or fp32 engine → not good

  4. deepstream + etlt model → not good

  5. I’ve checked that the visualization confidence thresholds are the same (0.6), and likewise the NMS parameters.

  6. I have tested with fp32 and fp16 models and got similarly bad results.

  7. Yes, the rate seems to be high. For this reason, I’ve just sent you my TLT training folder with some images by private message, along with the deepstream sample application used.

Thanks for your help

Thanks steventel for the information. It is helpful.
Our internal team is trying to find where the gap is.

Hi steventel,
Could you please update your latest result per our offline syncing? Thanks.

You mentioned issue is gone after you changed to use TLT 1.0.1 docker.

Could you please share your TLT 1.0 training spec and TLT 1.0.1 training spec?

We identified a bug in FRCNN in the current release: “pool_size_2x: True” is not handled properly.

We will fix this issue in next release.

Just a reminder: TLT 2.0_dp is released on May 1st.