Discrepancy between results from tlt-infer and trt engine

I’m doing inference with a YOLOv3 TensorRT engine converted by tlt-converter; however, I found that the inference results from the TensorRT engine and those from tlt-infer are different. I think that might be due to differences in the pre-processing stage. Since I cannot access the pre-processing code of tlt-infer, I’ve attached below the pre-processing I use with my TensorRT engine:

import cv2
import numpy as np
import torch

frame = cv2.imread(img_path)  # cv2 loads images in BGR order
reso = (416, 416)
ratio_h0, ratio_w0 = 416 / frame.shape[0], 416 / frame.shape[1]
frame = cv2.resize(frame, (reso[0], reso[1]))  # resize to the model input size
frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # BGR -> RGB
mean = np.array([123.68, 116.779, 103.939], dtype=np.float32).reshape((3, 1, 1))
frame = frame.transpose(2, 0, 1).astype('float32') - mean  # HWC -> CHW, subtract RGB means
input0 = torch.as_tensor(frame).unsqueeze(0).to(device_m)  # add batch dimension

[Two detection screenshots: the trt engine result above, the tlt-infer result below]
For example, here are two results for an input image chosen at random from the VOC dataset. The upper one is the result from my trt engine, the lower one from tlt-infer.

For the preprocessing of an RGB image:

# preprocess_input comes from keras.applications.imagenet_utils (see below)
inf_img = np.array(inf_img).astype(np.float32)
inference_input = preprocess_input(inf_img.transpose(2, 0, 1))

OK, so to be clear: we only need those two transformations (without other processing like padding, mean subtraction, etc.) to run inference with the trt engine?

Padding is needed.
Mean subtraction is also needed; it is included in preprocess_input of keras.applications.imagenet_utils.
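
For reference, in its default 'caffe' mode, preprocess_input converts RGB to BGR and then zero-centers each channel with the ImageNet means; here is a minimal equivalent sketch for a channels-first float32 array:

import numpy as np

def caffe_style_preprocess(x):
    # equivalent of keras preprocess_input(mode='caffe') for a CHW RGB array
    x = x[::-1, ...].copy()  # RGB -> BGR
    # zero-center each channel with the ImageNet means, in BGR order
    x -= np.array([103.939, 116.779, 123.68], dtype=np.float32).reshape(3, 1, 1)
    return x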

All right, could you please describe the complete preprocessing stage for a YOLOv3 trt engine generated from TLT? That would be very helpful.

For the preprocessing of an RGB image (a combined sketch follows this list):

  1. resize (img.resize) the original image without changing its aspect ratio
  2. create (image.new) an image that matches the model input width/height
  3. paste (image.paste) the result of (1) onto (2)
  4. inf_img = np.array(inf_img).astype(np.float32)
    inference_input = preprocess_input(inf_img.transpose(2, 0, 1))
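
A minimal sketch putting these four steps together, assuming PIL and keras.applications.imagenet_utils; the function name, bilinear interpolation, top-left paste position, and the explicit channels-first data_format are assumptions, not confirmed details:

import numpy as np
from PIL import Image
from keras.applications.imagenet_utils import preprocess_input

def tlt_yolov3_preprocess(img_path, model_w=416, model_h=416):
    img = Image.open(img_path).convert("RGB")
    # 1. resize while keeping the aspect ratio (bilinear assumed)
    r = min(model_w / img.width, model_h / img.height)
    resized = img.resize((int(img.width * r), int(img.height * r)), Image.BILINEAR)
    # 2. create a blank canvas matching the model input width/height
    canvas = Image.new("RGB", (model_w, model_h))
    # 3. paste the resized image onto the canvas (top-left corner assumed)
    canvas.paste(resized, (0, 0))
    # 4. float32 CHW, then keras 'caffe'-style RGB->BGR plus mean subtraction;
    #    data_format is made explicit so the CHW transpose is honored
    #    regardless of the Keras backend default
    inf_img = np.array(canvas).astype(np.float32)
    return preprocess_input(inf_img.transpose(2, 0, 1), data_format="channels_first")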

Furthermore, I suggest you first run DeepStream inference to check the result.
Make sure DeepStream can run your trt engine correctly, producing results comparable to tlt-infer.

OK, understood, I’ll try those steps. Thanks a lot for the details and suggestions!

By the way, to be precise in the implementation: the “do not change aspect-ratio” you mention in step 1 implies a padding operation. Could you specify which function or padding pattern is needed, please?

In addition, I also tried the preprocessing configuration from deepstream_tlt_apps, which does not work well (https://github.com/NVIDIA-AI-IOT/deepstream_tlt_apps/blob/master/pgie_yolov3_tlt_config.txt).
In this file, I found only three preprocessing settings and no padding (lines 48-50):

# normalization scale is 1
net-scale-factor=1.0
# mean subtraction values, in BGR order
offsets=103.939;116.779;123.68
# 1 refers to BGR channel order
model-color-format=1
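
For reference, nvinfer combines these parameters as y = net-scale-factor * (x - offsets), applied per channel (per the DeepStream documentation); here is a sketch of the equivalent arithmetic, with an illustrative function name:

import numpy as np

def nvinfer_scale(x_bgr, net_scale_factor=1.0, offsets=(103.939, 116.779, 123.68)):
    # y = net-scale-factor * (x - offsets), per channel, on an HWC BGR image
    return net_scale_factor * (x_bgr.astype(np.float32) - np.array(offsets, dtype=np.float32))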

You can consider step (3) as padding.

Please make sure you can run DeepStream well.
You can try running the default YOLO models from NVIDIA-AI-IOT/deepstream_tao_apps (sample apps that demonstrate how to deploy models trained with TAO on DeepStream): https://github.com/NVIDIA-AI-IOT/deepstream_tao_apps

import cv2
import numpy as np

def letterbox(img, new_shape=(416, 416), color=(128, 128, 128)):
    # Resize image to a 32-pixel-multiple rectangle https://github.com/ultralytics/yolov3/issues/232
    shape = img.shape[:2]  # current shape [height, width]

    # Scale ratio (new / old)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])

    # Compute padding
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding
    # note: this pads only up to a multiple of 32, not to the full new_shape
    dw, dh = np.mod(dw, 32), np.mod(dh, 32)  # wh padding

    dw /= 2  # divide padding into 2 sides
    dh /= 2

    if shape[::-1] != new_unpad:  # resize
        img = cv2.resize(img, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border

    return img

Here is the code I used for padding; however, applying this padding makes the result even worse. Is that the right way to do it? Or, perhaps, could you provide an example of padding code, please?

Please refer to the steps I mentioned and commented on above. I think I have already described them clearly.
Actually, you can use DeepStream to run inference. It is the default example for deploying etlt models or trt engines.

Right. If you agree with my padding process, then I think I’ve already implemented all the steps you mentioned; still, I cannot get results consistent with tlt-infer.

As for the DeepStream validation, I’m afraid it’s not an option for me, as I’m working on image validation rather than video streams.

Deepstream can run inference against images.

./deepstream-custom -c pgie_config_file -i <H264 or JPEG filename> [-b BATCH] [-d]
    -h: print help info
    -c: pgie config file, e.g. pgie_frcnn_tlt_config.txt
    -i: H264 or JPEG input file
    -b: batch size, this will override the value of "batch-size" in the pgie config file
    -d: enable display, otherwise dump to output H264 or JPEG file

Hi Morganh,
After implementing the preprocessing steps you mentioned, the bounding boxes and scores from the trt engine are now much closer to the results of tlt-infer, with only minor differences.

However, I got a lot of wrong label assignments, and I observe that each wrong label is exactly the one that precedes the correct label in my class list. Here is my list of VOC categories, along with some visualizations:
classes_voc = ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]
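
To illustrate the shift, here is a hypothetical sketch of the kind of off-by-one class-id lookup that would produce exactly this behavior (the decode step is illustrative, not my actual code):

classes_voc = ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus",
               "car", "cat", "chair", "cow", "diningtable", "dog",
               "horse", "motorbike", "person", "pottedplant", "sheep",
               "sofa", "train", "tvmonitor"]

pred_class_id = 7                       # suppose the engine emits 7 for "cat"
print(classes_voc[pred_class_id])       # "cat" -- correct zero-based lookup
print(classes_voc[pred_class_id - 1])   # "car" -- off-by-one yields the previous label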


Do you have any ideas on what might cause this mislabeling?

Can you deploy the same trt engine with deepstream to check if it can be reproduced?

I also tried with DeepStream; however, I still have problems with the label assignment.

Can you narrow down your issue by:

  1. deploying the etlt model in DeepStream
  2. running the default YOLO Jupyter notebook, and then deploying its etlt model or trt engine in DeepStream?

Hi Morganh,
Now I’m trying a similar experiment with RetinaNet, to see if I can get consistent results there. Could you please describe the preprocessing steps for RetinaNet?

In TLT, the pre-processing for RetinaNet is as below (a sketch combining these steps follows the list):

  • assume RGB input values in the range from 0.0 to 255.0, as float

  • change from RGB to BGR

  • then subtract 103.939, 116.779 and 123.68 from the B, G and R channels respectively.
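
A minimal sketch of those steps, assuming a float32 HWC RGB array as input; the CHW transpose and batch dimension mirror the YOLOv3 flow earlier in this thread and are assumptions here:

import numpy as np

def tlt_retinanet_preprocess(img_rgb):
    # assume RGB input values in the range 0.0-255.0, as float
    img = img_rgb.astype(np.float32)
    # change from RGB to BGR
    img = img[..., ::-1]
    # subtract the per-channel means, in BGR order
    img -= np.array([103.939, 116.779, 123.68], dtype=np.float32)
    # HWC -> CHW plus a batch dimension (assumed, mirroring the YOLOv3 steps)
    return np.ascontiguousarray(img.transpose(2, 0, 1))[None]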