Running DeepStream with a RetinaFace model gives wrong output for non-square input image shapes

Hi support team,

I’m using the deepstream-test3 Python example to run the RetinaFace ResNet50 model.
I got the model from this repo: GitHub - biubug6/Pytorch_Retinaface: Retinaface get 80.99% in widerface hard val using mobilenet0.25.
Download link: Retinaface_model_v2 - Google Drive (file Resnet50_Final.pth)

Based on the convert_to_onnx.py file in the GitHub repo, I converted the .pth file to an ONNX file without error.

To run DeepStream, I converted the ONNX file to a .engine file with the --explicitBatch param:

/usr/src/tensorrt/bin/trtexec --explicitBatch --onnx=FaceDetector_3x720x1280.onnx --minShapes=input:1x3x720x1280 --optShapes=input:32x3x720x1280 --maxShapes=input:32x3x720x1280 --saveEngine=model.batch1-32-720x1280.engine
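
To double-check the engine, I can deserialize it and print the binding shapes and optimization profile (a quick sanity-check sketch of my own with the TensorRT Python API, not part of the sample; the engine file name is the one from the command above):

# My own quick sanity check: deserialize the engine and print binding shapes
# and the optimization profile, to confirm the dynamic input profile was
# actually baked in.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("model.batch1-32-720x1280.engine", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
    for i in range(engine.num_bindings):
        # -1 in a dimension means that axis is dynamic (the batch axis here)
        print(engine.get_binding_name(i), tuple(engine.get_binding_shape(i)))
        if engine.binding_is_input(i):
            print("  profile (min/opt/max):", engine.get_profile_shape(0, i))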

I wrote custom code in a pad probe function after the PGIE to do some post-processing, converting the TensorRT model outputs into bounding-box coordinates. But the final TensorRT bbox output does not seem to match the ONNX model output. Below are the first few rows of the output coordinates, in the format [4 bbox coordinates] [1 confidence]:


ONNX input image: 1280x720

-> bboxes, conf: 
(180.20432, 376.52902, 248.42604, 457.669, 0.99970335)
(652.8903, 399.71368, 718.46405, 473.9985, 0.99964297)
(814.8349, 117.745026, 880.6983, 196.94049, 0.99956053)
(124.28804, 516.43823, 187.45502, 590.3951, 0.99950266)
(872.94556, 454.53992, 932.13556, 534.32825, 0.99934095)
(953.01324, 281.43137, 1015.52014, 350.7596, 0.9993309)

TensorRT input image: 1280x720
-> bboxes, conf:
(888.7898, 382.60144, 1010.4221, 428.3359, 0.999685)
(1160.0034, 413.8329, 1277.151, 455.6716, 0.99967617)
(604.57526, 391.67694, 724.69055, 436.8149, 0.99958414)
(168.39978, 138.1584, 285.6008, 182.83948, 0.9995832)
(789.9761, 524.51733, 902.11475, 565.9892, 0.9995227)
(129.55733, 302.13498, 241.45831, 341.36035, 0.99936026)
(414.65784, 480.48514, 519.47003, 525.595, 0.9993399)
...

But when we use an input image whose width and height are equal (e.g. 800x800), the results of the ONNX and DeepStream TensorRT models are quite similar.

ONNX input image: 800x800
-> bboxes, conf: 
(175.73035, 15.69736, 218.86465, 83.27333, 0.99593616)
(406.31818, 446.80933, 450.3378, 519.4465, 0.99531084)
(591.0689, 318.55777, 636.0353, 385.14783, 0.9951623)
(85.07462, 326.9187, 130.29005, 398.35226, 0.9948461)
(573.6182, 3.2033205, 617.547, 66.30011, 0.9943182)
(507.7378, 134.5662, 553.2838, 216.61005, 0.9939587)
...

TensorRT input image: 800x800
-> bboxes, conf:
(176.44965, 15.834936, 219.82768, 83.319435, 0.99557555)
(406.9334, 448.04874, 450.62445, 519.9374, 0.994955)
(85.84751, 327.6398, 131.38907, 398.96964, 0.9947095)
(591.5082, 319.41702, 636.5945, 386.33548, 0.9946694)
(574.3133, 3.5823674, 618.08594, 66.684364, 0.99455625)
(508.14825, 134.8997, 553.753, 216.69969, 0.99305314)
...

So what is the problem here, and how can I fix it?
Thank you so much!

Link to source code: File on MEGA

Environment

• Hardware Platform: Tesla T4
• DeepStream Version: 5.0
• TensorRT Version: 7.2.1
• PyTorch 1.6
• ONNX v6
• NVIDIA GPU: Driver Version 455.32, CUDA Version 11.1
• OS: Ubuntu 18.04

Hi @SonTV ,
could you clarify what these are?
tensorrt bboxes output?
onnx model output?
ONNX input image?
TensorRT input image?

Hi @mchi, sorry for this late response.

Input image of tensorrt and onnx model is the same image: https://github.com/biubug6/Pytorch_Retinaface/blob/master/curve/test.jpg

For the ONNX model output, I use the detect.py file (https://github.com/biubug6/Pytorch_Retinaface/blob/master/detect.py) to load the ONNX model, draw the bounding boxes, and save the result as an image.
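
(As a side check, independent of detect.py, a minimal onnxruntime sketch like the one below can confirm the raw ONNX outputs; the input name is read from the model rather than assumed, and the input is a random placeholder, not a real frame:)

# Quick raw-output check of the exported ONNX model (my sketch; detect.py
# does the real preprocessing and post-processing).
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("FaceDetector_3x720x1280.onnx")
inp = sess.get_inputs()[0]
dummy = np.random.rand(1, 3, 720, 1280).astype(np.float32)  # placeholder, not a real image
for out in sess.run(None, {inp.name: dummy}):
    print(out.shape)  # loc / conf / landms heads of RetinaFace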

For the TensorRT model, I do a few steps. I put the model into the DeepStream pipeline, change the width/height config (1280x720 or 800x800), add output-tensor-meta=1 to the config file, and get the raw tensor data from the output layer in a probe function (after the PGIE). In the probe function, I run the post-processing (lines 104-143 in https://github.com/biubug6/Pytorch_Retinaface/blob/master/detect.py) to keep the final bboxes. When the input image is rectangular (e.g. 1280x720), there are many final bboxes in the output and their coordinates are incorrect. Below is an example output of the DeepStream pipeline:
https://drive.google.com/file/d/1_WUDfQjjkx-Nm6rTz8YRHz5aDZr_LCxC/view?usp=drivesdk
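
In outline, the probe logic is roughly the sketch below (simplified, not my exact source: the tensor-meta access follows the deepstream-ssd-parser Python sample, decode() is adapted from utils/box_utils.py in the RetinaFace repo, and priors is the repo's PriorBox output, which has to be built with image_size=(im_height, im_width) — swapping those two is one way to get exactly this kind of shifted-box symptom on non-square inputs):

import ctypes
import numpy as np
import pyds
import torch

def decode(loc, priors, variances):
    # Undo the RetinaFace box encoding; priors are (cx, cy, w, h) in [0, 1].
    boxes = torch.cat((
        priors[:, :2] + loc[:, :2] * variances[0] * priors[:, 2:],
        priors[:, 2:] * torch.exp(loc[:, 2:] * variances[1])), dim=1)
    boxes[:, :2] -= boxes[:, 2:] / 2   # (cx, cy) -> (x1, y1)
    boxes[:, 2:] += boxes[:, :2]       # (w, h)  -> (x2, y2)
    return boxes

def layer_to_tensor(layer, shape):
    # Wrap a raw NvDsInferLayerInfo float buffer as a torch tensor.
    ptr = ctypes.cast(pyds.get_ptr(layer.buffer), ctypes.POINTER(ctypes.c_float))
    return torch.from_numpy(np.ctypeslib.as_array(ptr, shape=shape).copy())

def frame_boxes(frame_meta, priors, im_width, im_height, variances=(0.1, 0.2)):
    # Walk the frame user meta for the raw tensors (needs output-tensor-meta=1).
    l_user = frame_meta.frame_user_meta_list
    while l_user is not None:
        user_meta = pyds.NvDsUserMeta.cast(l_user.data)
        if user_meta.base_meta.meta_type == pyds.NvDsMetaType.NVDSINFER_TENSOR_OUTPUT_META:
            tensor_meta = pyds.NvDsInferTensorMeta.cast(user_meta.user_meta_data)
            # Layer index 0 is assumed to be the loc head here; the real
            # order/names depend on how the ONNX was exported.
            loc = layer_to_tensor(pyds.get_nvds_LayerInfo(tensor_meta, 0),
                                  (priors.shape[0], 4))
            boxes = decode(loc, priors, variances)
            # Scale normalized boxes back to pixels: x by width, y by height.
            scale = torch.tensor([im_width, im_height, im_width, im_height],
                                 dtype=torch.float32)
            return boxes * scale
        l_user = l_user.next
    return None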

Hi @mchi,

I also followed the instructions in this repo: tensorrtx/retinaface at master · wang-xinyu/tensorrtx · GitHub and successfully built a retina_r50.engine file from the C++ code.

Everything is OK when I run the test image with the command ./retina_r50 -d (as mentioned in the README file of the repo).

But when I use this engine in the deepstream-test3 Python example, I run into trouble. The detailed error is described here: https://drive.google.com/file/d/1_XwK7PkyXSH4xJyK_p4vmx0gEHs03JUp/view?usp=sharing

I found the same error in the issues section (INVALID_ARGUMENT: getPluginCreator could not find plugin Decode_TRT version 1 · Issue #37 · wang-xinyu/tensorrtx · GitHub), and the author of the repo says: “this repo is not integrated into deepstream, only calling tensorrt api”. Is that true? And if so, how do I fix it?

Thanks.

Hi @SonTV ,
Could you try setting “maintain-aspect-ratio=1” in nvinfer?
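
That would go in the [property] group of your PGIE config, e.g. (the other keys are whatever you already have):

[property]
# ... your existing model/engine keys ...
output-tensor-meta=1
maintain-aspect-ratio=1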

I downloaded the source code (the 263.59 MB file on MEGA); could you share detailed instructions on how to run the code in the DS docker and reproduce the issue?

Thanks!

Hi @mchi ,

I set the param “maintain-aspect-ratio=1” in my config file, but it doesn’t solve the problem.

I run the test outside the Docker environment with this command:

python3 deepstream_test_3.py file:///opt/nvidia/deepstream/deepstream-5.0/sources/deepstream_python_apps/apps/deepstream-test3/video_face_retina_torch.mp4

Link to video that I use to test: https://drive.google.com/file/d/1a0HMxGIEBuwNBFHMQ3z6Mtt27xpCua7V/view?usp=sharing

Is there anything else I need to install to run your sample?

# python3 deepstream_test_3.py file:////opt/nvidia/deepstream/deepstream-5.1/sources/deepstream_python_apps/apps/deepstream-cus/video_face_retina_torch.mp4
Traceback (most recent call last):
  File "deepstream_test_3.py", line 28, in <module>
    import gi
ModuleNotFoundError: No module named 'gi'

Hi @mchi ,

I followed the steps in the README file here to install the libraries: deepstream_python_apps/apps at master · NVIDIA-AI-IOT/deepstream_python_apps · GitHub
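
If I remember correctly, the step in that README that provides the missing gi module is the GStreamer Python bindings install, something like:

sudo apt-get install python3-gi python3-dev python3-gst-1.0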

Hi @SonTV ,
did you install ‘torch’?
are you running the sample in DS docker?

# python3 deepstream_test_3.py file:////opt/nvidia/deepstream/deepstream-5.1/sources/deepstream_python_apps/apps/deepstream-cus/video_face_retina_torch.mp4
Traceback (most recent call last):
  File "deepstream_test_3.py", line 47, in <module>
    from custom_parser import nvds_infer_parse_custom_code
  File "/opt/nvidia/deepstream/deepstream-5.1/sources/deepstream_python_apps/apps/deepstream-cus/custom_parser.py", line 3, in <module>
    import torch
ModuleNotFoundError: No module named 'torch'

Hi @mchi,

I use PyTorch version 1.6, and I don’t run the sample in DS docker.
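
If torch is missing in your docker environment, installing it with pip should match my setup, e.g.:

pip3 install torch==1.6.0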

Hi @SonTV ,
Here are my findings and suggestions:

  1. You are using TRT 7.2.1 with DS 5.0, but DS 5.0 is not compatible with TRT 7.2, since DS 5.0 was only developed and officially verified with TRT 7.0.
    Frankly, I’m not sure whether this issue is caused by the mismatch of DS and TRT, but users should not use this combination.

  2. I tried to convert your ONNX model to a TRT engine; it failed with TRT 7.0 but succeeded on TRT 7.2. To deploy your ONNX model, you need to use DS 5.1 + TRT 7.2.2 (https://developer.nvidia.com/deepstream-getting-started).

  3. I tried to run your code on the DS 5.1 docker (nvcr.io/nvidia/deepstream:5.0.1-20.09-triton), but ran into some failures related to torch.

So, I think the first step is to use the right combination of DS and TRT, and I would suggest using DS 5.1.

Hi @mchi,

Currently, I cannot update from DS 5.0 to the latest version DS 5.1, because some of our running apps require TRT 7.2.1. I will try it later.

How about my second question (the tensorrtx/retinaface project)? Do you have any clue?

Sorry! Could you recap how to run tensorrtx/retinaface?

Thanks!

Here you are: Running DeepStream with a RetinaFace model gives wrong output for non-square input image shapes - #4 by SonTV

Hello @SonTV, I installed all the requirements and ran your code, but it is stuck at “Starting pipeline”. How can I reproduce your error?

Hi @tomriddle, what error occurred when you ran it?

@SonTV I am not getting any error; it’s just stuck at “Starting pipeline”.

Hi @SonTV ,
I think you have got this issue solved, right?

Hi @mchi,

Yes, I used the RetinaFace model generated by following the instructions in this repo: tensorrtx/retinaface at master · wang-xinyu/tensorrtx · GitHub.
Now you can close this topic.
Thanks