Wrong results when I convert Movenet models from ONNX to TensorRT

Description

Hello, I’m trying to run the MoveNet models (Thunder and Lightning) in TensorRT. I converted the model from a TensorFlow SavedModel to ONNX, and then from ONNX to TensorRT (using trtexec), on two platforms: a Jetson Nano (JetPack 4.6) and a Windows machine. The conversion succeeded on both platforms, and I then loaded the engine in a TensorRT (Python API) script.

The issue is: when I run inference with the TensorRT engine, the outputs are wrong. Output confidence values are below 0.007, and the detected keypoints land in wrong positions when I draw them on the image.

To check the entire process (preprocessing, model inference, and drawing the results on the image) I developed two scripts. The first runs inference with the original TensorFlow model (reading the .pb); it works well: scores are above 0.8 and the keypoints are in the correct positions in the image.
The second script runs the same pipeline but loads the converted ONNX model with the ONNX runtime library in Python; this one also works well (scores above 0.8 and correct keypoint positions). I don’t know why only the TensorRT engine misbehaves, on both machines and with both APIs (Python and C++).

To check whether it was a data-type problem, I implemented the same pipeline with the TensorRT C++ API. I can load the engine and print some properties (binding sizes, number of bindings) and everything looks fine, but again, on both machines the C++ inference results are wrong (very low scores and misplaced points).
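For reference, the preprocessing that all of these pipelines share can be sketched in plain NumPy. This is only a sketch under two assumptions: MoveNet Thunder's published SavedModel takes a 1×256×256×3 int32 tensor (Lightning takes 192×192), and a simple nearest-neighbour resize stands in for whatever resize the actual scripts use:

```python
import numpy as np

def preprocess(frame: np.ndarray, size: int = 256) -> np.ndarray:
    """Nearest-neighbour resize + batch dim + int32 cast.

    The 256x256x3 int32 input shape is MoveNet Thunder's documented
    signature (use 192 for Lightning); the resize here is illustrative.
    """
    h, w, _ = frame.shape
    ys = (np.arange(size) * h // size).clip(0, h - 1)   # source rows
    xs = (np.arange(size) * w // size).clip(0, w - 1)   # source cols
    resized = frame[ys[:, None], xs]                    # (size, size, 3)
    return resized[None].astype(np.int32)               # (1, size, size, 3)

# smoke test with a synthetic image
img = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
inp = preprocess(img)
print(inp.shape, inp.dtype)   # (1, 256, 256, 3) int32
```

Feeding the engine a float32 tensor where the graph expects int32 (or vice versa) is a common source of garbage scores, so it is worth confirming the dtype at every stage.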

Environment

TensorRT Version: 8.2.1.8
GPU Type: Jetson Nano (Tegra210) and RTX 3070 Ti
Nvidia Driver Version: 516.94 in windows
CUDA Version: 10.2.300 in jetson, 11.2 in windows
CUDNN Version: 8.2.1.8 in both machines
Operating System + Version: Windows 10, Ubuntu 18.04
Python Version (if applicable): Python 3.9 (Windows) and Python 3.6 (Jetson)

Relevant Files

Model (TF version, ONNX, and the TRT engine built on my Windows machine)

movenetThunder.onnx (24.0 MB)
movenetThunder.zip (23.2 MB)
movenetThunderX86_64Desktop.plan (13.6 MB)

Code to run inference in ONNX and TensorRT
onnx_thunder_inference.py (977 Bytes)
thunder_trt_script.py (1.7 KB)
Helper functions to handle the TRT model
trtClasses.py (2.6 KB)

Steps To Reproduce

converting the TensorFlow model with tf2onnx:
$ python -m tf2onnx.convert --opset 15 --saved-model (path) --output movenetThunder.onnx
and then converting the ONNX model with trtexec with:
$ trtexec.exe --onnx=model.onnx --saveEngine=output.plan
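Once both runtimes produce outputs, a plain-NumPy tolerance check (the same kind of elementwise comparison polygraphy reports) can tell whether two runs actually agree. The sample values below are illustrative, not real model outputs:

```python
import numpy as np

def outputs_match(ref, test, rtol=1e-5, atol=1e-5):
    """True when |test - ref| <= atol + rtol * |ref| elementwise,
    i.e. np.allclose with explicit tolerances."""
    return bool(np.allclose(test, ref, rtol=rtol, atol=atol))

# hypothetical (y, x, score) rows for one keypoint
onnx_out = np.array([[0.2121701, 0.5908465, 0.66256255]])
trt_ok   = onnx_out + 1e-7                           # tiny deviation
trt_bad  = np.array([[0.1258, 0.6886, 0.0289]])      # collapsed score

print(outputs_match(onnx_out, trt_ok))    # True
print(outputs_match(onnx_out, trt_bad))   # False
```

A mismatch here on real outputs would point at the engine itself; a match would point at preprocessing or postprocessing in the TensorRT script instead.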

How can I solve this problem? Thanks in advance!

Hi,

We have not observed the accuracy difference on the latest TensorRT version, 8.6. We recommend using the latest TensorRT version.

[I]             Relative Difference | Stats: mean=4.7723e-06, std-dev=8.9374e-06, var=7.9876e-11, median=2.1638e-07, min=0 at (0, 0, 4, 0), max=3.5956e-05 at (0, 0, 2, 2), avg-magnitude=4.7723e-06
[V]                 ---- Values ----
                        [[[[2.08395463e-07 1.18936995e-07 1.61246335e-05]
                           [9.05597574e-07 5.52902634e-07 3.36858575e-05]
                           [1.58152272e-06 4.82829080e-07 3.59561673e-05]
                           [9.87531124e-08 2.13084476e-07 1.39729559e-06]
                           [0.00000000e+00 1.07284059e-07 9.63258026e-06]
                           [4.51258842e-07 2.10348830e-07 2.76756487e-06]
                           [2.16377558e-07 8.28881710e-08 2.39906367e-05]
                           [8.44680415e-08 1.02236548e-07 1.91727868e-05]
                           [3.38106105e-07 4.19999793e-07 2.86271788e-05]
                           [6.74231444e-08 0.00000000e+00 3.32935338e-06]
                           [7.45396349e-08 1.88486155e-07 8.64773483e-06]
                           [0.00000000e+00 1.15276315e-07 5.81911809e-06]
                           [1.03323472e-07 2.58665040e-07 9.27041583e-06]
                           [9.35252586e-08 9.81136310e-08 4.58219802e-06]
                           [0.00000000e+00 1.91411814e-07 1.79891667e-05]
                           [0.00000000e+00 1.06595465e-07 4.31459921e-06]
                           [6.87980801e-08 1.05403942e-07 1.04341289e-05]]]]
[V]                 ---- Histogram ----
                    Bin Range            |  Num Elems | Visualization
                    (0       , 3.6e-06 ) |         37 | ########################################
                    (3.6e-06 , 7.19e-06) |          3 | ###
                    (7.19e-06, 1.08e-05) |          4 | ####
                    (1.08e-05, 1.44e-05) |          0 |
                    (1.44e-05, 1.8e-05 ) |          1 | #
                    (1.8e-05 , 2.16e-05) |          2 | ##
                    (2.16e-05, 2.52e-05) |          1 | #
                    (2.52e-05, 2.88e-05) |          1 | #
                    (2.88e-05, 3.24e-05) |          0 |
                    (3.24e-05, 3.6e-05 ) |          2 | ##
[I]         PASSED | Output: 'output_0' | Difference is within tolerance (rel=1e-05, abs=1e-05)
[I]     PASSED | All outputs matched | Outputs: ['output_0']
[I] Accuracy Summary | trt-runner-N0-06/09/23-11:04:39 vs. onnxrt-runner-N0-06/09/23-11:04:39 | Passed: 1/1 iterations | Pass Rate: 100.0%
[I] PASSED | Runtime: 95.499s | Command: /usr/local/bin/polygraphy run movenetThunder.onnx --trt --onnxrt --workspace=20G --verbose

Thank you.

I’ve updated my Windows machine to TensorRT 8.6.1.6 and converted the model again with trtexec (8.6.1.6), but I’m still seeing the issue. Also, in your answer the confidence is very small (third column), so the points are going to be wrong. This is an inference over an image with TRT 8.6.1.6:
[0.125852,0.688647,0.0288986,]
[0.122909,0.728908,0.036514,]
[0.11892,0.694355,0.0257515,]
[0.115888,0.739533,0.038808,]
[0.125717,0.663324,0.0263936,]
[0.124831,0.696173,0.0333687,]
[0.143203,0.638394,0.0330626,]
[0.188329,0.672103,0.0321696,]
[0.230244,0.641809,0.0252251,]
[0.225168,0.694879,0.0366927,]
[0.226246,0.696824,0.0199303,]
[0.220067,0.650417,0.0466373,]
[0.236858,0.602306,0.0586232,]
[0.206621,0.608331,0.074691,]
[0.233311,0.616974,0.0466939,]
[0.229971,0.595934,0.0391641,]
[0.230496,0.60848,0.0411795,]

[image attachment: wrong1]

Also, is there a way to keep using TensorRT 8.2.1? I’m trying to deploy on Jetson devices with JetPack 4.6, and I don’t think I can update CUDA from 10.2.

This is the result for the same model in ONNX, filtering out detections with confidence lower than 0.01:
[0.2121701 0.5908465 0.66256255]
[0.1769617 0.6184405 0.8438019]
[0.1736151 0.5571672 0.64806515]
[0.17624334 0.6419107 0.44044065]
[0.1765048 0.49498025 0.8007731 ]
[0.35050184 0.7078541 0.89948887]
[0.33293843 0.40762717 0.8267948 ]
[0.57030946 0.822819 0.76486623]
[0.54338384 0.32539034 0.7200603 ]
[0.51787597 0.8769932 0.7091793 ]
[0.7186349 0.20520929 0.54700714]
[0.83332926 0.64797986 0.78879374]
[0.83464724 0.45133153 0.7731846 ]
[0.9976017 0.6606558 0.02581054]
[0.9836272 0.4453248 0.03359169]
[1.0159359 0.64814013 0.00775313]
[1.0501901 0.5519477 0.00633091]
[image attachment]
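The thresholding step described above can be sketched in NumPy. This assumes the output rows are (y, x, score) in normalized image coordinates, which matches the arrays printed here; the image size is a placeholder:

```python
import numpy as np

def filter_and_scale(keypoints, img_h, img_w, threshold=0.01):
    """Drop keypoints below the confidence threshold and map the
    remaining normalized (y, x) pairs to pixel coordinates."""
    kps = np.asarray(keypoints)            # shape (N, 3): y, x, score
    kept = kps[kps[:, 2] >= threshold]
    px = np.stack([kept[:, 0] * img_h,     # y -> pixel row
                   kept[:, 1] * img_w],    # x -> pixel column
                  axis=1)
    return px.astype(int), kept[:, 2]

pts, scores = filter_and_scale(
    [[0.2121701, 0.5908465, 0.66256255],
     [1.0501901, 0.5519477, 0.00633091]],  # below threshold, dropped
    img_h=480, img_w=640)
print(pts, scores)   # one keypoint kept: [[101 378]] [0.66256255]
```

If the same postprocessing draws correct points from the ONNX outputs but not from the TensorRT outputs, the divergence has to be upstream of this step.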

I would appreciate any help. Thanks!

Hi,

Apologies for the delayed response.
Could you please share the “foto.jpg” so we can try to reproduce this issue for better debugging.

Thank you.

Sure! This is the original image I used.