Poor detection results for an engine converted from NVIDIA TLT 3.0 to TAO Toolkit 5.5.0

We trained a YOLOv4 MobileNetV2 model using the NVIDIA TLT 3.0 Docker image nvcr.io/nvidia/tlt-streamanalytics:v3.0-dp-py3. We selected a .tlt checkpoint as our inference model. We then followed two different deployment paths:
Path 1 (Jetson Xavier):

  1. Exported the .tlt to .etlt on the TLT 3.0 Docker.

  2. Generated a TensorRT engine using trtexec on a Jetson Xavier (edge device); see the note below.
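For reference, a minimal sketch of how step 2 is typically done on a Jetson for TLT/TAO 3.x models. Note that trtexec cannot parse a .etlt directly, so this sketch assumes the tao-converter binary; the key, input dims, output node and file names below are placeholders, not values from the original post:

tao-converter -k $ENCODE_KEY \
    -d 3,384,1248 \
    -o BatchedNMS \
    -t fp16 \
    -e engine_path1.engine \
    yolov4_mobilenetv2.etlt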

Path 2 (Jetson Orin):

  1. Converted the .tlt to .hdf5 using a newer TAO Docker image: nvcr.io/nvidia/tao/tao-toolkit:5.5.0-deploy.

  2. Exported the .hdf5 to .onnx using the same TAO Docker image.

  3. Generated a TensorRT engine using trtexec on a Jetson Orin (new edge device); see the sketch below.
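For reference, a hedged sketch of steps 2-3 (the file names, key and spec file are placeholders, and exact entrypoints and flags vary across TAO versions, so treat this as illustrative rather than the exact commands used):

# Inside the TAO container: export the .hdf5 checkpoint to ONNX
yolo_v4 export -m yolov4_mobilenetv2.hdf5 \
    -k $ENCODE_KEY \
    -e spec.txt \
    -o yolov4_mobilenetv2.onnx

# On the Jetson Orin: build the TensorRT engine from the ONNX
trtexec --onnx=yolov4_mobilenetv2.onnx \
    --saveEngine=engine_path2.engine \
    --fp16

It is also worth confirming that both paths build their engines at the same precision (e.g. FP16 on both devices), since a precision mismatch alone can shift AP.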

For evaluation, we run inference on the edge device using deepstream-app over a fixed, deterministic batch of annotated test images. From this we obtain:

  • MP4 videos of the inference results, and

  • A directory of KITTI label files containing the bounding boxes for each frame (a sample line is shown below).
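For context, a minimal sketch of such an evaluation run, assuming a standard DeepStream setup (config file names are placeholders); the engine under test is selected via the nvinfer config:

deepstream-app -c deepstream_app_config.txt
# In the nvinfer config referenced by the file above:
#   [property]
#   model-engine-file=engine_path2.engine

Each line of the dumped KITTI files follows the usual detection layout, e.g. (illustrative values only):

car 0.00 0 0.00 634.00 242.00 674.00 274.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.87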

On a development machine, we then compute average precision (AP), F1 score, and additional metrics to evaluate each engine's performance. Let's call the engines engine_path1 and engine_path2.

Both engines are evaluated using the same process; the only difference is the edge device on which inference is performed.

  • Jetson Xavier with engine_path1: AP = 0.696960937158613

  • Jetson Orin with engine_path2: AP = 0.455345910299531

Both engines were built from the same .tlt. We repeated this process for two more .tlt models from different training sessions, and the behavior was similar: Path 2 consistently yields significantly lower AP than Path 1.

Please advise,

Ziv


Moving to TAO forum.

To narrow this down: if you have a .etlt file from TAO 3, you can first convert it to .onnx using the method described here: tao_toolkit_recipes/tao_forum_faq/FAQ.md at main · NVIDIA-AI-IOT/tao_toolkit_recipes · GitHub

Then compare the two .onnx files to check whether they are identical.

To check inputs, outputs, attributes, etc.:

polygraphy inspect model model_a.onnx --show layers attrs > a.txt
polygraphy inspect model model_b.onnx --show layers attrs > b.txt
diff a.txt b.txt
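If the two reports differ (for example in input dimensions, output node names, or layer counts), the discrepancy was introduced during export rather than by trtexec on the device.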

To check the per-layer differences:

# `polygraphy run` accepts a single model, so compare in two steps:
# run model_a saving its inputs/outputs, then replay them against model_b.
polygraphy run model_a.onnx --onnxrt \
    --onnx-outputs mark all \
    --val-range [0,1] \
    --save-inputs inputs.json --save-outputs a_outputs.json
polygraphy run model_b.onnx --onnxrt \
    --onnx-outputs mark all \
    --load-inputs inputs.json --load-outputs a_outputs.json \
    --atol 1e-3 --rtol 1e-3
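If all marked outputs match within these tolerances, the two ONNX files are numerically equivalent and the AP gap more likely comes from engine building or the DeepStream pipeline; per-layer mismatches instead point back at the export step.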

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.
