We trained a YOLOv4 MobileNetV2 model using the NVIDIA TLT 3.0 Docker image nvcr.io/nvidia/tlt-streamanalytics:v3.0-dp-py3. We selected a .tlt checkpoint as our inference model and then followed two different deployment paths:
Path 1 (Jetson Xavier), sketched below:
- Exported the .tlt to .etlt on the TLT 3.0 Docker.
- Generated a TensorRT engine using trtexec on a Jetson Xavier (edge device).
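Roughly, the export step of Path 1 looked like the sketch below. This is only a sketch: the mount path, file names, $KEY, and the exact yolo_v4 export flags are placeholders rather than our literal commands.

```bash
# Path 1 sketch (placeholders, not our literal commands); the export flags
# should be checked against `yolo_v4 export --help` inside the container.
docker run --gpus all -it --rm \
    -v /local/experiments:/workspace \
    nvcr.io/nvidia/tlt-streamanalytics:v3.0-dp-py3 /bin/bash

# Inside the container: export the selected .tlt checkpoint to .etlt.
yolo_v4 export \
    -m /workspace/yolov4_mobilenetv2.tlt \
    -k $KEY \
    -e /workspace/yolo_v4_train_spec.txt \
    -o /workspace/yolov4_mobilenetv2.etlt

# The .etlt is then copied to the Jetson Xavier, where the TensorRT engine
# (engine_path1) is built on-device.
```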
Path 2 (Jetson Orin), sketched below:
- Converted the .tlt to .hdf5 on a newer TAO Docker image.
- Exported the .hdf5 to .onnx using the same TAO Docker.
- Generated a TensorRT engine using trtexec on a Jetson Orin (new edge device).
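On the Orin, the engine build at the end of Path 2 was roughly the following. Again a sketch: the file names and the precision flag are placeholders, and the .onnx is the one produced by the TAO export step above.

```bash
# Path 2 sketch, run on the Jetson Orin (trtexec ships with JetPack under
# /usr/src/tensorrt/bin). File names and the precision flag are placeholders.
/usr/src/tensorrt/bin/trtexec \
    --onnx=yolov4_mobilenetv2.onnx \
    --saveEngine=yolov4_mobilenetv2_path2.engine \
    --fp16
```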
For evaluation, we run inference on the edge device using deepstream-app (invocation sketched below) over a deterministic and fixed batch of test images (with annotations). From this we obtain:
- MP4 videos of the inference results, and
- a directory of KITTI files containing the bounding boxes for each frame.
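The run itself is roughly the following; the config file name and its contents are assumptions on our side, spelled out in the comments.

```bash
# Evaluation sketch on the edge device; the config name is a placeholder.
# The config is assumed to select the engine under test in [primary-gie],
# encode the rendered output to MP4 via a file sink, and set
# gie-kitti-output-dir in the [application] group so per-frame KITTI
# bbox files are written.
deepstream-app -c deepstream_app_eval.txt
```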
On a development machine, we then compute average precision (AP), F1 score, and additional metrics to evaluate the model's performance for each engine. Let's call the engines engine_path1 and engine_path2.
Both engines are evaluated using the same process; the only difference is the edge device on which inference is performed.
- Jetson Xavier with engine_path1: AP = 0.696960937158613
- Jetson Orin with engine_path2: AP = 0.455345910299531
Both engines were built from the same .tlt. We repeated this process for two more .tlt models from different training sessions, and the behavior was similar: Path 2 consistently yields significantly lower AP than Path 1.
Please advise,
Ziv