How to export a TensorRT model with TAO Toolkit 4.0 and deploy it on Jetson NX

JetPack: 5.0.2
Docker image: nvcr.io/nvidia/l4t-ml:r35.1.0-py3
Triton version: tritonserver2.27.0-jetpack5.0.2.tgz
I want to deploy with Triton on the NX. Currently I convert the TensorRT model on the x86 platform and copy it to the NX to run Triton in docker, but Triton reports an error on startup indicating that the TensorRT version the model was built with does not match the version in use.
Am I using the wrong docker image? Version 4.0 does not provide a conversion tool for the Jetson platform, and the deploy images cannot be used there.

If you run in the docker nvcr.io/nvidia/l4t-ml:r35.1.0-py3, you can download tao-converter inside the docker, copy the .etlt model into the docker, and then generate the TensorRT engine there.
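For reference, a typical tao-converter invocation inside that container looks roughly like the sketch below; the encryption key, input dimensions, output node names, and paths are placeholders that depend on your specific network, so check the export section of your model's TAO documentation for the exact values.

# Sketch only -- key, dims, output node names, and paths are placeholders
chmod +x tao-converter
./tao-converter \
    -k <encryption_key> \
    -d <input_dims, e.g. 3,544,960> \
    -o <comma_separated_output_node_names> \
    -t fp16 \
    -e /models/<model_name>/1/model.plan \
    <model>.etlt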

tao-converter v3.22.05_trt8.4_aarch64
I followed the above method and got the following error:

E0214 03:11:38.987317 1 logging.cc:43] 1: [stdArchiveReader.cpp::StdArchiveReader::40] Error Code 1: Serialization (Serialization assertion stdVersionRead == serializationVersion failed.Version tag does not match. Note: Current Version: 213, Serialized Engine Version: 232)
E0214 03:11:39.002163 1 logging.cc:43] 4: [runtime.cpp::deserializeCudaEngine::49] Error Code 4: Internal Error (Engine deserialization failed.)

At the same time, I have some questions:
1. The version of TAO Toolkit I use is 4.0, and the documentation says that tao-converter is no longer supported after 4.0, but tao-deploy is not available on the Jetson platform. So if I want to use Triton to deploy a TAO 4.0 model on the Jetson platform, what is the correct way?
2. The JetPack version, the TensorRT version, the Triton version, and the conversion tool version make me very confused. Is there a relatively simple way to set this up?
3. If tao-converter still works with version 4.0, how should the above error be fixed?

Firstly, may I know where you ran the inference? In which docker?

Related Dockerfile:

FROM nvcr.io/nvidia/l4t-ml:r35.1.0-py3
# Copy the pre-built Jetson Triton release into the image and unpack it
COPY tritonserver2.27.0-jetpack5.0.2.tgz .
RUN tar -xzf tritonserver2.27.0-jetpack5.0.2.tgz
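Assuming this Dockerfile sits in the current directory next to the Triton tarball, building it is the usual docker build step (the image tag here is just an example):

docker build -t triton-tao-nx .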

I downloaded the Triton server from the GitHub release "Release 2.27.0 corresponding to NGC container 22.10" of triton-inference-server/server.

How did you run inference? You generated your own docker and ran inside it, right?

yes

CMD ["/tritonserver/bin/tritonserver", "--backend-directory=./backends/", "--model-repository=/models", "--strict-model-config=false", "--backend-config=python,grpc-timeout-milliseconds=90000000", "--grpc-port=9000", "--http-port=9001", "--metrics-port=9002"]

Please check which TensorRT version is in use when /tritonserver/bin/tritonserver runs, and make sure to run tao-converter under the same TensorRT version.
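For example, these are a few ways to check the TensorRT version inside the running Triton container; which one applies depends on how TensorRT was installed in the image:

# Debian packages installed by JetPack / the base image
dpkg -l | grep -i tensorrt
# Shared library actually visible to the dynamic linker
ldconfig -p | grep libnvinfer
# Python bindings, if present in the image
python3 -c "import tensorrt; print(tensorrt.__version__)"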

Officially for TAO, we already provide an inference method for use with Triton.
See

https://github.com/NVIDIA-AI-IOT/tao-toolkit-triton-apps/blob/main/docker/Dockerfile
