Jetson AI Lab - ML DevOps, Containers, Core Inferencing

4/2/24 - TensorRT-LLM support on Jetson

  • As detailed in the posts above, now that we have access to the latest CUDA and the ability to rebuild all the other downstream packages we need, we may be able to build mainline TensorRT-LLM (hopefully without much patching required). This is an ongoing effort in coordination with the TensorRT team that we are excited about, as it would provide edge-to-cloud compatibility with other NVIDIA production workflows, NeMo Megatron models, and NIM microservices deployed to the edge.

  • TensorRT-LLM will be integrated into NanoLLM as another API backend, alongside MLC. MLC/TVM already achieves greater than 95% of peak Orin performance/efficiency on Llama (as shown in the Benchmarks on Jetson AI Lab), so performance-wise we're already in a great place; however, TensorRT-LLM will still be good to have for the aforementioned compatibility reasons and its production-grade support. For now, continue using the NanoLLM APIs (sketched below) to get a seamless transition to TensorRT-LLM once it's enabled, along with NanoLLM's support for multimodality and I/O streaming.
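
  For reference, here is a minimal sketch of that NanoLLM usage with the current MLC backend. The model name, quantization setting, and prompt are placeholders, and the idea that a future TensorRT-LLM backend would be selected through the same `api=` argument is an assumption about the planned integration, not something that exists yet:

```python
from nano_llm import NanoLLM

# Load the model through NanoLLM; api='mlc' selects the MLC/TVM backend today.
# A future 'tensorrt_llm' value here is assumed, pending the integration above.
model = NanoLLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",   # placeholder HuggingFace model
    api='mlc',
    quantization='q4f16_ft'            # MLC quantization method
)

# Generation is streamed token-by-token by default
for token in model.generate("Once upon a time,", max_new_tokens=128):
    print(token, end='', flush=True)
```

  The intent is that swapping backends would just be a matter of changing the `api=` string, with the rest of the application code left unchanged.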

  • All of this regarding TensorRT-LLM is subject to change depending on the outcomes of these ongoing engineering efforts. Once TensorRT 10 becomes available for Jetson (expected soon), I will begin work on compiling the latest TensorRT-LLM for Jetson against CUDA 12.4 and TensorRT 10. Assuming that succeeds, binaries can then be provided through jetson-containers and the pip server, and further integration work with NanoLLM and other projects can proceed.