TensorRT-LLM for Jetson

TensorRT-LLM is a high-performance LLM inference library with advanced quantization, attention kernels, and paged KV caching. Initial support for TensorRT-LLM in JetPack 6.1 has been included in the v0.12.0-jetson branch of the TensorRT-LLM repo for Jetson AGX Orin.

We’ve made pre-compiled TensorRT-LLM wheels and containers available, along with these guides and additional documentation:

> TensorRT-LLM Deployment on Jetson Orin
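As a quick sanity check after installing one of the pre-built wheels (the wheel filename below is a placeholder; the actual file and index URL come from the deployment guide above):

pip install tensorrt_llm-0.12.0*-cp310-*-linux_aarch64.whl   # pre-built Jetson wheel from the guide
python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"   # should report 0.12.0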


Hi @dusty_nv, in case anyone is interested, I’ve created a small demo video using Streamlit with your TensorRT-LLM implementation on the AGX Orin. Looks great.


Very cool!

If anyone else has a problem running the example TensorRT-LLM exercise, here’s what fixed it for me.

The MaziyarPanahi/Meta-Llama-3-8B-Instruct-GPTQ repo has a requirements.txt, and auto-gptq is the only package in it that is not already in the tensorrt-llm requirements, so build it from source:

git clone https://github.com/AutoGPTQ/AutoGPTQ.git
cd AutoGPTQ

If you aren’t using conda, edit setup.py and change this line to: conda_cuda_include_dir = "/usr/local/cuda/include"

Then:
export BUILD_CUDA_EXT=1
export TORCH_CUDA_ARCH_LIST="8.7"
export COMPILE_MARLIN=1
MAX_JOBS=10 python -m pip wheel . --no-build-isolation -w dist

pip install dist/auto_gptq-0.8.0.dev0+cu126-cp310-cp310-linux_aarch64.whl
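To confirm the wheel installed cleanly, a quick import check (assuming the standard auto_gptq package layout) is:

python3 -c "import auto_gptq; print(auto_gptq.__version__)"   # should print 0.8.0.dev0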


Running the same LLaMA 3.1 8B Instruct model with the Activation-aware Weight Quantization (AWQ) technique resulted in an improvement in inference speed.
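For reference, a rough sketch of the AWQ workflow, based on the upstream TensorRT-LLM quantization example (quantize.py lives under examples/quantization; exact flags may differ on the v0.12.0-jetson branch):

# Quantize the Hugging Face checkpoint to INT4 AWQ
python3 examples/quantization/quantize.py --model_dir ./Meta-Llama-3.1-8B-Instruct --dtype float16 --qformat int4_awq --awq_block_size 128 --output_dir ./llama-3.1-8b-awq-ckpt

# Build the TensorRT engine from the quantized checkpoint
trtllm-build --checkpoint_dir ./llama-3.1-8b-awq-ckpt --output_dir ./llama-3.1-8b-awq-engine --gemm_plugin float16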


What are the supported models for v0.12.0-jetson? Can you please point me to the full list?


I also think it would be best to list out the models that have already been tested.


Can anyone help me build a stand-alone API inference server that runs a large Whisper model for speech-to-text? It needs to run on a portable machine (such as an AGX Orin) without internet access, but with a local network connection for the API. It should be stand-alone, meaning that as soon as power is connected it boots and starts the API server without anyone needing to log in. No monitor or keyboard will be connected (except for initial setup and debugging). My email is y_ardavan@yahoo.com. Thank you.
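On the boot-without-login part of that question, a minimal sketch is a systemd unit that launches the server at boot; the whisper-api container image and command here are placeholders, not an existing project:

# Create a systemd unit that starts the API server container on boot
sudo tee /etc/systemd/system/whisper-api.service <<'EOF'
[Unit]
Description=Standalone Whisper speech-to-text API server
After=network-online.target

[Service]
# Placeholder image/command; substitute the actual Whisper server container
ExecStart=/usr/bin/docker run --rm --runtime nvidia --network host whisper-api:latest
Restart=always

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now whisper-api.service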

Has anyone tried using TensorRT-LLM on a Jetson Orin NX (16GB)? I keep getting a core dump when running trtllm-build, even with a small model (0.5B). The official tests were conducted on the AGX Orin.

Is multi-node LLM inference supported on the Jetson AGX Orin? I want to serve a bigger model (one that does not fit on a single Orin but would fit across two).

Is pipeline parallelism even supported? I am running into issues when I try to follow TensorRT-LLM/examples/llama at v0.12.0-jetson · NVIDIA/TensorRT-LLM · GitHub.
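For context, the upstream llama example builds a pipeline-parallel engine roughly like this (flags taken from the example README; whether a 2-rank engine can actually be served across two separate Orins over MPI is exactly the open question):

# Convert the HF checkpoint with 2-way pipeline parallelism (convert_checkpoint.py from TensorRT-LLM/examples/llama)
python3 convert_checkpoint.py --model_dir ./Meta-Llama-3-8B-Instruct --output_dir ./ckpt_pp2 --dtype float16 --pp_size 2

# Build the engine, then launch one MPI rank per pipeline stage
trtllm-build --checkpoint_dir ./ckpt_pp2 --output_dir ./engine_pp2 --gemm_plugin float16
mpirun -n 2 python3 ../run.py --engine_dir ./engine_pp2 --tokenizer_dir ./Meta-Llama-3-8B-Instruct --max_output_len 64 --input_text "Hello"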