TensorRT-LLM for Jetson

TensorRT-LLM is a high-performance LLM inference library with advanced quantization, attention kernels, and paged KV caching. Initial support for Jetson AGX Orin on JetPack 6.1 is included in the v0.12.0-jetson branch of the TensorRT-LLM repo.

We’ve made pre-compiled TensorRT-LLM wheels and containers available, along with these guides and additional documentation:

> TensorRT-LLM Deployment on Jetson Orin
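
As a rough sketch of what installing one of the pre-compiled wheels might look like (the wheel filename below is a placeholder, not the actual artifact name; use the wheel published for JetPack 6.1 alongside the v0.12.0-jetson branch):

```shell
# Install a pre-compiled TensorRT-LLM wheel on Jetson AGX Orin (JetPack 6.1).
# NOTE: the filename is a placeholder; substitute the actual published wheel.
pip3 install tensorrt_llm-0.12.0-*.whl

# Sanity-check the install by importing the package and printing its version.
python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"
```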
