Running LLMs with TensorRT-LLM on the NVIDIA Jetson AGX Orin Dev Kit

Greetings to all,

Below is the link to my latest post on deploying LLMs using TensorRT-LLM on the NVIDIA Jetson AGX Orin Developer Kit.

Thanks!


In my testing, running the LLaMA 3.1 8B Instruct model quantized with Activation-aware Weight Quantization (AWQ) improved inference speed over the unquantized baseline.
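
For anyone who wants to try something similar, here is a minimal sketch of running inference against such an engine through TensorRT-LLM's high-level Python `LLM` API (available in recent releases). The engine directory path is a placeholder for an engine built beforehand from an INT4-AWQ-quantized checkpoint, and exact API details may vary with the TensorRT-LLM version shipped for JetPack.

```python
# Minimal sketch: text generation with a prebuilt TensorRT-LLM engine for
# LLaMA 3.1 8B Instruct quantized with INT4 AWQ. Assumes the high-level
# LLM API from recent TensorRT-LLM releases; the engine path below is a
# placeholder, not a path from the original post.
from tensorrt_llm import LLM, SamplingParams

# Load the prebuilt AWQ engine (hypothetical directory).
llm = LLM(model="./llama-3.1-8b-instruct-int4-awq-engine")

# Basic sampling settings; tune for your workload.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

prompts = ["Explain in one sentence why AWQ speeds up LLM inference."]
for output in llm.generate(prompts, sampling_params):
    print(output.outputs[0].text)
```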