Greetings to all,
Below is the link to my latest post on deploying LLMs with TensorRT-LLM on the NVIDIA Jetson AGX Orin Developer Kit.
Thanks!
In my tests, running the LLaMA 3.1 8B Instruct model quantized with Activation-aware Weight Quantization (AWQ) improved inference speed.
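For anyone who wants a starting point before reading the full post, here is a minimal sketch of AWQ quantization and inference using TensorRT-LLM's high-level Python LLM API. The model ID, sampling settings, and the exact import paths and enum names (QuantConfig, QuantAlgo.W4A16_AWQ) are assumptions based on the TensorRT-LLM example scripts and may differ across releases, so verify them against the version installed on your Jetson.

```python
# Sketch: run Llama 3.1 8B Instruct with AWQ (4-bit weights, FP16 activations)
# via TensorRT-LLM's LLM API. Import paths and enum names are assumptions
# drawn from the TensorRT-LLM examples; check your installed version.
from tensorrt_llm import LLM, SamplingParams
from tensorrt_llm.llmapi import QuantConfig, QuantAlgo


def main():
    # W4A16_AWQ = AWQ-quantized 4-bit weights with 16-bit activations.
    quant_config = QuantConfig(quant_algo=QuantAlgo.W4A16_AWQ)

    # Building the LLM quantizes the Hugging Face checkpoint and compiles
    # a TensorRT engine; expect this step to take a while on first run.
    llm = LLM(
        model="meta-llama/Llama-3.1-8B-Instruct",  # assumed HF model ID
        quant_config=quant_config,
    )

    sampling = SamplingParams(max_tokens=64, temperature=0.7)
    outputs = llm.generate(
        ["Explain AWQ quantization in one sentence."], sampling
    )
    for output in outputs:
        print(output.outputs[0].text)


if __name__ == "__main__":
    main()
```

The blog post linked above walks through the full setup on the Jetson AGX Orin; this snippet is only meant to show the overall shape of the workflow.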