Description
I am currently working on optimizing LLaMA 3.1 using the TensorRT Model Optimizer (nvidia-modelopt) and TensorRT-LLM. I would like to know whether Model Optimizer is compatible with the aarch64 architecture, since the official documentation lists x86_64 as the system requirement.
Specifically, I am interested in:
- Official support for aarch64.
- Any potential workarounds or methods to enable its functionality on aarch64 systems if it is not supported.
- Any performance considerations or limitations I should be aware of when attempting to use it on this architecture.
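For reference, this is a minimal diagnostic I can run on the GH200 node to confirm the reported architecture and whether the nvidia-modelopt package imports at all (the `modelopt` module name is what the wheel currently exposes; if that is wrong for this version, treat it as an assumption):

```python
import platform

# Report the host CPU architecture; GH200 (Grace Hopper) reports 'aarch64'.
arch = platform.machine()
print(f"Architecture: {arch}")

# Check whether the nvidia-modelopt wheel imports on this platform.
# The top-level module name 'modelopt' is assumed here.
try:
    import modelopt
    print(f"nvidia-modelopt importable, version {modelopt.__version__}")
except ImportError as exc:
    print(f"nvidia-modelopt not importable on {arch}: {exc}")
```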
Thank you for your assistance!
Environment
TensorRT-LLM version: 0.14.0.dev2024091700
GPU Type: GH200
NVIDIA Driver Version: 550.90.12
CUDA Version: 12.5
Operating System + Version: Ubuntu 22.04.4 LTS