Description
I am currently working on optimizing LLaMA 3.1 using the TensorRT Model Optimizer (nvidia-modelopt) and TensorRT-LLM. I would like to know whether Model Optimizer is compatible with the aarch64 architecture, since the official documentation lists x86_64 as the system requirement.
Specifically, I am interested in:
- Official support for aarch64.
- Any potential workarounds or methods to enable its functionality on aarch64 systems if it is not supported.
- Any performance considerations or limitations I should be aware of when attempting to use it on this architecture.
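For reference, this is a minimal diagnostic I can run on the GH200 node to confirm the reported architecture and whether the nvidia-modelopt package imports at all (the `modelopt` module name is what the wheel currently exposes; if that is wrong for this version, treat it as an assumption):

```python
import platform

# Report the host CPU architecture; GH200 (Grace Hopper) reports 'aarch64'.
arch = platform.machine()
print(f"Architecture: {arch}")

# Check whether the nvidia-modelopt wheel imports on this platform.
# The top-level module name 'modelopt' is assumed here.
try:
    import modelopt
    print(f"nvidia-modelopt importable, version {modelopt.__version__}")
except ImportError as exc:
    print(f"nvidia-modelopt not importable on {arch}: {exc}")
```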
Thank you for your assistance!
Environment
TensorRT-LLM version: 0.14.0.dev2024091700
GPU Type: GH200
NVIDIA Driver Version: 550.90.12
CUDA Version: 12.5
Operating System + Version: Ubuntu 22.04.4 LTS