Running Llama 3.1 on JP5.1

Hi,

I’m trying to run Llama 3.1 on a Jetson AGX Orin running JP5.1. I’ve tried MLC, but it turns out the MLC image for JP5.1 does not support it. Are there any alternatives?

Hi,
Here are some suggestions for the common issues:

1. Performance

Please run the commands below before benchmarking deep learning use cases:

$ sudo nvpmodel -m 0
$ sudo jetson_clocks
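After setting the power mode, it can help to confirm that it actually took effect. A minimal check, using the query flags these tools provide:

```shell
# Verify the active power mode (mode 0 is MAXN on AGX Orin)
sudo nvpmodel -q

# Show the current clock frequencies to confirm they are maxed out
sudo jetson_clocks --show
```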

2. Installation

Installation guides for deep learning frameworks on Jetson:

3. Tutorial

Getting-started deep learning tutorial:

4. Report issue

If these suggestions don’t help and you want to report an issue to us, please share the model, the commands/steps, and any customized app so we can reproduce it locally.

Thanks!

Hi,

Do you run it with HF Transformers?
If so, could you try dustynv/llama-factory to see if it works?

The container already includes Transformers, FlashAttention, bitsandbytes, AutoGPTQ, and vLLM.
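If you haven’t used the jetson-containers tooling before, a typical way to launch one of these containers looks like the following sketch (assuming the `jetson-containers` helper scripts from dusty-nv’s repo; `autotag` picks an image compatible with your JetPack version):

```shell
# Clone the jetson-containers tooling and install its helper scripts
git clone https://github.com/dusty-nv/jetson-containers
bash jetson-containers/install.sh

# Launch the container; autotag resolves a tag matching your JetPack release
jetson-containers run $(autotag llama-factory)
```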

Thanks.

I was trying to use either MLC or Ollama.

Do you have an example to use dustynv/llama-factory?
BTW, it seems to be for fine-tuning; what I need is an inference engine.

Thanks,

Hi,

The dustynv/llama-factory container supports HF Transformers.
For example, you can run InternVL2 with the instructions in the model card:

We also have prebuilt MLC and Ollama containers; please find them below:

MLC: jetson-containers/packages/llm/mlc at master · dusty-nv/jetson-containers · GitHub
Ollama: jetson-containers/packages/llm/ollama at master · dusty-nv/jetson-containers · GitHub
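Since you mentioned Ollama, the prebuilt container can be launched in the same way as the others. A minimal sketch, assuming the jetson-containers helper scripts are installed and the `llama3.1` tag on the Ollama registry is what you want:

```shell
# Start the prebuilt Ollama container; autotag selects a compatible image
jetson-containers run $(autotag ollama)

# Inside the container: download a quantized Llama 3.1 model and chat with it
ollama run llama3.1
```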

Thanks.

Hi @pcha , I tried rebuilding MLC for JetPack 5.1, but was encountering compilation issues from the older CUDA version:

I would recommend trying previous MLC versions or trying to patch the errors (although that may be a losing battle)

If you are on AGX Orin, you can compile MLC through the jetson-containers builder, but I would recommend upgrading to JetPack 6 if possible, if you need to keep current with genAI libraries.

You can easily build/run llama.cpp or Ollama on JetPack 5 and run Llama 3.1 with quantization, but it will be roughly 65% of the performance you would have gotten from the likes of MLC or TRT-LLM.
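For reference, the llama.cpp route sketched above looks roughly like this. Note the caveats: the CUDA build flag is `GGML_CUDA` in current llama.cpp (older releases used `LLAMA_CUBLAS`), and the GGUF path/filename below is a placeholder for whatever quantized Llama 3.1 model you download:

```shell
# Build llama.cpp with CUDA support enabled
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j

# Run a 4-bit quantized Llama 3.1 GGUF, offloading all layers to the GPU
# (model path is a placeholder; -ngl 99 offloads every layer)
./build/bin/llama-cli \
    -m models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf \
    -ngl 99 \
    -p "Hello"
```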
