Hi,
I’m trying to run Llama 3.1 on a Jetson AGX Orin running JP5.1. I’ve tried MLC, but found that the MLC image for JP5.1 does not support it. Are there any alternatives for doing this?
Hi,
Here are some suggestions for the common issues:
Please run the commands below before benchmarking a deep learning use case:
$ sudo nvpmodel -m 0
$ sudo jetson_clocks
Installation guides for deep learning frameworks on Jetson:
Getting-started deep learning tutorial:
If these suggestions don’t help and you want to report an issue to us, please share the model, the commands/steps, and any customized app with us so we can reproduce it locally.
Thanks!
Hi,
Do you run it with HF Transformers?
If so, could you try dustynv/llama-factory to see if it works?
The container already includes Transformers, FlashAttention, bitsandbytes, AutoGPTQ, and vLLM.
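As a minimal sketch (assuming you have the jetson-containers repo cloned and its tools installed; the exact image tag that autotag picks depends on your JetPack version), launching it looks like:
$ git clone https://github.com/dusty-nv/jetson-containers
$ bash jetson-containers/install.sh
$ jetson-containers run $(autotag llama-factory)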
Thanks.
I was trying to use either MLC or Ollama.
Do you have an example of how to use dustynv/llama-factory?
BTW, it seems it’s geared toward fine-tuning. What I need is an inference engine.
Thanks,
Hi,
The dustynv/llama-factory container supports HF Transformers.
For example, you can run InternVL2 with the instructions in the model card:
We also have prebuilt MLC and Ollama containers; please find them below:
MLC: jetson-containers/packages/llm/mlc at master · dusty-nv/jetson-containers · GitHub
Ollama: jetson-containers/packages/llm/ollama at master · dusty-nv/jetson-containers · GitHub
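For example, a minimal sketch of trying Llama 3.1 through the Ollama container (the llama3.1 model tag comes from the public Ollama library; adjust it to the variant you want):
$ jetson-containers run $(autotag ollama)
# then, inside the container, pull and chat with the model:
$ ollama run llama3.1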
Thanks.
Hi @pcha, I tried rebuilding MLC for JetPack 5.1, but encountered compilation issues from the older CUDA version:
I would recommend trying previous MLC versions or patching the errors (although that may be a losing battle).
If you are on AGX Orin, you can compile MLC through the jetson-containers builder, but if you need to stay current with genAI libraries, I would recommend upgrading to JetPack 6 if possible.
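For reference, a minimal sketch of that build on the device (assuming the jetson-containers tools are installed):
$ jetson-containers build mlc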
You can easily build/run llama.cpp or Ollama on JetPack 5 and run Llama 3.1 with quantization, but it will be roughly 65% of the performance you would get from the likes of MLC or TRT-LLM.
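As a rough sketch with the prebuilt llama.cpp container (the GGUF filename below is hypothetical; -ngl 99 offloads all layers to the GPU, and the CLI binary name may differ between llama.cpp versions):
$ jetson-containers run $(autotag llama_cpp)
$ llama-cli -m Llama-3.1-8B-Instruct-Q4_K_M.gguf -ngl 99 -p "Hello, how are you?"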