Hi,
I’m trying to run Llama 3.1 on a Jetson AGX Orin running JP5.1. I’ve tried MLC, but found that the MLC image for JP5.1 does not support it. Are there any alternatives for doing this?
Hi,
Here are some suggestions for the common issues:
Please run the commands below before benchmarking a deep learning use case:
$ sudo nvpmodel -m 0
$ sudo jetson_clocks
Installation guides for deep learning frameworks on Jetson:
Getting-started deep learning tutorial:
If these suggestions don’t help and you want to report an issue to us, please share the model, the commands/steps, and any customized app with us so we can reproduce it locally.
Thanks!
Hi,
Do you run it with HF Transformers?
If so, could you try dustynv/llama-factory to see if it works?
The container already includes Transformers, FlashAttention, bitsandbytes, AutoGPTQ, and vLLM.
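As a minimal sketch (assuming you have the jetson-containers repo cloned and its tools installed; the exact image tag that autotag picks depends on your JetPack version), launching it looks like:
$ git clone https://github.com/dusty-nv/jetson-containers
$ bash jetson-containers/install.sh
$ jetson-containers run $(autotag llama-factory)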
Thanks.
I was trying to use either MLC or Ollama.
Do you have an example of how to use dustynv/llama-factory?
BTW, it seems it’s geared toward fine-tuning. What I need is an inference engine.
Thanks,
Hi,
The dustynv/llama-factory container supports HF Transformers.
For example, you can run InternVL2 with the instructions in the model card:
We also have prebuilt MLC and Ollama containers; please find them below:
MLC: jetson-containers/packages/llm/mlc at master · dusty-nv/jetson-containers · GitHub
Ollama: jetson-containers/packages/llm/ollama at master · dusty-nv/jetson-containers · GitHub
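For example, a minimal sketch of trying Llama 3.1 through the Ollama container (the llama3.1 model tag comes from the public Ollama library; adjust it to the variant you want):
$ jetson-containers run $(autotag ollama)
# then, inside the container, pull and chat with the model:
$ ollama run llama3.1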
Thanks.
Hi @pcha, I tried rebuilding MLC for JetPack 5.1, but encountered compilation issues from the older CUDA version:
I would recommend trying previous MLC versions or patching the errors (although that may be a losing battle).
If you are on AGX Orin, you can compile MLC through the jetson-containers builder, but if you need to stay current with genAI libraries, I would recommend upgrading to JetPack 6 if possible.
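For reference, a minimal sketch of that build on the device (assuming the jetson-containers tools are installed):
$ jetson-containers build mlc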
You can easily build/run llama.cpp or Ollama on JetPack 5 and run Llama 3.1 with quantization, but it will be roughly 65% of the performance you would get from the likes of MLC or TRT-LLM.
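As a rough sketch with the prebuilt llama.cpp container (the GGUF filename below is hypothetical; -ngl 99 offloads all layers to the GPU, and the CLI binary name may differ between llama.cpp versions):
$ jetson-containers run $(autotag llama_cpp)
$ llama-cli -m Llama-3.1-8B-Instruct-Q4_K_M.gguf -ngl 99 -p "Hello, how are you?"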