Hi all, I am new to Jetson. I have acquired a Jetson AGX Xavier 16GB, and yes, I know it's an older machine now, but I would really like to get Ollama and Llama 3.1 running on it. I followed a set of instructions I found on medium.com, but the install crashed out with loads of errors and broke the OS, and it took the rest of the day to get it sorted. :( So I thought I would ask here. I want to get it running as fast and as natively as I can. I have it running on a Pi 5 with no issues, but that is CPU-only and very slow, so I am really hoping I can use those CUDA cores! Can anyone help? Thanks
I think the only two variants of the AGX Xavier are 32GB and 64GB. I didn't find a 16GB version on the product website.
Hi, well, it does exist; I think NVIDIA superseded the 16GB version with the 32GB version around 2020. But the good news is that I found the issue. I did a forced re-install of the operating system with JetPack 5 (L4T r35.5) and then tried to install Ollama and llama3.1:instruct again, and this time it installed and runs with no issues, at quite a respectable speed too. So I am very pleased.
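For reference, the install was just the standard script from ollama.com, followed by pulling the model (the tag below is the one I used; yours may differ):

```bash
# Standard Ollama install script for Linux (incl. arm64/Jetson)
curl -fsSL https://ollama.com/install.sh | sh

# Pull and run the instruct-tuned Llama 3.1
ollama run llama3.1:instruct
```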
OK, I spoke too soon… The default Ollama install from ollama.com runs, but it only uses the CPUs, not the GPU, so it runs only very slightly faster than a Raspberry Pi. Does anyone know how to get Ollama to use the GPU? Apparently it's a bug.
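(For anyone wanting to check this on their own board: watch tegrastats while a prompt is being answered and it is obvious.)

```bash
# Watch board utilisation while Ollama answers a prompt;
# the GR3D_FREQ field (the GPU) sits at 0% while the CPU cores max out
sudo tegrastats
```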
Hi @nav-intel, there is an ollama container built with CUDA and JetPack 5 here: dustynv/ollama:r35.4.1
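A minimal way to start it, assuming Docker is set up the way JetPack installs it (add a volume mount if you want downloaded models to persist between runs):

```bash
# Start the Ollama server with GPU access on JetPack 5;
# --network=host exposes the default Ollama port (11434) to the host
docker run --runtime nvidia -it --rm --network=host dustynv/ollama:r35.4.1
```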
Using a container to run the Ollama server helps avoid that kind of OS and GPU-support re-installation. Docker is not a virtual machine and doesn't impact compute performance, but it does keep your environment in order. The Ollama client can be run outside the container. There is a tutorial on Jetson AI Lab showing this; I have used it on Orin, but others have reported it working on Xavier and Nano.
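Roughly, with the server container running as above, the host-side client usage looks like this (the model name is just an example):

```bash
# The stock Ollama client on the host reaches the containerized server
# on the default localhost:11434, so no extra configuration is needed
ollama run tinyllama
```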
Hi @dusty_nv, it's kind of you to reply. I did find your excellent containers, and I think it was your videos on YouTube that inspired me to try the NVIDIA Jetson platform. I can report that it does run on a Xavier AGX 16GB, and tinyllama is blindingly fast. After testing tinyllama I wanted to run Llama 3.1, but I get the message "The model you are trying to run requires a newer version of Ollama; you can download it at Ollama.com/downloads". This is an issue because Llama 3.1 is the version I have been using on other (slower) platforms, and I need to run 3.1. Also, of course, I am dying to try out the new multimodal 3.2 that dropped yesterday! Thanks for all your help.
Hi @nav-intel, since my last message I changed the ollama dockerfile so that it sets the version to 0.0.0, which should bypass those kinds of version checks going forward. If you pull the latest updates from jetson-containers with git and build ollama, that will compile their latest version (rough commands below).

I looked again for Llama-3.2-Vision support in llama.cpp, but it seems they are looking for external contributors to support the VLMs (which I can understand, as they tend to require significant effort… in particular, Llama multimodal uses cross-attention). If you hear of other ways it's been quantized (beyond bitsandbytes), let me know 👍
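For reference, the update/build steps look roughly like this (depending on when you cloned, the build command may be ./build.sh ollama instead):

```bash
# Pull the latest jetson-containers updates and rebuild ollama
git clone https://github.com/dusty-nv/jetson-containers  # if not already cloned
cd jetson-containers
git pull
jetson-containers build ollama
```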
Hi Dusty, thanks for that; it works, and Llama 3.1 now runs without complaint. I did try llama3.2:11b-vision and vision-instruct using your helpful post. I did have to modify the code slightly, as I could not get the LLM to stop answering questions about the Hoover Dam no matter what image I gave it. At one point it said "Dam, Dam, Dam, Dam, Dam, Dam, Dam", so I guess it was as annoyed as I was!

The NVIDIA Jetson AGX Orin 64GB ran both LLMs OK but complained of over-voltage and throttled. I am using the supplied power supply that plugs into the USB-C port above the barrel-jack power socket. I am curious as to what might cause the message. Would I still get the message if I purchased a 90 W power supply for the barrel-jack socket?

I still haven't managed to compile my own Ollama instance that uses the GPU properly, but your Ollama container runs fine (and fast). Thanks again.

I do have one more suggestion: all the examples name the Jetson container "my_jetson_container", which means you have to rename or delete a previous container before trying out the next example. Would it be possible to give each Jetson container a unique name so that they don't clash?
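For example, something like this (container names are just illustrative):

```bash
# Illustrative only: giving each example its own container name
# means earlier containers don't have to be renamed or deleted first
jetson-containers run --name ollama_demo $(autotag ollama)
jetson-containers run --name llava_demo $(autotag llava)
```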
Best regards H.