Hi all, I am new to Jetson. I have acquired a Jetson AGX Xavier 16GB, and yes, I know it's an older machine now, but I would really like to get Ollama and Llama 3.1 running on it. I followed a set of instructions I found on medium.com, but the install crashed out with loads of errors and broke the OS, and it took the rest of the day to get it sorted. :( So I thought I would ask here. I want to get it running as fast and as natively as I can. I have it running on a Pi 5 with no issues, but that is CPU-only and very slow, so I am really hoping I can use those CUDA cores! Can anyone help? Thanks
I think the only two variants of the AGX Xavier are 32GB and 64GB. I didn't find a 16GB version on the product website.
Hi, well, it does exist; I think NVIDIA superseded the 16GB version with the 32GB version around 2020. But the good news is that I found the issue. I did a forced re-install of the operating system with JetPack 5 (L4T r35.5) and then tried to install Ollama and llama3.1:instruct again, and this time it installed and runs with no issues, at quite a respectable speed too. So I am very pleased.
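For reference, the install was just the standard script from ollama.com, followed by pulling the model (the tag below is the one I used; yours may differ):

```bash
# Standard Ollama install script for Linux (incl. arm64/Jetson)
curl -fsSL https://ollama.com/install.sh | sh

# Pull and run the instruct-tuned Llama 3.1
ollama run llama3.1:instruct
```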
OK, I spoke too soon… The default Ollama install from ollama.com runs, but it only uses the CPUs, not the GPU, so it runs only very slightly faster than a Raspberry Pi. Does anyone know how to get Ollama to use the GPU? Apparently it's a bug.
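(For anyone wanting to check this on their own board: watch tegrastats while a prompt is being answered and it is obvious.)

```bash
# Watch board utilisation while Ollama answers a prompt;
# the GR3D_FREQ field (the GPU) sits at 0% while the CPU cores max out
sudo tegrastats
```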
Hi @nav-intel, there is an ollama container built with CUDA and JetPack 5 here: dustynv/ollama:r35.4.1
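A minimal way to start it, assuming Docker is set up the way JetPack installs it (add a volume mount if you want downloaded models to persist between runs):

```bash
# Start the Ollama server with GPU access on JetPack 5;
# --network=host exposes the default Ollama port (11434) to the host
docker run --runtime nvidia -it --rm --network=host dustynv/ollama:r35.4.1
```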
Using a container to run the Ollama server helps avoid that kind of OS and GPU-support re-installation. Docker is not a virtual machine and doesn't impact compute performance, but it does keep your environment in order. The Ollama client can be run outside the container. There is a tutorial on Jetson AI Lab showing this; I have used it on Orin, but others have reported it working on Xavier and Nano.
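Roughly, with the server container running as above, the host-side client usage looks like this (the model name is just an example):

```bash
# The stock Ollama client on the host reaches the containerized server
# on the default localhost:11434, so no extra configuration is needed
ollama run tinyllama
```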
Hi @dusty_nv, it's kind of you to reply. I did find your excellent containers, and I think it was your videos on YouTube that inspired me to try the NVIDIA Jetson platform. I can report that it does run on a Xavier AGX 16GB, and tinyllama is blindingly fast. After testing tinyllama I wanted to run Llama 3.1, but I get the message "The model you are trying to run requires a newer version of Ollama; you can download it at Ollama.com/downloads". This is an issue because Llama 3.1 is the version I have been using on other (slower) platforms, and I need to run 3.1. Also, of course, I am dying to try out the new multimodal 3.2 that dropped yesterday! Thanks for all your help.
Hi @nav-intel, since my last message I changed the ollama dockerfile so that it sets the version to 0.0.0, which should bypass those kinds of version checks going forward. If you pull the latest updates from jetson-containers with git and build ollama, that will compile their latest version (rough commands below).

I looked again for Llama-3.2-Vision support in llama.cpp, but it seems they are looking for external contributors to support the VLMs (which I can understand, as they tend to require significant effort… in particular, Llama multimodal uses cross-attention). If you hear of other ways it's been quantized (beyond bitsandbytes), let me know 👍
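For reference, the update/build steps look roughly like this (depending on when you cloned, the build command may be ./build.sh ollama instead):

```bash
# Pull the latest jetson-containers updates and rebuild ollama
git clone https://github.com/dusty-nv/jetson-containers  # if not already cloned
cd jetson-containers
git pull
jetson-containers build ollama
```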
Hi Dusty, thanks for that; it works, and Llama 3.1 now runs without complaint. I did try llama3.2:11b-vision and vision-instruct using your helpful post. I did have to modify the code slightly, as I could not get the LLM to stop answering questions about the Hoover Dam no matter what image I gave it. At one point it said "Dam, Dam, Dam, Dam, Dam, Dam, Dam", so I guess it was as annoyed as I was!

The NVIDIA Jetson AGX Orin 64GB ran both LLMs OK but complained of over-voltage and throttled. I am using the supplied power supply that plugs into the USB-C port above the barrel-jack power socket. I am curious as to what might cause the message. Would I still get the message if I purchased a 90 W power supply for the barrel-jack socket?

I still haven't managed to compile my own Ollama instance that uses the GPU properly, but your Ollama container runs fine (and fast). Thanks again.

I do have one more suggestion: all the examples name the Jetson container "my_jetson_container", which means you have to rename or delete a previous container before trying out the next example. Would it be possible to give each Jetson container a unique name so that they don't clash?
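For example, something like this (container names are just illustrative):

```bash
# Illustrative only: giving each example its own container name
# means earlier containers don't have to be renamed or deleted first
jetson-containers run --name ollama_demo $(autotag ollama)
jetson-containers run --name llava_demo $(autotag llava)
```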
Best regards H.