I have a Jetson AGX Orin Dev Kit on JetPack 6.0 (Ubuntu 22.04) and I'm trying to get ollama running on it. It works fine in CPU mode, but when I switch to the GPU it runs into a timeout error. I get the same error consistently whether I run it in a container or as a daemon service.
I'm wondering if I should downgrade to JetPack 5.1.x, as I've seen posts where people say ollama runs there both natively and in jetson-containers.
Another thing I'll add: you'll notice from nvidia-smi that I'm running CUDA 12.2, yet the container's ollama build is using v11. Since it's a container maybe this doesn't matter, but I wanted to note the mismatch:

```
time=2024-07-29T13:46:52.397Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cuda_v11]"
```
Hi @apps11, the log shows you are running ollama/ollama:latest, not the one from jetson-containers. Can you try running dustynv/ollama:r36.3.0 instead?
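For reference, a minimal way to launch it (a sketch, assuming the NVIDIA container runtime is set up as usual on JetPack; the host model directory and the OLLAMA_MODELS path are just examples):

```
docker run --runtime nvidia -it --rm --network=host \
  -v ~/ollama_models:/ollama \
  -e OLLAMA_MODELS=/ollama \
  dustynv/ollama:r36.3.0   # jetson-containers build for L4T r36.3 / JetPack 6
```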
OK, interesting! Yes, that must be because autotag looks through your local Docker images for containers with matching names, and yeah, it found ollama/ollama - sorry about that haha.
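If you want to rule out the name collision entirely, you can skip autotag and pass the image explicitly (a sketch, assuming the jetson-containers CLI is installed):

```
# autotag resolves to the first local image whose name matches:
jetson-containers run $(autotag ollama)

# pinning the exact image avoids picking up ollama/ollama by accident:
jetson-containers run dustynv/ollama:r36.3.0
```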
llama.cpp is not bad to install standalone, and I've heard ollama can work with its binaries. I wouldn't downgrade back to JetPack 5, if for no other reason than that a lot of the ML stack is on Python 3.10 now.
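For what it's worth, a standalone CUDA build of llama.cpp on JetPack 6 looks roughly like this (a sketch; the CUDA flag has been renamed across llama.cpp versions, older releases used -DLLAMA_CUBLAS=ON instead of -DGGML_CUDA=ON):

```
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
# 87 = sm_87, the Orin GPU; use 72 for Xavier
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=87
cmake --build build --config Release -j
```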
I encountered a similar issue running ollama on an AGX Xavier. Compiling ollama (0.3.1) natively and setting CMAKE_CUDA_ARCHITECTURES="72" resolved the problem.
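A sketch of that native build, in case it helps (assuming ollama 0.3.x, which still built its bundled llama.cpp runners via go generate; 72 targets Xavier's sm_72, an Orin would use 87):

```
git clone https://github.com/ollama/ollama.git
cd ollama
git checkout v0.3.1

# limit the CUDA compile to the Xavier GPU architecture
export CMAKE_CUDA_ARCHITECTURES="72"

go generate ./...   # builds the llama.cpp runners with the flag above
go build .
./ollama serve
```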