Ollama timing out when attempting to use GPU instead of CPU

I have a Jetson AGX Orin Dev Kit on JetPack 6.0 (Ubuntu 22.04) and I’m trying to get ollama running on it. It works fine in CPU mode, but when switching to the GPU it runs into a timeout error. I get the same error consistently whether running in a container or as a daemon service.

I’ve followed the instructions in jetson-containers/packages/llm/ollama on GitHub (dusty-nv/jetson-containers, master branch).

Log error:
time=2024-07-29T04:53:19.157Z level=WARN source=sched.go:634 msg="gpu VRAM usage didn't recover within timeout" seconds=5.581084132 model=/root/.ollama/models/blobs/sha256-87048bcd55216712ef14c11c2c303728463207b165bf18440b9b84b07ec00f87

Error reported by the CLI:
Error: timed out waiting for llama runner to start - progress 0.00 -

Full log output:
jetson-containers-run.log (30.6 KB)

CLI output:
cli-ollama-run.log (3.8 KB)

Hi,

Could you share the tegrastats output captured at the same time, for our reference?

$ sudo tegrastats 
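
To capture it to a file for attaching here, something like this should also work (interval in milliseconds):

$ sudo tegrastats --interval 1000 --logfile tegrastats.log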

Thanks.

Sure thing. I’ll also add the nvidia-smi results (which I had running under a watch command at 1 s intervals). The output never changed.
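
For reference, the watch invocation was just something like:

watch -n 1 nvidia-smi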

Tegrastats
tegrastats.log (71.8 KB)

Full logs:
jetson-containers-run2.log (30.1 KB)

I’m wondering if I should downgrade to JetPack 5.1.x, as I’ve seen posts where people say ollama can run both natively and in jetson-containers there.

Another thing I’ll add: you’ll notice from nvidia-smi that I’m running CUDA 12.2, yet the Jetson container is using v11. Considering it’s a container, maybe this doesn’t matter, but I wanted to note the mismatch:
time=2024-07-29T13:46:52.397Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cuda_v11]"


time=2024-07-29T13:48:00.573Z level=INFO source=server.go:383 msg="starting llama server" cmd="/tmp/ollama68986735/runners/cuda_v11/ollama_llama_server --model /root/.ollama/models/blobs/sha256-ff82381e2bea77d91c1b824c7afb83f6fb73e9f7de9dda631bcdbca564aa5435 --ctx-size 8192 --batch-size 512 --embedding --log-disable --n-gpu-layers 33 --verbose --parallel 4 --port 40051"

Hi @apps11, the log shows you are running ollama/ollama:latest, not the one from jetson-containers. Can you try running dustynv/ollama:r36.3.0 instead?
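
If you want to keep your existing model cache, the run command would look something like this (adjust the volume mount to wherever your models live - the path here is just an example):

docker run --runtime nvidia -it --rm --network=host -v ~/.ollama:/root/.ollama dustynv/ollama:r36.3.0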

That did it, thanks so much! Interestingly, my docker run command is what was generated by:

jetson-containers run --name ollama $(autotag ollama)

I guess the lesson is to use that for the initial hookup and update the docker run command after the fact.

OK, interesting! Yes, that must be because autotag looks through your local Docker images for containers with matching names, and yeah, it found ollama/ollama - sorry about that haha.
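
If you want to double-check what it matched (and remove the upstream image so it doesn’t get picked up again), something like this should do it:

docker images | grep ollama
docker rmi ollama/ollama:latest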

llama.cpp is not bad to install standalone, and I’ve heard ollama can work with their binaries. I would not downgrade back to JP5, if for no other reason than a lot of ML stuff is on Python 3.10 now.

So you’d recommend building from source in order to run natively on Ubuntu 22.04 (JetPack 6) on the Jetson AGX Orin?

@apps11 yes I would build from source, either natively or in container
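
For the container route, I believe jetson-containers can rebuild the package from source for you, along the lines of:

jetson-containers build ollama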


Encountered a similar issue running ollama on an AGX Xavier. Compiling ollama (0.3.1) natively and setting CMAKE_CUDA_ARCHITECTURES="72" resolved the problem.
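
Roughly, the build was along these lines (adjust the version tag and CUDA architecture for your board - Xavier is 72, Orin would be 87):

git clone -b v0.3.1 https://github.com/ollama/ollama.git
cd ollama
CMAKE_CUDA_ARCHITECTURES="72" go generate ./...
go build .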

