Ollama timing out when attempting to use GPU instead of CPU

I have a Jetson AGX Orin Dev Kit on JetPack 6.0 (Ubuntu 22.04) and I’m trying to get ollama running on it. It works fine in CPU mode, but when switching to the GPU it runs into a timeout error. I get the same error consistently whether running in a container or as a daemon service.

I’ve followed the instructions in jetson-containers/packages/llm/ollama on GitHub (dusty-nv/jetson-containers, master branch).

Log error:
time=2024-07-29T04:53:19.157Z level=WARN source=sched.go:634 msg="gpu VRAM usage didn't recover within timeout" seconds=5.581084132 model=/root/.ollama/models/blobs/sha256-87048bcd55216712ef14c11c2c303728463207b165bf18440b9b84b07ec00f87

Error reported by the CLI:
Error: timed out waiting for llama runner to start - progress 0.00 -

Full log output:
jetson-containers-run.log (30.6 KB)

CLI output:
cli-ollama-run.log (3.8 KB)

Hi,

Could you share the tegrastats output captured at the same time, for our reference?

$ sudo tegrastats 
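
To capture it to a file for attaching here, something like this should also work (interval in milliseconds):

$ sudo tegrastats --interval 1000 --logfile tegrastats.log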

Thanks.

Sure thing. I’ll also add the nvidia-smi results (which I had running under a watch command at 1 s intervals). The output never changed.
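
For reference, the watch invocation was just something like:

watch -n 1 nvidia-smi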

Tegrastats
tegrastats.log (71.8 KB)

Full logs:
jetson-containers-run2.log (30.1 KB)

I’m wondering if I should downgrade to JetPack 5.1.x, as I’ve seen posts where people say ollama can run both natively and in jetson-containers there.

Another thing I’ll add: you’ll notice from nvidia-smi that I’m running CUDA 12.2, yet the Jetson container is using v11. Considering it’s a container, maybe this doesn’t matter, but I wanted to note the mismatch:
time=2024-07-29T13:46:52.397Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cuda_v11]"


time=2024-07-29T13:48:00.573Z level=INFO source=server.go:383 msg="starting llama server" cmd="/tmp/ollama68986735/runners/cuda_v11/ollama_llama_server --model /root/.ollama/models/blobs/sha256-ff82381e2bea77d91c1b824c7afb83f6fb73e9f7de9dda631bcdbca564aa5435 --ctx-size 8192 --batch-size 512 --embedding --log-disable --n-gpu-layers 33 --verbose --parallel 4 --port 40051"

Hi @apps11, the log shows you are running ollama/ollama:latest, not the one from jetson-containers. Can you try running dustynv/ollama:r36.3.0 instead?
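
If you want to keep your existing model cache, the run command would look something like this (adjust the volume mount to wherever your models live - the path here is just an example):

docker run --runtime nvidia -it --rm --network=host -v ~/.ollama:/root/.ollama dustynv/ollama:r36.3.0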

That did it, thanks so much! Interestingly, my docker run command is what was generated by:

jetson-containers run --name ollama $(autotag ollama)

I guess the lesson is to use that for the initial hookup and update the docker run command after the fact.

OK, interesting! Yes, that must be because autotag looks through your local Docker images for containers with matching names, and yeah, it found ollama/ollama - sorry about that haha.
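
If you want to double-check what it matched (and remove the upstream image so it doesn’t get picked up again), something like this should do it:

docker images | grep ollama
docker rmi ollama/ollama:latest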

llama.cpp is not bad to install standalone, and I’ve heard ollama can work with their binaries. I would not downgrade back to JP5, if for no other reason than a lot of ML stuff is on Python 3.10 now.

So you’d recommend building from source in order to run natively on Ubuntu 22.04 (JetPack 6) on the Jetson AGX Orin?

@apps11 yes I would build from source, either natively or in container
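
For the container route, I believe jetson-containers can rebuild the package from source for you, along the lines of:

jetson-containers build ollama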


Encountered a similar issue running ollama on an AGX Xavier. Compiling ollama (0.3.1) natively and setting CMAKE_CUDA_ARCHITECTURES="72" resolved the problem.
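
Roughly, the build was along these lines (adjust the version tag and CUDA architecture for your board - Xavier is 72, Orin would be 87):

git clone -b v0.3.1 https://github.com/ollama/ollama.git
cd ollama
CMAKE_CUDA_ARCHITECTURES="72" go generate ./...
go build .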

