FIX: Change OLLAMA_LLM_LIBRARY from cuda to cuda_v13.
I had the same issue, but testing the Ollama image by itself shows the image is not the problem: on its own, it is able to use the GPU.
# Run Ollama in a Docker container by itself
$ docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
# Test
$ docker exec ollama ollama run llama3.1:8b "test" && docker exec ollama ollama ps
NAME ID SIZE PROCESSOR CONTEXT UNTIL
llama3.1:8b 46e0c10c039e 5.2 GB 100% GPU 4096 29 minutes from now
# Locate the CUDA libraries. The directory names are the correct values for the OLLAMA_LLM_LIBRARY env var.
$ docker exec -it ollama bash
root:/# ls -l /usr/lib/ollama/
total 1568
drwxr-xr-x 2 root root 4096 Nov 13 22:01 cuda_jetpack5
drwxr-xr-x 2 root root 4096 Nov 13 21:59 cuda_jetpack6
drwxr-xr-x 2 root root 4096 Nov 13 22:12 cuda_v12
drwxr-xr-x 2 root root 4096 Nov 13 22:09 cuda_v13
-rwxr-xr-x 1 root root 857808 Nov 13 21:55 libggml-base.so
-rwxr-xr-x 1 root root 725928 Nov 13 21:55 libggml-cpu.so
So I changed OLLAMA_LLM_LIBRARY from cuda to cuda_v13.
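If you'd rather script that check than eyeball the listing, here is a minimal sketch. It assumes, based only on the observation above and not on Ollama's source code, that a valid OLLAMA_LLM_LIBRARY value is the name of a subdirectory of /usr/lib/ollama. The function name check_llm_library is made up for illustration.

```shell
# Hypothetical helper: succeed only if VALUE names a subdirectory of LIB_DIR.
# Assumption: valid OLLAMA_LLM_LIBRARY values match these directory names.
check_llm_library() {
  value="$1"
  lib_dir="${2:-/usr/lib/ollama}"

  if [ -n "$value" ] && [ -d "$lib_dir/$value" ]; then
    echo "ok: $lib_dir/$value exists"
    return 0
  fi

  echo "no directory named '$value' in $lib_dir; candidates:" >&2
  # The trailing slash makes the glob match subdirectories only.
  for dir in "$lib_dir"/*/; do
    if [ -d "$dir" ]; then
      basename "$dir" >&2
    fi
  done
  return 1
}
```

Run inside the container (docker exec -it ollama bash), check_llm_library cuda should fail and list the directory names, while check_llm_library cuda_v13 should succeed, given the listing above.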
# FIX: Change line 61 in docker-compose.yml
environment:
- OLLAMA_LLM_LIBRARY=cuda_v13 # Use CUDA library
$ ./start.sh
# Test
$ docker exec ollama-compose ollama run llama3.1:8b "test" && docker exec ollama-compose ollama ps
NAME ID SIZE PROCESSOR CONTEXT UNTIL
llama3.1:8b xxxxxxxxxxxxx 5.2 GB 100% GPU 4096 xx minutes from now
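To automate the test above instead of reading the PROCESSOR column by hand, something like the following could work. It is only a sketch: all_on_gpu is a hypothetical helper, and it assumes the ollama ps output looks like the samples shown here (one header row, then one row per loaded model containing "100% GPU" when the model is fully offloaded).

```shell
# Hypothetical helper: read `ollama ps` output on stdin and succeed only if
# at least one model is loaded and every model row contains "100% GPU".
all_on_gpu() {
  awk '
    NR == 1 { next }        # skip the header row
    NF == 0 { next }        # skip blank lines
    { seen = 1 }            # at least one model row was found
    !/100% GPU/ { bad = 1; print "not fully on GPU: " $1 }
    END {
      if (seen && !bad) exit 0
      exit 1
    }
  '
}
```

Usage: docker exec ollama-compose ollama ps | all_on_gpu && echo "GPU in use". Note that it also fails when no model is loaded, so run a prompt first as in the test above.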
Longer answer
OLLAMA_LLM_LIBRARY is declared as an env-config key and mentioned in the docs, but the backend that actually gets loaded at runtime is picked by the ggml dynamic-backend loader together with OLLAMA_LIBRARY_PATH, not by OLLAMA_LLM_LIBRARY alone. In other words, setting OLLAMA_LLM_LIBRARY=cuda by itself is not sufficient if the dynamic CUDA backend library is missing or incompatible, or if OLLAMA_LIBRARY_PATH, LD_LIBRARY_PATH, or container GPU access is misconfigured. In those cases the code falls back to the CPU backend and you'll see ~100% CPU usage. That matches the listing above: the image has no directory named cuda, only cuda_v12 and cuda_v13.
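A quick way to see what the container is actually working with is to print the variables mentioned above. This is a minimal sketch; show_backend_env is a hypothetical helper name, and it only shows the environment of the shell you run it in. That should match what the Ollama server process sees if you run it via docker exec and the entrypoint does not change those variables, which is an assumption.

```shell
# Hypothetical helper: print the backend-related variables, marking unset ones.
show_backend_env() {
  for name in OLLAMA_LLM_LIBRARY OLLAMA_LIBRARY_PATH LD_LIBRARY_PATH; do
    # printenv exits non-zero when the variable is not in the environment.
    if value=$(printenv "$name"); then
      printf '%s=%s\n' "$name" "$value"
    else
      printf '%s is not set\n' "$name"
    fi
  done
}
```

Open a shell with docker exec -it ollama-compose bash, paste the function, and call show_backend_env. If OLLAMA_LLM_LIBRARY still shows cuda after editing docker-compose.yml, the container was probably not recreated.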
What to check (quick checklist; run on the machine where you see 100% CPU)
- Check which LLM libraries are present:
ls /usr/lib/ollama or ls $(dirname $(readlink -f $(which ollama)))/../lib/ollama, then look in the output for the cuda_v13 / cuda_v12 backends and the CPU library.