DGX Spark txt2kg playbook discrepancies / CPU fallback questions

FIX: Change OLLAMA_LLM_LIBRARY from cuda to cuda_v13.

I had the same issue, but running the Ollama image by itself shows that the image is not the problem: it is able to use the GPU.

# Run Ollama in a Docker container by itself
$ docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Test
$ docker exec ollama ollama run llama3.1:8b "test" && docker exec ollama ollama ps

NAME           ID              SIZE      PROCESSOR    CONTEXT    UNTIL               
llama3.1:8b    46e0c10c039e    5.2 GB    100% GPU     4096       29 minutes from now  

# Locate the CUDA libraries. The directory names are the correct values for the OLLAMA_LLM_LIBRARY env var.
$ docker exec -it ollama bash
root:/# ls -l /usr/lib/ollama/
total 1568
drwxr-xr-x 2 root root   4096 Nov 13 22:01 cuda_jetpack5
drwxr-xr-x 2 root root   4096 Nov 13 21:59 cuda_jetpack6
drwxr-xr-x 2 root root   4096 Nov 13 22:12 cuda_v12
drwxr-xr-x 2 root root   4096 Nov 13 22:09 cuda_v13
-rwxr-xr-x 1 root root 857808 Nov 13 21:55 libggml-base.so
-rwxr-xr-x 1 root root 725928 Nov 13 21:55 libggml-cpu.so

So I changed OLLAMA_LLM_LIBRARY from cuda to cuda_v13.
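To avoid reading the listing by eye, the candidate values can be printed directly. This is a minimal sketch, assuming the libraries live under /usr/lib/ollama as in the listing above; list_backend_dirs is a hypothetical helper name, not part of Ollama.

```shell
# Print the name of every subdirectory of an Ollama library directory.
# Each name (for example cuda_v13) is a candidate OLLAMA_LLM_LIBRARY value.
# The default path is an assumption taken from the listing in this post.
list_backend_dirs() {
  local lib_dir="${1:-/usr/lib/ollama}"
  local d
  for d in "$lib_dir"/*/; do
    [ -d "$d" ] || continue   # skip the literal pattern when nothing matches
    basename "$d"
  done
}
```

Paste the function into the container shell (after docker exec -it ollama bash) and run list_backend_dirs. Plain files such as libggml-base.so are skipped because the pattern only matches directories.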

# FIX: Change line 61 in docker-compose.yml
    environment:
      - OLLAMA_LLM_LIBRARY=cuda_v13       # Use the CUDA v13 library

$ ./start.sh

# Test
$ docker exec ollama-compose ollama run llama3.1:8b "test" && docker exec ollama-compose ollama ps

NAME           ID              SIZE      PROCESSOR    CONTEXT    UNTIL               
llama3.1:8b    xxxxxxxxxxxxx   5.2 GB    100% GPU     4096       xx minutes from now  
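The same check can be scripted so it fails loudly when a model lands on the CPU. This is a rough sketch, assuming the ollama ps output keeps the column layout shown above, with the literal text "100% GPU" in the PROCESSOR column; all_models_on_gpu is a hypothetical helper name.

```shell
# Read `ollama ps` output on stdin. Succeed only if at least one model is
# listed and every listed model line contains "100% GPU".
# Assumes the table format shown in this post; other versions may differ.
all_models_on_gpu() {
  awk '
    $1 == "NAME" { next }      # skip the header row
    NF == 0      { next }      # skip blank lines
    { seen = 1 }               # at least one model line was found
    !/100% GPU/  { bad = 1 }   # this model is not fully on the GPU
    END { exit (seen && !bad) ? 0 : 1 }
  '
}
```

Usage: docker exec ollama-compose ollama ps | all_models_on_gpu && echo "GPU OK". A model reported as 100% CPU, or an empty table, makes the function return a non-zero status.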

Longer answer

OLLAMA_LLM_LIBRARY is declared as an env-config key and mentioned in the docs, but backend selection and loading at runtime is driven by the ggml dynamic-backend loader and OLLAMA_LIBRARY_PATH (not by OLLAMA_LLM_LIBRARY alone). In other words, setting OLLAMA_LLM_LIBRARY=cuda by itself is not sufficient if the dynamic CUDA backend library is missing or incompatible, or if OLLAMA_LIBRARY_PATH, LD_LIBRARY_PATH, or container GPU access is misconfigured. In those cases the code falls back to the CPU backend and you’ll see ~100% CPU usage.
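To see what the container was actually started with, it can help to print only the three variables named above. This is a small sketch; backend_env is a hypothetical helper name, and it only filters text, so it says nothing about whether the values are valid.

```shell
# Read `env` output on stdin and keep only the variables discussed above.
# Prints nothing (and still succeeds) when none of them is set.
backend_env() {
  grep -E '^(OLLAMA_LLM_LIBRARY|OLLAMA_LIBRARY_PATH|LD_LIBRARY_PATH)=' || true
}
```

Usage: docker exec ollama-compose env | backend_env. An empty result means none of the three variables is set inside the container.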

What to check (quick checklist; run on the machine where you see 100% CPU)

  • Check which LLM libraries are present:
    Run ls /usr/lib/ollama or ls $(dirname $(readlink -f $(which ollama)))/../lib/ollama and look for the CUDA backends (cuda_v13, cuda_v12; these are directories in the listing above) next to the CPU library (libggml-cpu.so in that listing).