DGX Spark txt2kg playbook discrepancies / CPU fallback questions

Neurfer · November 14, 2025, 6:51am

FIX: Change OLLAMA_LLM_LIBRARY from cuda to cuda_v13.

I had the same issue. Testing the ollama image by itself shows the image is not the problem, because it is able to use the GPU.

# Run Ollama in a Docker container by itself
$ docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Test
$ docker exec ollama ollama run llama3.1:8b "test" && docker exec ollama ollama ps

NAME           ID              SIZE      PROCESSOR    CONTEXT    UNTIL               
llama3.1:8b    46e0c10c039e    5.2 GB    100% GPU     4096       29 minutes from now  

# Locate the CUDA libraries. The directory names are the valid values for the OLLAMA_LLM_LIBRARY env var.
$ docker exec -it ollama bash
root:/# ls -l /usr/lib/ollama/
total 1568
drwxr-xr-x 2 root root   4096 Nov 13 22:01 cuda_jetpack5
drwxr-xr-x 2 root root   4096 Nov 13 21:59 cuda_jetpack6
drwxr-xr-x 2 root root   4096 Nov 13 22:12 cuda_v12
drwxr-xr-x 2 root root   4096 Nov 13 22:09 cuda_v13
-rwxr-xr-x 1 root root 857808 Nov 13 21:55 libggml-base.so
-rwxr-xr-x 1 root root 725928 Nov 13 21:55 libggml-cpu.so

So I changed OLLAMA_LLM_LIBRARY from cuda to cuda_v13.
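
If you would rather not open an interactive shell, the directory names can be pulled out of the same `ls -l` listing with a small filter. This is only a sketch: `backend_names` is a helper name I made up, and it assumes the listing has the format shown above (directories start with `d`, name in the last column).

```shell
# Hypothetical helper: read `ls -l` output on stdin and print only the
# names of directories (lines whose mode string starts with "d").
backend_names() {
    awk '/^d/ { print $NF }'
}

# Usage (assumes the standalone container from above is named "ollama"):
#   docker exec ollama ls -l /usr/lib/ollama/ | backend_names
```

Going by the listing above, the names it prints are the candidate values for OLLAMA_LLM_LIBRARY.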

# FIX: Change line 61 in docker-compose.yml
    environment:
      - OLLAMA_LLM_LIBRARY=cuda_v13       # Use CUDA library 

$ ./start.sh

# Test
$ docker exec ollama-compose ollama run llama3.1:8b "test" && docker exec ollama-compose ollama ps

NAME           ID              SIZE      PROCESSOR    CONTEXT    UNTIL               
llama3.1:8b    xxxxxxxxxxxxx   5.2 GB    100% GPU     4096       xx minutes from now
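
To check the PROCESSOR column without reading the table by eye, something like the following could work. It is a rough sketch: `fully_on_gpu` is a name I invented, and it assumes `ollama ps` prints a header line followed by one line per loaded model, with the literal text `100% GPU` when a model is fully offloaded, as in the output above.

```shell
# Hypothetical helper: read `ollama ps` output on stdin.
# Exits 0 only if at least one model is listed and every model line
# contains "100% GPU"; exits 1 otherwise (CPU fallback, split, or no model).
fully_on_gpu() {
    awk 'NR > 1 && NF { seen = 1; if ($0 !~ /100% GPU/) bad = 1 }
         END { exit !(seen && !bad) }'
}

# Usage (container name taken from the compose test above):
#   docker exec ollama-compose ollama ps | fully_on_gpu \
#       && echo "model is on the GPU" \
#       || echo "model is NOT fully on the GPU"
```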

Longer answer

OLLAMA_LLM_LIBRARY is declared as an env-config key and mentioned in the docs, but the runtime backend is actually picked and loaded by the ggml dynamic-backend loader together with OLLAMA_LIBRARY_PATH, not by OLLAMA_LLM_LIBRARY alone. In other words, setting OLLAMA_LLM_LIBRARY=cuda by itself is not sufficient if the dynamic CUDA backend library is missing or incompatible, or if OLLAMA_LIBRARY_PATH, LD_LIBRARY_PATH, or container GPU access is misconfigured. In those cases the code falls back to the CPU backend and you’ll see ~100% CPU usage.
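
Since several variables are involved, it may help to see what the running container actually has set. A minimal sketch, assuming the container's `env` output is available; `backend_env` is a hypothetical helper name, and it simply filters for `OLLAMA_*` variables and `LD_LIBRARY_PATH`.

```shell
# Hypothetical helper: read `env` output on stdin and keep only the
# variables discussed above (anything starting with OLLAMA_, plus
# LD_LIBRARY_PATH). Exits non-zero if none of them are set.
backend_env() {
    grep -E '^(OLLAMA_[A-Z0-9_]*|LD_LIBRARY_PATH)='
}

# Usage (container name taken from the compose test above):
#   docker exec ollama-compose env | backend_env
```

If OLLAMA_LLM_LIBRARY still shows `cuda` here after editing docker-compose.yml, the container was probably not recreated with the new value.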

What to check (quick checklist — run on the machine where you see 100% CPU)

Check which LLM libraries are present:
$ ls /usr/lib/ollama
# or, relative to the ollama binary:
$ ls $(dirname $(readlink -f $(which ollama)))/../lib/ollama
List the files and check whether cuda_v13*.so, cuda_v12*.so, and cpu*.so entries are present.
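
Going by the fix at the top of this post (the value has to match one of the names under /usr/lib/ollama), the configured value can be compared against the listing directly. This is a sketch under that assumption; `has_entry` is a made-up helper that does an exact, whole-line match.

```shell
# Hypothetical helper: read a plain `ls` listing on stdin and exit 0
# if one entry exactly equals the name passed as the first argument.
has_entry() {
    grep -Fxq -- "$1"
}

# Usage (container name taken from the compose test above):
#   docker exec ollama-compose ls /usr/lib/ollama | has_entry cuda_v13 \
#       && echo "cuda_v13 is present" \
#       || echo "cuda_v13 is missing"
```

With the listing shown earlier in this post, `cuda_v13` matches and plain `cuda` does not, which is consistent with the CPU fallback.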
