DGX Spark txt2kg playbook discrepancies / CPU fallback questions

Disclaimer: I definitely fall into the novice category; issues here may qualify as defective user…

I have been unsuccessful in running the txt2kg tool on my Spark GPU. As I went through the playbook, here are the various challenges/anomalies I hit:

Step 1: Clone the repository

  • There is a typo in the clone path: it is missing the s at the end of dgx-spark-playbooks/… The corrected command is sketched below.
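
For anyone following along, a sketch of the corrected clone command (the repo name follows from the missing-s note above; the NVIDIA GitHub org is my assumption):

# Assumed URL: NVIDIA org on GitHub; repo name per the missing-s note above
$ git clone https://github.com/NVIDIA/dgx-spark-playbooks.git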

Step 2: Start the txt2kg services
The system fails to allocate Ollama to the GPU and instead falls back to CPU inference:
ollama-compose | … msg="inference compute" id=cpu library=cpu compute="" … total="119.7 GiB" available="115.8 GiB"
ollama-compose | … msg="entering low vram mode" "total vram"="0 B" threshold="20.0 GiB"
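
For anyone hitting the same wall, two checks that narrow this down (a sketch, assuming the service name ollama-compose from the logs above, and that the NVIDIA container toolkit injects nvidia-smi into the container):

# Should list the GPU if the container can see it at all
# (assumes the NVIDIA container toolkit mounts nvidia-smi in)
$ docker exec ollama-compose nvidia-smi

# The "inference compute" log line shows which backend Ollama picked
$ docker logs ollama-compose 2>&1 | grep -i "inference compute"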

I dug around in services and found clear_cache_and_restart.sh in the ../deploy/services/ollama folder. As written, it didn’t have a happy path and promptly shook its fist at me. However, after correcting the path and trying again, I got the same error.

Step 5: Upload documents and build knowledge graphs
Here, I was able to connect to the service and upload a file. However, with CPU inference I realized it was just going to take too long, so I tried the other NVIDIA-hosted models listed in the model pull-down.

First time through, it complained that I didn't have a key:
app-1 | … Error: NVIDIA API key is required when using NVIDIA provider. Please set NVIDIA_API_KEY in your environment variables.
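
A sketch of supplying the key, assuming the compose stack reads NVIDIA_API_KEY from the shell environment or a .env file next to docker-compose.yml (that mechanism is my assumption, not something stated in the playbook):

# Key from build.nvidia.com; the nvapi- prefix is the usual format.
# Assumes start.sh passes the shell environment through to the app container.
$ export NVIDIA_API_KEY=nvapi-xxxxxxxxxxxx
$ ./start.sh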

So, I decided to go ahead and use the NVIDIA_API_KEY I had set up when I tried the RAG application in the AI Workbench demo. Alas, the selected model was not found:
app-1 | Error creating or testing Nemotron model: Error: Model test failed: 404 status code (no body)
app-1 |
app-1 | Troubleshooting URL: MODEL_NOT_FOUND | 🦜️🔗 Langchain
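
For completeness, one way to check which models a key can actually reach (this assumes the NVIDIA provider talks to the OpenAI-compatible endpoint at integrate.api.nvidia.com; that endpoint is my guess, not something from the playbook):

# List the models the key has access to; the selected model's id must appear here
$ curl -s https://integrate.api.nvidia.com/v1/models \
    -H "Authorization: Bearer $NVIDIA_API_KEY" | grep '"id"'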

At this point, the wine glass was empty and I elected to call it a night.

Questions:

  1. Has anyone successfully used this playbook to run on the GPU?
  2. Is there something straightforward I should be trying? I was hoping to explore txt2kg a bit to see whether it would be useful to me, but I'm not sitting on a mission-critical need, so if I have to wait for updates, so be it.

Thanks!

2 Likes

Hi, this is a known issue with the Text 2 Graph playbook, which we will fix.

1 Like

Thanks, aniculescu. Will keep an eye out for the update.

1 Like

FIX: Change OLLAMA_LLM_LIBRARY from cuda to cuda_v13.

I had the same issue, but testing the Ollama image by itself shows it's not the image at fault: run standalone, it is able to use the GPU.

# Run Ollama in Docker by itself
$ docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Test
$ docker exec ollama ollama run llama3.1:8b "test" && docker exec ollama ollama ps

NAME           ID              SIZE      PROCESSOR    CONTEXT    UNTIL               
llama3.1:8b    46e0c10c039e    5.2 GB    100% GPU     4096       29 minutes from now  

# Locate the CUDA libraries. The directory names below are the valid values for the OLLAMA_LLM_LIBRARY env var.
$ docker exec -it ollama bash
root:/# ls -l /usr/lib/ollama/
total 1568
drwxr-xr-x 2 root root   4096 Nov 13 22:01 cuda_jetpack5
drwxr-xr-x 2 root root   4096 Nov 13 21:59 cuda_jetpack6
drwxr-xr-x 2 root root   4096 Nov 13 22:12 cuda_v12
drwxr-xr-x 2 root root   4096 Nov 13 22:09 cuda_v13
-rwxr-xr-x 1 root root 857808 Nov 13 21:55 libggml-base.so
-rwxr-xr-x 1 root root 725928 Nov 13 21:55 libggml-cpu.so

So I changed OLLAMA_LLM_LIBRARY from cuda to cuda_v13.

# FIX: Change line 61 in docker-compose.yml
    environment:
      - OLLAMA_LLM_LIBRARY=cuda_v13       # use the cuda_v13 backend instead of the bare "cuda"

$ ./start.sh

# Test
$ docker exec ollama-compose ollama run llama3.1:8b "test" && docker exec ollama-compose ollama ps

NAME           ID              SIZE      PROCESSOR    CONTEXT    UNTIL               
llama3.1:8b    xxxxxxxxxxxxx   5.2 GB    100% GPU     4096       xx minutes from now  
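
To confirm the variable actually landed in the running container, a quick sanity check (ollama-compose is the service name used above):

$ docker exec ollama-compose env | grep OLLAMA_LLM_LIBRARY
# expect: OLLAMA_LLM_LIBRARY=cuda_v13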

Longer answer

OLLAMA_LLM_LIBRARY is declared as an env-config key and mentioned in the docs, but the dynamic loader that actually picks/loads runtime backends is driven by the ggml dynamic-backend loader and OLLAMA_LIBRARY_PATH (not by OLLAMA_LLM_LIBRARY alone). In other words, setting OLLAMA_LLM_LIBRARY=cuda by itself is not sufficient if the dynamic CUDA backend library is not present/compatible or if OLLAMA_LIBRARY_PATH / LD_LIBRARY_PATH / container GPU access is incorrect — in those cases the code will fall back to the CPU backend and you’ll see ~100% CPU usage.
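
If you want to see what the loader is actually working with, inspecting those variables inside the container is a reasonable first probe (a sketch; the exact set of variables Ollama honors can vary by version):

$ docker exec ollama-compose env | grep -E "OLLAMA_LLM_LIBRARY|OLLAMA_LIBRARY_PATH|LD_LIBRARY_PATH"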

What to check (quick checklist — run on the machine where you see 100% CPU)

  • Check which LLM libraries are present: list /usr/lib/ollama (or the lib/ollama directory next to the ollama binary) and confirm the cuda_v13 / cuda_v12 backend directories and the libggml-*.so files are there; see the sketch below.
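
The same check as commands (the docker exec form matches the compose setup above; the readlink form is for a native install where ollama is on the PATH):

# Compose container (service name from the fix above):
$ docker exec ollama-compose ls /usr/lib/ollama

# Native install, resolving from the ollama binary on the PATH:
$ ls "$(dirname "$(readlink -f "$(which ollama)")")/../lib/ollama"
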
5 Likes

Good Lord, 4 characters made all the difference. I'd been tearing my hair out all day yesterday over this.

Two days ago, I watched an NVIDIA live stream, "DGX Spark Live: Process Text for GraphRAG With Up to 120B LLM", where NVIDIA employees Rishi Puri, Santosh Pavani and Prachi Goel demonstrated how to use this very repository. They specifically mention that they're using gpt-oss-120b as the underlying model, served by Ollama.

Although they show results from the pipeline at various points, they don't show the LLM in action; if it were running only on the CPU, that big model would crawl. So what did they do to make it work? Their presentation doesn't mention any tweaks; they are just using the code from the repository.

As I said at the beginning, changing the environment variable OLLAMA_LLM_LIBRARY from cuda to cuda_v13 resulted in a big speed-up in token generation because it allowed the use of the GPU. How could the presenters not have known that this was necessary?

3 Likes

Thanks, Neurfer! Been traveling and didn’t get back to my system until today. I will definitely make the change - really appreciate the update!

1 Like

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.