DGX Spark is extremely slow on a short LLM test

llama.cpp comes with a built-in chat interface that is quite capable.

As for Open WebUI, you can add a new connection for llama.cpp, since its server is OpenAI API compatible.

Depending on the port you choose for llama.cpp (llama-server defaults to 8080; the example below uses 8000), you just need to add that port to the URL. When Open WebUI is also running inside a container, you will need to replace localhost with the IP address of the Spark's primary network interface, because localhost inside the container refers to the container itself, not the host.

So http://192.168.0.123:8000/v1 (for example) would be a URL you could enter for the connection.
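
To check that the address is right before wiring it into Open WebUI, you can hit the server's model listing endpoint directly. The IP and port below are just the example values from above; substitute your own (on the Spark, `ip -4 addr` or `hostname -I` will show the interface addresses):

```
# Example values only -- adjust host and port to your setup.
curl http://192.168.0.123:8000/v1/models
```

If llama-server is up, this returns a small JSON document listing the loaded model; a connection refusal usually means the wrong IP/port or a server bound only to localhost.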

See docs/docker.md in the ggml-org/llama.cpp repository on GitHub for using the Docker images. For example:

```
docker run --gpus all -p 8000:8000 -v $HOME/models:/models \
  ghcr.io/ggml-org/llama.cpp:full-cuda \
  -s -hf ggml-org/gpt-oss-120b-GGUF --port 8000 --host 0.0.0.0 -c 0 --jinja
```
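
If you already have a GGUF file in $HOME/models, the same image can load it directly with -m (the model path) instead of downloading via -hf. A sketch, with a placeholder file name you would replace with your own:

```
docker run --gpus all -p 8000:8000 -v $HOME/models:/models \
  ghcr.io/ggml-org/llama.cpp:full-cuda \
  -s -m /models/your-model.gguf --port 8000 --host 0.0.0.0 -c 0 --jinja
```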

-hf downloads a model directly from Hugging Face – in this case ggml-org/gpt-oss-120b-GGUF.
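
Once the server is running, any OpenAI-compatible client can talk to it – Open WebUI is just one option. A minimal sketch using only Python's standard library; the base URL and model name are the example values from this post, not defaults:

```python
import json
from urllib import request

def build_chat_body(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(base_url: str, prompt: str, model: str = "gpt-oss-120b") -> str:
    """POST one request to the server's /v1 chat completions
    endpoint and return the assistant's reply text."""
    data = json.dumps(build_chat_body(model, prompt)).encode()
    req = request.Request(
        f"{base_url}/chat/completions",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Needs a running llama-server, e.g.:
# print(chat("http://192.168.0.123:8000/v1", "Hello!"))
```

The same endpoint is what Open WebUI uses under the hood, so if this works from the command line, the connection settings in Open WebUI should work too.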

There are also helpers that ease the use of llama.cpp, like llama-swap – I'm not sure whether ready-to-use arm64 images exist for that yet.

Still waiting for ASUS Germany to deliver… so I can't test it myself yet.