I had Codex benchmark llama-server with and without --no-mmap, and the results weren’t great:
I benchmarked startup readiness for a step-3.5-flash llama-server setup by timing from process launch to the first
successful response from the Chat Completions API.

Method:
- Launch llama-server with the same model/config in both cases.
- Send repeated “hello world” requests to /v1/chat/completions with max_tokens: 1.
- Record elapsed time until the first successful generated response.
- The only variable changed: --no-mmap enabled vs. disabled.
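The measurement loop above can be sketched roughly as follows. This is a minimal sketch, not the exact script I ran: the endpoint path and payload follow the OpenAI-compatible Chat Completions API that llama-server exposes, but the port, model name, and polling interval are assumptions; in practice the timer should start when you spawn the server process, not when this script starts.

```python
import json
import time
import urllib.error
import urllib.request

def build_payload(model: str) -> bytes:
    # Tiny request: one-word prompt, single generated token.
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": "hello world"}],
        "max_tokens": 1,
    }).encode()

def time_to_first_response(url: str, model: str, timeout: float = 300.0) -> float:
    """Poll until the first successful chat completion; return elapsed seconds."""
    start = time.monotonic()
    payload = build_payload(model)
    while time.monotonic() - start < timeout:
        req = urllib.request.Request(
            url, data=payload, headers={"Content-Type": "application/json"}
        )
        try:
            with urllib.request.urlopen(req, timeout=5) as resp:
                if resp.status == 200:
                    return time.monotonic() - start
        except (urllib.error.URLError, OSError):
            pass  # server not accepting requests yet; keep polling
        time.sleep(0.2)
    raise TimeoutError("server never became ready")

if __name__ == "__main__":
    # Launch llama-server (with or without --no-mmap) immediately before this,
    # then record launch-to-first-response.
    print(time_to_first_response("http://127.0.0.1:8080/v1/chat/completions",
                                 "step-3.5-flash"))
```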
Results (single run each):
- With --no-mmap: 16.100s
- Without --no-mmap: 108.038s
Difference:
- --no-mmap improved startup-to-first-response by 91.938s
- About 6.7x faster readiness
Note:
- After startup, per-request latency for tiny requests was similar in both cases. The major gain was initial server
readiness time.
I thought the new kernel was supposed to make mmap usable? This is the only system I’ve ever used where I can remember mmap being this slow… I don’t understand why it is so slow.
EDIT: more runs
Re-ran it 3x per variant (6 total), measuring launch-to-first-successful-chat-response for hello world, max_tokens: 1.
Per-run results:
- --no-mmap: 15.101s, 14.901s, 15.960s
- default (mmap): 95.430s, 96.556s, 102.464s
Summary:
- --no-mmap: min 14.901s, median 15.101s, mean 15.321s, max 15.960s
- default (mmap): min 95.430s, median 96.556s, mean 98.150s, max 102.464s
Delta (means):
- --no-mmap faster by 82.829s
- About 6.4x faster startup-to-first-response
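For reproducibility, here is a quick sketch recomputing the summary stats and deltas from the per-run times above (the timings are copied verbatim from the runs; only the stats code is new):

```python
import statistics

# Per-run launch-to-first-response times in seconds, from the re-runs above.
no_mmap = [15.101, 14.901, 15.960]
mmap_default = [95.430, 96.556, 102.464]

for name, runs in [("--no-mmap", no_mmap), ("default (mmap)", mmap_default)]:
    print(f"{name}: min {min(runs):.3f}s, median {statistics.median(runs):.3f}s, "
          f"mean {statistics.mean(runs):.3f}s, max {max(runs):.3f}s")

delta = statistics.mean(mmap_default) - statistics.mean(no_mmap)
ratio = statistics.mean(mmap_default) / statistics.mean(no_mmap)
print(f"--no-mmap faster by {delta:.3f}s (~{ratio:.1f}x)")
```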