Apparently mmap is still slow on DGX Spark on Linux 6.17?

I had codex benchmark with and without --no-mmap on llama-server, and the results weren’t great:

I benchmarked startup readiness for a step-3.5-flash llama-server setup by timing from process launch to the first
successful response from the Chat Completions API.

Method:

  • Launch llama-server with the same model/config in both cases.
  • Send repeated “hello world” requests to /v1/chat/completions with max_tokens: 1.
  • Record elapsed time until the first successful generated response.
  • Only variable changed: --no-mmap enabled vs disabled.

Results (single run each):

  • With --no-mmap: 16.100s
  • Without --no-mmap: 108.038s

Difference:

  • –no-mmap improved startup-to-first-response by 91.938s
  • About 6.7x faster readiness

Note:

  • After startup, per-request latency for tiny requests was similar in both cases. The major gain was initial server
    readiness time.

I thought the new kernel was supposed to make mmap usable? This is the only system I’ve ever used where I can remember mmap being this slow… I don’t understand why it is so slow.

EDIT: more runs

Re-ran it 3x per variant (6 total), measuring launch-to-first-successful-chat-response for hello world, max_tokens: 1.

Per-run results:

  • –no-mmap: 15.101s, 14.901s, 15.960s
  • –mmap: 95.430s, 96.556s, 102.464s

Summary:

  • –no-mmap: min 14.901s, median 15.101s, mean 15.321s, max 15.960s
  • –mmap: min 95.430s, median 96.556s, mean 98.150s, max 102.464s

Delta (means):

  • –no-mmap faster by 82.829s
  • About 6.4x faster startup-to-first-response

Yes, 6.17 doesn’t fix slow mmap, unfortunately. It does speed up --no-mmap though.