Buyers beware: DGX Spark limited to 64GB in ComfyUI

ComfyUI is THE AI media generation tool that everyone uses, from professionals to hobbyists. AI labs usually release their demos and proofs of concept only for ComfyUI.

The Spark’s unusual architecture causes a bad side effect in ComfyUI: every safetensor loaded uses double the memory. The safetensor is first loaded into RAM, and then copied into “VRAM”, which on the Spark is also RAM. Result: double memory use.

This means that instead of being able to use all 128GB of memory for media generation, the Spark is limited to 64GB, which severely cripples it as an AI workstation.
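
To make the doubling concrete, here is a minimal PyTorch sketch of the load pattern in question (illustrative only; model.safetensors is a placeholder path):

    import torch
    from safetensors.torch import load_file

    # Copy 1: the checkpoint is materialized in host ("CPU") memory.
    state_dict = load_file("model.safetensors", device="cpu")

    # Copy 2: moving to "cuda". On a discrete GPU this lands in separate
    # VRAM, but on the Spark "cuda" is backed by the same physical pool,
    # so both copies now occupy the one 128GB of RAM.
    state_dict_gpu = {k: v.to("cuda") for k, v in state_dict.items()}

    # Until state_dict is released, a model of size N occupies ~2N of
    # unified memory, effectively halving usable capacity.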

To reproduce the issue, use a workflow that requires ~75GB of VRAM:

  1. Install and open ComfyUI (see nvidia/comfy-ui in the NVIDIA/dgx-spark-playbooks repo on GitHub)
  2. In the top left, click the Menu button, click Browse Templates, search for ‘flux 2 dev’, and select the ‘Flux.2 Dev Text to Image’ template
  3. Click Run to generate an image. If you are missing the safetensors, it will give you the URLs where you can download them.
  4. Using htop, monitor the memory usage of the above operation (a scriptable alternative is sketched below)

You will run out of memory, even though this workflow should run comfortably and leave around 45GB free.
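
If you want a scriptable alternative to htop, a small psutil loop like this works too (my own snippet, not part of the playbook):

    import time
    import psutil

    # Poll system memory once a second while the workflow loads. On the
    # Spark, GPU allocations draw from this same unified pool, so "used"
    # should climb to roughly twice the checkpoint size during loading.
    while True:
        mem = psutil.virtual_memory()
        print(f"used {mem.used / 2**30:.1f} / {mem.total / 2**30:.1f} GiB ({mem.percent}%)")
        time.sleep(1)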

Relevant reading:

This happens on Strix Halo as well. It will be solved either by the kernel 6.17 update or by the ComfyUI folks fixing it. Just a matter of time.

@NVES @aniculescu

So a future Linux kernel update will fix this? Good to know.

I know the Spark uses Ubuntu 24.04 LTS, so I thought it would be stuck on vanilla kernel 6.8 forever, but I just checked and the Spark has kernel 6.14-nvidia. So Nvidia is evidently willing to update the kernel past vanilla Ubuntu’s.

However, looking at ‘apt changelog linux-image-$(uname -r)’, it seems the kernel went:

  • Aug 2024 to Nov 2024: 6.11
  • Nov 2024 to Feb 2025: 6.12
  • Feb 2025 to Now: 6.14

It’s been a long time without kernel updates now.

Out of curiosity: Ubuntu 26.04 LTS comes out in 4 months. How quickly does Nvidia usually follow? 26.04 would be a guaranteed migration to kernel 6.20.

I can confirm that the kernel will be updated to 6.17 in the next major update.

@aniculescu Can you clarify whether this double-memory issue will be solved automatically when we get the kernel update? If so, I’ll be as patient as necessary.

If the kernel update doesn’t fix the issue (and Raphael Amorim hasn’t linked any evidence that it will), then I think this is something Nvidia’s devs should be collaborating on with the ComfyUI team to solve.

Nvidia’s customer support has answered my ticket by saying: “Regarding this issue, I would recommend you to please post your query at the developer forum where your queries will be answered by developers from Nvidia. This support line is limited to consumer end products (gaming GPU, GeForce now)”
What is the official way to get customer support for the DGX Spark? It has customer support, right?

We’ve known about these mmap overheads for a couple of months now:

@aniculescu @NVES Please give this thread a 2nd look. The problem I’m reporting has nothing to do with what Raphael is talking about. I doubt a kernel update will fix this issue.

@raphael.amorim Is that what your kernel 6.17 comment was about? mmap?

mmap has nothing to do with the issue I’m reporting. I’m not complaining about slow model loading; I’m complaining that every image model in ComfyUI requires DOUBLE the memory to be loaded, PERMANENTLY.
Respectfully, I am getting the impression that you are rushing to “solve” threads while fundamentally misunderstanding what I post about. Your misunderstandings end up derailing the thread into something unrelated. You’re basically (unintentionally) spamming my thread.
In this case, you pinged busy Nvidia employees and gave them the wrong impression of what the problem is; now they think this thread is just asking about a kernel update. You are not helping at all.

It appears that ComfyUI might be using mmap to pin the model into “CPU” memory before passing it to the GPU. But as you said, the unified memory architecture effectively halves your usable memory. The GitHub issue you linked does show that the --disable-mmap or --cache-none options can help with this issue.
The 6.17 kernel brings improvements to mmap performance, which is why raphael.amorim brought it up; however, I am not yet certain it will fully fix this issue without changes from the ComfyUI team.

What is the official way to get customer support for the DGX Spark? It has customer support, right?

NVIDIA customer support will help with Spark hardware-related issues, but software issues will be redirected to this forum.

@starkrun Sorry for the derail earlier; I mixed up “mmap is slow” with the “permanent double memory footprint” you are reporting. It was not my intent to rush anything. When I tagged aniculescu and NVES, my goal was not to ask for the thread to be marked as resolved, but simply to bring the issue to their attention and hopefully get more clarification. I’m just a fellow forum user trying to help others in my free time :D

On the ComfyUI side, the evidence in ComfyUI issue #10896 points to the safetensors loading path effectively creating two resident copies on DGX Spark. The model is first loaded into host memory and then transferred with a forced copy to the device, but on Spark there is no separate VRAM. CPU and GPU share the same unified physical memory pool, so this pattern results in two large allocations backed by the same 128 GB. That is why you hit a practical limit around ~64 GB even though the system has enough total memory.

That same issue report shows a concrete workaround: running with --disable-mmap and changing the loader behavior so it does not force an extra copy during tensor.to(device=...) (switching from copy=True to copy=False). With this change, the memory ballooning stops because the model is no longer duplicated during the transfer. They also report that GGUF variants do not show the same doubling behavior in their tests, which further suggests the issue is specific to the safetensors loader path rather than a fundamental hardware limit.
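
For illustration, the loader change described there amounts to something like the following (a sketch with a hypothetical helper name, not the actual ComfyUI patch):

    import torch

    def to_device(t: torch.Tensor, device: torch.device) -> torch.Tensor:
        # Reported problematic pattern: copy=True forces a fresh allocation
        # even when none is needed, which on Spark duplicates pages already
        # resident in the shared pool.
        #   return t.to(device=device, copy=True)

        # Reported workaround: copy=False only copies when the conversion
        # actually requires it, avoiding the forced duplicate.
        return t.to(device=device, copy=False)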

Kernel 6.17 can still matter, but mostly in terms of mmap performance, page fault behavior, and overall load times. It does not automatically fix the “two permanent copies” issue by itself, since that behavior is driven by userspace loader logic rather than the kernel.

The reason I brought up GPUDirect RDMA is not because RDMA causes this ComfyUI issue, but because it is a similar class of problem where common assumptions about memory models do not hold on Spark. In the GPUDirect RDMA thread, nvidia-peermem is effectively non-functional on Spark, failing initialization with -EINVAL (-22) and lacking the usual dependencies like nvidia and ib_uverbs. This highlights how features that rely on pinning, peer memory mapping, or treating GPU memory as a distinct address space need platform-specific handling on Spark. ComfyUI’s safetensors loader is making a comparable assumption (CPU copy followed by a GPU copy), and on a unified-memory system that turns into a real and visible capacity loss unless the application adapts.

The most likely real fix is a ComfyUI-side change (or an official Spark-aware loader path). Kernel updates are helpful, but they are not a guaranteed one-shot fix for the permanent memory doubling you’re seeing.

Yes, I addressed this problem with a custom loader node that uses fastsafetensors. It loads the model directly from storage into “VRAM” using GPUDirect. This is a workaround for now until Nvidia sorts out the mmap thing. GitHub link:

See the details here:

Just to follow up here. Yes, this is absolutely, 100% an issue with Hugging Face’s safetensors library. It uses mmap during load, which doesn’t work on the Spark for the reasons you pointed out. This is a fine solution on a system with a dGPU, but on a shared-memory system it’s not a good approach. The correct approach (at least for us using CUDA) is to use GPUDirect and implement the libtorch/Python glue so Comfy can use it. fastsafetensors does this to an extent, but memory problems crop up from bypassing Comfy’s memory management. The true solution is for the Comfy folks to integrate this into ComfyUI as a native option, since it also works with dGPUs, not just shared-memory systems.
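
For anyone who wants to experiment before this lands upstream, the core fastsafetensors pattern looks roughly like this (reconstructed from the project README, so verify names and signatures against the current docs; the file and tensor names are placeholders):

    import torch
    from fastsafetensors import SafeTensorsFileLoader, SingleGroup

    # Load tensors straight to the CUDA device (using GPUDirect storage
    # when available), skipping the intermediate host copy that doubles
    # memory use on the Spark.
    device = torch.device("cuda:0")
    loader = SafeTensorsFileLoader(SingleGroup(), device)
    loader.add_filenames({0: ["model.safetensors"]})  # {rank: [files]}
    buffers = loader.copy_files_to_device()
    weight = buffers.get_tensor("model.layers.0.weight")
    loader.close()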

Or, do the kernel-level fix, I guess. I don’t really know how that works, but GPUDirect already works now, so I’m not sure what advantage it offers other than fixing something that really isn’t the best approach anyway.