Error running DeepSeek-V3-0324-FP4 with vLLM + Ray across 8 DGX Sparks

When starting DeepSeek-V3-0324-FP4, GPU memory utilization was set to 0.7. Each device used 80 GB of memory with 48 GB remaining, and there are 8 devices in total. Checking via free -m shows that the system still has available memory.
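The numbers above can be sanity-checked with a quick back-of-envelope budget. This sketch assumes 128 GB of unified memory per Spark and that --gpu-memory-utilization caps the fraction vLLM may claim; neither assumption comes from this thread:

```shell
# Back-of-envelope vLLM memory budget on one DGX Spark.
# Assumed: 128 GB unified memory per device (not stated in the thread).
TOTAL_GB=128
UTILIZATION=0.7   # --gpu-memory-utilization from the report
USED_GB=80        # observed usage from the report

# What vLLM is allowed to claim, and the slack left inside that budget.
BUDGET_GB=$(awk -v t="$TOTAL_GB" -v u="$UTILIZATION" 'BEGIN { printf "%.1f", t * u }')
HEADROOM_GB=$(awk -v b="$BUDGET_GB" -v used="$USED_GB" 'BEGIN { printf "%.1f", b - used }')

echo "vLLM budget: ${BUDGET_GB} GB, headroom: ${HEADROOM_GB} GB"
```

Under these assumptions vLLM is already within ~10 GB of its 89.6 GB cap, so a large contiguous allocation can still fail even while free -m reports plenty of system memory.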
The system error is as follows:

[Fri Oct 31 16:45:22 2025] NVRM: nvCheckOkFailedNoLog: Check failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from _memdescAllocInternal(pMemDesc) @ mem_desc.c:1359
[Fri Oct 31 16:45:22 2025] NVRM: nvCheckOkFailedNoLog: Check failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from _memdescAllocInternal(pMemDesc) @ mem_desc.c:1359

  • How are you running this across 8 Sparks?
  • I'm not fully sure of this model's memory requirements. Are you seeing this memory usage on all 8 of them?

Also make sure to reference our playbook on how to use vLLM across multiple sparks: Try NVIDIA NIM APIs

Yes, this issue occurs on all devices. Both system memory and GPU VRAM are sufficient, and these same devices run Llama 3.1 405B without any issues. So I don't believe this is a memory or VRAM problem; it may instead be a driver compatibility issue or something similar.

As for NIM, I am still learning it at the moment. In the meantime, could you advise whether there is any temporary workaround for this issue?

We are using vLLM + Ray.
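Concretely, a multi-node vLLM + Ray launch usually follows the pattern below. This is a sketch only: the address, port, model path, and parallelism sizes are placeholders, not the actual values from this setup:

```shell
# On the head Spark: start the Ray head node (port is a placeholder).
ray start --head --port=6379

# On each of the 7 worker Sparks: join the cluster.
# <HEAD_IP> is a placeholder for the head node's address.
ray start --address=<HEAD_IP>:6379

# Back on the head node: launch vLLM across the Ray cluster.
# Parallelism sizes are illustrative; their product must equal the
# total number of devices in the cluster.
vllm serve deepseek-ai/DeepSeek-V3-0324 \
    --tensor-parallel-size 1 \
    --pipeline-parallel-size 8 \
    --gpu-memory-utilization 0.7
```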

Sorry, I meant to ask: how are your Sparks connected? We don’t support linking a Spark to multiple machines, so I assume you do not have them physically linked.
It would also help if you could provide more extensive logs of your workload.
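For reference, something like the following would capture the relevant driver and memory state on each Spark (these are standard diagnostic commands, nothing specific to this workload; dmesg may require root):

```shell
# Recent NVIDIA driver messages, including the NVRM out-of-memory errors.
dmesg | grep -i nvrm | tail -n 50

# Per-GPU memory usage and running processes.
nvidia-smi

# System memory, in megabytes.
free -m
```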

As for NIM, I am still learning it at the moment.

The link I mentioned above says “NIM” but it is actually a playbook on how to serve models using vLLM on Spark and I would highly recommend you review it.
Linked here again: https://build.nvidia.com/spark/vllm/stacked-sparks