Error running DeepSeek-V3-0324-FP4 with vLLM + Ray on 8 DGX Sparks

When launching DeepSeek-V3-0324-FP4, I set the GPU memory utilization to 0.7 at startup. Each device used 80GB of memory, leaving 48GB free, and there are 8 devices in total. Checking with free -m shows that the system still has memory available.
The system log shows the following error:

[Fri Oct 31 16:45:22 2025] NVRM: nvCheckOkFailedNoLog: Check failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from _memdescAllocInternal(pMemDesc) @ mem_desc.c:1359
[Fri Oct 31 16:45:22 2025] NVRM: nvCheckOkFailedNoLog: Check failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from _memdescAllocInternal(pMemDesc) @ mem_desc.c:1359

  • How are you running this across 8 Sparks?
  • I'm not fully sure about the memory requirements of this model. Are you seeing this memory usage on all 8 of them?
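
For a rough sense of the numbers, here is a back-of-envelope sketch in Python. It is an estimate under stated assumptions, not a measurement: the 128GB per-device total is taken from the 80GB used + 48GB free reported above, the 671B parameter count is the published DeepSeek-V3 total, and the weights are assumed to be split evenly across the 8 nodes at 4 bits per parameter. Quantization scales, any layers kept in higher precision, KV cache, activations, and the memory the OS itself needs are all ignored. The helper names are made up for illustration.

```python
# Back-of-envelope memory budget per node. Every constant below is an
# assumption or a figure quoted from the post, not a measured value.

def per_node_budget_gb(total_gb: float, utilization: float) -> float:
    """Memory one vLLM instance may claim at the given utilization fraction."""
    return total_gb * utilization


def fp4_weight_gb(num_params: float, bits_per_param: float = 4.0) -> float:
    """Raw weight size in GB (1 GB = 1e9 bytes), ignoring scales and overhead."""
    return num_params * bits_per_param / 8 / 1e9


TOTAL_GB = 80 + 48      # from the post: 80GB used + 48GB remaining per device
UTILIZATION = 0.7       # from the post
NUM_PARAMS = 671e9      # assumption: DeepSeek-V3 total parameter count
NUM_NODES = 8           # from the post

budget = per_node_budget_gb(TOTAL_GB, UTILIZATION)
weights_total = fp4_weight_gb(NUM_PARAMS)
weights_per_node = weights_total / NUM_NODES   # assumes an even split
headroom = budget - weights_per_node           # left for KV cache, activations, etc.

print(f"budget per node:        {budget:.1f} GB")
print(f"FP4 weights (total):    {weights_total:.1f} GB")
print(f"FP4 weights (per node): {weights_per_node:.1f} GB")
print(f"headroom per node:      {headroom:.1f} GB")
```

Under these assumptions the reported 80GB sits below the 0.7 budget. Since the Spark's memory is shared between CPU and GPU, the driver can still fail an allocation if other processes on the node are holding part of that pool, which is why the per-node usage on all 8 machines matters.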

Also, make sure to follow our playbook on how to use vLLM across multiple Sparks: Try NVIDIA NIM APIs