Hey, is anyone else experiencing this? Has the issue resurfaced? According to several posts online, this issue was supposed to have been fixed and merged back in July 2025. I’m using eugr’s latest community repo. vLLM Hihg CPU usage when doin nothing General …

Confirmed to be present on my 2-node cluster with Asus Ascent: vLLM v0.17.0rc1.dev168+gdc6b57846.d20260309. To figure out how to set it, consult https://docs.vllm.ai/en/stable/features/sleep_mode/?h=sleep+mode In summary: from the shell (just in case, set it on each Spark node): VLLM_SLEEP_WHEN_IDLE=1 Or …

JFYI, you can set it through -e parameter, it will propagate to all nodes: ./launch-cluster.sh -e VLLM_SLEEP_WHEN_IDLE=1
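To double-check that the variable really reached a running process on a given node, one option is to read that process's environment from `/proc`. This is a minimal sketch, assuming a Linux host and enough permissions to read `/proc/<pid>/environ` (for a process inside a container you may need to run it inside the container or as root). The helper names `parse_environ` and `process_env_value` are made up for illustration and are not part of vLLM or the launch script.

```python
from pathlib import Path


def parse_environ(raw: bytes) -> dict:
    """Parse the NUL-separated KEY=VALUE blob found in /proc/<pid>/environ."""
    env = {}
    for entry in raw.split(b"\0"):
        if not entry:
            continue  # skip the empty chunk after the trailing NUL
        key, sep, value = entry.partition(b"=")
        if sep:  # ignore malformed entries that have no '='
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def process_env_value(pid: int, name: str):
    """Return the value of `name` in the environment of `pid`, or None if unset.

    Linux-only: this reads the environment the process was started with.
    """
    raw = Path(f"/proc/{pid}/environ").read_bytes()
    return parse_environ(raw).get(name)
```

For example, `process_env_value(12345, "VLLM_SLEEP_WHEN_IDLE")` (with 12345 standing in for the PID of one of the busy processes) returns `None` if the variable never made it to that process.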

Won’t this evict a lot of stuff from memory, resulting in a slower next request? From the docs page (Sleep Mode - vLLM): “vLLM’s Sleep Mode allows you to temporarily release most GPU memory used by a model, including model weights and KV cache”

No, this variable doesn’t trigger sleep mode. It just lets the main loop go to sleep when there are no requests, instead of keeping the core active to reduce latency. Although the last time I tried it, it didn’t change the 100% idle core utilization when using Ray, but maybe they finally fixed it.
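The difference between keeping a core spinning and letting the loop sleep can be illustrated with a toy example. This is only a sketch of the general idea using Python's standard `queue` module, not vLLM's actual code; the function names `busy_poll`, `blocking_wait`, and `demo` are invented for illustration.

```python
import queue
import threading
import time


def busy_poll(q, deadline_s):
    """Spin on get_nowait() until an item arrives; returns (item, spins)."""
    spins = 0
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        try:
            return q.get_nowait(), spins
        except queue.Empty:
            spins += 1  # nothing yet: loop again immediately, burning CPU
    return None, spins


def blocking_wait(q, deadline_s):
    """Block inside get() until an item arrives; the thread sleeps meanwhile."""
    try:
        return q.get(timeout=deadline_s), 0
    except queue.Empty:
        return None, 0


def demo(delay_s=0.05):
    """Deliver one 'request' after delay_s to each reader and record CPU time."""
    results = {}
    for name, reader in (("busy", busy_poll), ("blocking", blocking_wait)):
        q = queue.Queue()
        threading.Timer(delay_s, q.put, args=("request",)).start()
        cpu_before = time.process_time()
        item, spins = reader(q, deadline_s=1.0)
        results[name] = (item, spins, time.process_time() - cpu_before)
    return results
```

Both readers receive the same item, but the busy reader typically reports many spins and noticeably more CPU time for the same idle period. That is the trade-off described above: polling can shave latency at the cost of a core pinned at 100%.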

Oh sorry, I searched for VLLM_SLEEP_WHEN_IDLE and the top result was that page in the docs, even though it doesn’t seem to actually mention this env var. I blame DuckDuckGo 😄 I couldn’t actually find much on VLLM_SLEEP_WHEN_IDLE, even using their own search. The only mention was here: Net…

DannyTup: “VLLM_SLEEP_WHEN_IDLE” Actually, it doesn’t seem to be in use in vLLM anymore - I can’t find it in the codebase.

eugr: “Actually, it doesn’t seem to be in use in vLLM anymore - I can’t find it in the codebase.” Yeah, looks like this change solved the issue so it’s no longer necessary? [Core] Remove busy loop from idle buffer readers (#28053) …

I feel like I have the opposite problem. When vLLM is outputting tokens, it pushes only 1, maybe 2, of the 20 CPU cores to 100%. Is it my config?

Vllm 100% CPU usage when idle - again?

kim.dang March 10, 2026, 7:40pm 11

Not having much luck. I tried all the suggestions here, including setting VLLM_SLEEP_WHEN_IDLE=1, and I still see two processes on each Spark stuck at 100%. This is after leaving it idle for 10 minutes, and I even closed the Open WebUI window.
