Not having much luck. I tried all the suggestions here, including setting VLLM_SLEEP_WHEN_IDLE=1, and I still see two processes on each Spark stuck at 100% CPU. This is after leaving it idle for 10 minutes, and I even closed the Open WebUI window.
kim.dang
11