Now running 2x DGX Spark stacked over QSFP56, looking for model recs for agentic workloads (Hermes / OpenClaw)

Wait, how? On how many Sparks? lol, MiniMax took the majority of my space.

So, a caveat: I disabled everything GUI-related as well as the built-in NVIDIA dashboard. This frees up enough memory for me to get vLLM to load with memory utilization >90%.

I built my own lightweight orchestration manager, so this won’t be exactly one-to-one, but the equivalent launch commands with the community scripts are something like:

./launch-cluster.sh --name qwen3.6-35b-a3b -d \
  -t vllm-node-tf5 \
  --apply-mod mods/fix-qwen3.5-chat-template \
  --no-ray \
  exec vllm serve cyankiwi/Qwen3.6-35B-A3B-AWQ-4bit \
  --served-model-name qwen/qwen3.6-35b-a3b \
  --tool-call-parser qwen3_coder \
  --reasoning-parser qwen3 \
  --enable-auto-tool-choice \
  --enable-prefix-caching \
  --trust-remote-code \
  --tensor-parallel-size 2 \
  --gpu-memory-utilization 0.155 \
  --max-model-len 262144 \
  --max-num-batched-tokens 16384 \
  --max-num-seqs 8 \
  --chat-template unsloth.jinja \
  --host 0.0.0.0 \
  --port 8000

./launch-cluster.sh --name minimax-m2.7 -d \
  -t vllm-node \
  --no-ray \
  exec vllm serve cyankiwi/MiniMax-M2.7-AWQ-4bit \
  --served-model-name minimax/minimax-m2.7 \
  --tool-call-parser minimax_m2 \
  --reasoning-parser minimax_m2 \
  --enable-auto-tool-choice \
  --enable-prefix-caching \
  --trust-remote-code \
  --tensor-parallel-size 2 \
  --gpu-memory-utilization 0.74 \
  --max-model-len 196608 \
  --max-num-batched-tokens 16384 \
  --max-num-seqs 8 \
  --host 0.0.0.0 \
  --port 8000
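
For anyone wondering how the two --gpu-memory-utilization values fit together, here is a rough back-of-the-envelope sketch. It assumes both servers are loaded at the same time (the post does not say so explicitly) and that each Spark has 128 GB of unified memory; the amount vLLM actually sees is somewhat lower. The budget function name is made up for illustration. With --tensor-parallel-size 2, each fraction applies to the GPU on each node.

```shell
# Rough per-node memory budget for two vLLM servers sharing one Spark.
# ASSUMPTION: 128 GB of unified memory per node; the usable amount is lower in practice.
budget() {
  # $1 = total GB, $2 and $3 = the two --gpu-memory-utilization fractions
  awk -v total="$1" -v a="$2" -v b="$3" 'BEGIN {
    printf "%.1f %.1f %.1f\n", total * a, total * b, total * (1 - a - b)
  }'
}

# qwen3.6-35b-a3b at 0.155, minimax-m2.7 at 0.74
budget 128 0.155 0.74
```

Under those assumptions this prints 19.8 94.7 13.4, i.e. roughly 19.8 GB for Qwen, 94.7 GB for MiniMax, and about 13.4 GB left per node for the OS and everything else, which is why trimming the GUI matters.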

So I’m using cyankiwi/MiniMax-M2.7-AWQ-4bit straight from the recipe on 2x Sparks and having a horrible time. It hallucinates like crazy: it adds i and j to the ends of variable names and constantly either adds extra underscores or drops letters from path names, the same letter over and over. ‘shared_native’ becomes ‘shared_ative’, which becomes ‘shared__tive’ (a double underscore plus more dropped letters). I asked it to review some C++ code and it gave me 8 vulnerabilities (there were none), all involving variables that didn’t exist but were similar to ones that did. Each box runs qwen3.6 fine on its own, and the networking tests look good (~17 GB/s). Is there something that goes wrong with the default recipes in a setup like this?

Sounds like you’re using vLLM 0.19.x. I don’t remember the exact builds (check the other thread), but <=0.18.1 works, and supposedly so does >=0.20.x.
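
If it helps, here is a small sketch for checking an installed version against the ranges mentioned above. The function name classify_vllm and its labels are made up for illustration, and the ranges are only what was reported in this thread, not an official compatibility list. Anything between 0.18.1 and 0.19 is marked unknown because nobody here reported on it.

```shell
# Classify a vLLM version string using the ranges reported in this thread.
# ASSUMPTION: the input looks like MAJOR.MINOR[.PATCH][suffix], e.g. 0.20.0rc1.
classify_vllm() {
  ver="$1"
  major="${ver%%.*}"
  rest="${ver#*.}"
  minor="${rest%%.*}"
  case "$rest" in
    *.*) patch="${rest#*.}" ;;   # a patch component is present
    *)   patch=0 ;;              # e.g. "0.18" with no patch part
  esac
  patch="${patch%%[!0-9]*}"      # drop suffixes such as rc1 or .post1
  [ -n "$patch" ] || patch=0

  if [ "$major" -ge 1 ] || [ "$minor" -ge 20 ]; then
    echo "reportedly-ok"         # >=0.20.x, "supposedly" fine
  elif [ "$minor" -eq 19 ]; then
    echo "suspect"               # 0.19.x, where the garbled output showed up
  elif [ "$minor" -lt 18 ] || [ "$patch" -le 1 ]; then
    echo "ok"                    # <=0.18.1
  else
    echo "unknown"               # later 0.18.x builds, not covered in the thread
  fi
}
```

To use it, run something like classify_vllm "$(python3 -c 'import vllm; print(vllm.__version__)')" inside the same container that serves the model.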

Thank you for the pointer! The 0.20.x release candidate works much better!