EDIT: This looks to be resolved by a new prebuilt wheel (.dev176), so the original issue doesn’t need any further changes; just clear the wheel cache so the new nightly is pulled.
Hello!
I’ve been trying to get Nemotron-3-Super running on a dual DGX Spark setup and wanted to share what I found, since I saw several other posts and comments about the same issue.
Note: I used Claude to help draft the rest of this post from the notes that fixed running nemotron-3-super with TP=2 on my dual DGX Spark stack.
**TL;DR**: The `_ZN3c1013MessageLoggerC1E` crash that’s been hitting people building with `spark-vllm-docker` is a cu130/cu132 mismatch in the Dockerfile. Two-line fix, PR submitted, and Nemotron Super NVFP4 is now serving at 24 tok/s via vLLM TP=2.
**The Setup**
- 2x DGX Spark GB10, ConnectX-7 direct connect (200Gbps)
- `spark-vllm-docker` (eugr’s repo) with the `nemotron-3-super-nvfp4` recipe
- vLLM 0.18.1rc1 from the prebuilt wheels
**What Was Broken**
Building `vllm-node` from the latest main branch and running any recipe gives:
```
ImportError: vllm/_C.abi3.so: undefined symbol: _ZN3c1013MessageLoggerC1EPKciib
```
I saw eugr had been trying a few things (CUDA 13.2 torch, reverting, etc.) and figured I’d dig in to see if I could help, since I really wanted the `--moe-backend cutlass` support for Nemotron Super’s LatentMoE architecture.
**Root Cause**
Demangled the symbol: `c10::MessageLogger::MessageLogger(char const*, int, int, bool)`. That’s a PyTorch core library constructor.
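For anyone hitting this on a different symbol, the demangling step is just binutils’ `c++filt`:

```shell
# Demangle the unresolved symbol to see which library it belongs to
c++filt _ZN3c1013MessageLoggerC1EPKciib
# -> c10::MessageLogger::MessageLogger(char const*, int, int, bool)
```

The `c10::` namespace is the giveaway that the missing symbol lives in PyTorch’s `libc10.so`, not in vLLM itself.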
The prebuilt vLLM wheel filename tells you everything (note the **cu132** suffix): `vllm-0.18.1rc1.dev121+gcd7643015.d20260325.cu132`
But the Dockerfile installs PyTorch from the **cu130** nightly index:

```
--index-url /whl/nightly/cu130
```
cu132 wheel + cu130 PyTorch = different `libc10.so` ABI = symbol not found.
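You can catch this mismatch before sitting through a long build by extracting the cu tag from the wheel version string and comparing it against the torch index URL; a minimal shell sketch (wheel name and index path are the ones from above):

```shell
# Pull the cuNNN tag out of the vLLM wheel version string and compare it
# with the tag in the torch --index-url before building
wheel="vllm-0.18.1rc1.dev121+gcd7643015.d20260325.cu132"
torch_index="/whl/nightly/cu130"   # what the Dockerfile currently uses

wheel_tag=$(printf '%s\n' "$wheel" | grep -o 'cu[0-9][0-9]*' | head -1)
index_tag=$(printf '%s\n' "$torch_index" | grep -o 'cu[0-9][0-9]*' | head -1)

if [ "$wheel_tag" != "$index_tag" ]; then
  echo "mismatch: wheel is $wheel_tag, torch index is $index_tag"
fi
```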
**The Fix**
Change `cu130` → `cu132` on lines 48 and 259 of the Dockerfile. The one catch is that `torchvision` and `torchaudio` don’t publish cu132 aarch64 nightlies, so you have to split the install:
```shell
# torch from cu132 (must match the prebuilt vLLM wheel)
uv pip install --prerelease=allow torch --index-url /whl/nightly/cu132 && \
# torchvision/torchaudio: try cu132, fall back to cu130
uv pip install --prerelease=allow torchvision torchaudio triton \
    --index-url /whl/nightly/cu132 \
    --extra-index-url /whl/nightly/cu130
```
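If you’d rather not hand-edit, the cu130 → cu132 swap can be scripted with sed; a sketch against a stand-in file (on the real repo you’d point it at the Dockerfile, where there are two occurrences):

```shell
# Rewrite the nightly index tag from cu130 to cu132 in place
# (stand-in file here; substitute the actual Dockerfile path)
printf 'uv pip install torch --index-url /whl/nightly/cu130\n' > /tmp/dockerfile-line
sed -i 's|nightly/cu130|nightly/cu132|g' /tmp/dockerfile-line
cat /tmp/dockerfile-line
# -> uv pip install torch --index-url /whl/nightly/cu132
```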
**Results**
| | Ollama (1 Spark) | vLLM NVFP4 TP=2 (2 Sparks) |
|---|---|---|
| Quantization | Q4_K_M GGUF | NVFP4 (modelopt_mixed) |
| Generation | 18 tok/s | **24 tok/s** |
| Context | 256K | 262K (1M native) |
| Tool calling | Ollama API | OpenAI API + `--enable-auto-tool-choice` |
The NVFP4 quality is noticeably better than Q4_K_M too: I’m getting cleaner code output with proper docstrings and fewer hallucinations.
**Workflow That Helped**
For anyone with 2 Sparks: I downloaded the HuggingFace NVFP4 weights (~75 GB) on spark1 only, then copied them to spark2 over the CX7 link. rsync moved the 75 GB in about 2 minutes, versus 90+ minutes pulling from HuggingFace on each node. Way better than downloading on both in parallel.
Hope this helps someone else get unstuck. Happy to answer questions about the setup.