@dbsci I don’t know if you’ve tested this image on a DGX cluster, but the biggest issue for me is the NCCL version (tbh, I don’t know how to set up NCCL correctly). Here is the version in this image:
"com.nvidia.nccl.version": "2.29.stable.20260109",
The previous build’s NCCL was “2.29.3” (which has worked for me so far).
I don’t know what changed. Can you confirm? Thanks.
My previous builds were on a custom pytorch 2.10 base image that I built to work around a bunch of limitations at the time. However, that’s old now – I wanted to use at least pytorch 2.11 release version – so I was trying to use NVIDIA’s pytorch as a base and do sglang on top – but that didn’t turn out well.
So yeah, I previously built NCCL. I’m going to end up reworking sglang to use the same NCCL as eugr’s vllm so that it also includes mesh topology support, but I haven’t done it yet.
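For anyone comparing the two labels quoted above, here is a minimal sketch of how they could be split apart in Python. The helper names (`parse_nccl_label`, `same_release_series`) are made up for illustration, and the sketch assumes the label always has the form `major.minor.rest`. It only shows that both builds are in the 2.29 series; it says nothing about what actually changed between them.

```python
import re


def parse_nccl_label(label: str):
    """Split a label such as "2.29.3" or "2.29.stable.20260109"
    into (major, minor, rest). Hypothetical helper for illustration."""
    match = re.match(r"^(\d+)\.(\d+)\.(.+)$", label.strip())
    if match is None:
        raise ValueError(f"unrecognized NCCL version label: {label!r}")
    return int(match.group(1)), int(match.group(2)), match.group(3)


def same_release_series(label_a: str, label_b: str) -> bool:
    """True when both labels share the same major.minor series."""
    return parse_nccl_label(label_a)[:2] == parse_nccl_label(label_b)[:2]


if __name__ == "__main__":
    old = "2.29.3"
    new = "2.29.stable.20260109"
    print(parse_nccl_label(old))
    print(parse_nccl_label(new))
    print(same_release_series(old, new))
```

Inside a running container, `torch.cuda.nccl.version()` reports the NCCL version PyTorch is using, which can be a more direct check than the image label.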
Still experimental, but I’ve released an updated version:
scitrera/dgx-spark-sglang:0.5.10
- upstream sglang v0.5.10
- pytorch 2.11
- patched NCCL to support mesh topology (same version being used in @eugr’s spark-vllm-docker)
- flashinfer v0.6.7.post3
- transformers v5.5.3 (EDIT: updated to 5.5.3 to catch latest gemma4 patches)
Still experimental (still), but I’ve released an updated version:
scitrera/dgx-spark-sglang:0.5.11
- upstream sglang v0.5.11
- pytorch 2.11
- patched NCCL to support mesh topology (same version being used in @eugr’s spark-vllm-docker)
- flashinfer v0.6.10
- transformers v5.8.0
Thank you for this. I know the main struggle is making sglang work on the NVIDIA Spark machines, but did you find cases where sglang works faster/better than vllm on these devices?
Thank you for the good work! It runs well for me. I am going to use it to replace the original 0.5.9 (dev2).
Unfortunately the answer to that is constantly changing. Historically, I’ve found that prefix caching performance was better with sglang, but I haven’t properly benchmarked any direct comparisons lately. Also historically, sglang was further along with MTP support, but I feel like that’s been changing lately. I’ve basically found that I end up bouncing between sglang and vllm depending on the model and situation.
Semi-related: sparkrun technically has the capacity to generate commands itself (not requiring the user to provide one as part of the recipe). Some of the original idea with sparkrun was that you could leave the “command” out of the recipe and sparkrun would translate the CLI args for you, so you could just give a CLI option to switch to sglang or vllm and the rest of the setup would be preserved, allowing quick attempts back and forth. Unfortunately, since nobody really uses that functionality, it fell out of date and probably needs some updating, but that can be addressed soon if there is actually demand.
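To make the idea of translating CLI args concrete, here is a rough sketch in Python. This is not sparkrun’s actual implementation: the neutral setting names (`tensor_parallel`, `max_context`, `gpu_memory_fraction`), the `FLAG_MAP` table, and `build_command` are all hypothetical. The flags on the right-hand side are the commonly used ones for `vllm serve` and `python -m sglang.launch_server`, but treat the pairing as approximate: `--gpu-memory-utilization` and `--mem-fraction-static`, for example, are related but not exact equivalents.

```python
# Hypothetical engine-neutral setting names mapped to each engine's CLI flag.
FLAG_MAP = {
    "vllm": {
        "tensor_parallel": "--tensor-parallel-size",
        "max_context": "--max-model-len",
        "gpu_memory_fraction": "--gpu-memory-utilization",
        "port": "--port",
    },
    "sglang": {
        "tensor_parallel": "--tp-size",
        "max_context": "--context-length",
        "gpu_memory_fraction": "--mem-fraction-static",
        "port": "--port",
    },
}


def build_command(engine: str, model: str, settings: dict) -> list:
    """Build an argv list for the chosen engine from neutral settings."""
    if engine not in FLAG_MAP:
        raise ValueError(f"unknown engine: {engine!r}")

    # Each engine has its own entry point and way of naming the model.
    if engine == "vllm":
        cmd = ["vllm", "serve", model]
    else:
        cmd = ["python", "-m", "sglang.launch_server", "--model-path", model]

    for name, value in settings.items():
        flag = FLAG_MAP[engine].get(name)
        if flag is None:
            raise KeyError(f"no {engine} flag known for setting {name!r}")
        cmd += [flag, str(value)]
    return cmd


if __name__ == "__main__":
    # "org/model-name" is a placeholder, not a real model.
    recipe = {"tensor_parallel": 2, "max_context": 32768, "port": 8000}
    for engine in ("vllm", "sglang"):
        print(" ".join(build_command(engine, "org/model-name", recipe)))
```

The point of keeping the recipe engine-neutral is that switching engines becomes a one-word change, while everything else in the setup stays the same.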
You’re welcome. I’m glad you finally have a worthy replacement for 0.5.9dev2. That’s relatively old at this point!