Hi everyone. I’m new here and just picked up a GB10 Spark clone. I’ve been having a great time with it so far, but I ran into a snag. I wanted to try the new Qwen 3.6 models on the latest vLLM release, then realized it requires NVIDIA driver 595.58 or newer.
My system reports driver version 580.142 through nvidia-smi, and I don’t see any updates available through the DGX OS interface.
So I’m wondering what the right move is. Can we install newer drivers the same way we would on Ubuntu, or should we wait for an official DGX OS update? The hardware and OS are based on Ubuntu, but they are clearly not the same thing, so I’m not sure what is safe to do.
Where do you see this driver requirement? Also, in case you haven’t seen this yet, these repos make it very easy to use vLLM on Spark and Spark clusters.
Use eugr’s repo as suggested above. I haven’t checked the vLLM 26.04 release myself, but eugr keeps basically everything on the bleeding edge. His repo is the way to go.
Good news. It works just fine thanks to CUDA Compatibility. I just tested it.
>docker run --gpus all -it --rm nvcr.io/nvidia/vllm:26.04-py3
==========
== vLLM ==
==========
NVIDIA Release 26.04 (build 299333414)
vLLM Version 0.19.0+6bc3197f
Container image Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
GOVERNING TERMS: The software and materials are governed by the NVIDIA Software License Agreement
(found at https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-software-license-agreement/)
and the Product-Specific Terms for NVIDIA AI Products
(found at https://www.nvidia.com/en-us/agreements/enterprise-software/product-specific-terms-for-ai-products/).
NOTE: CUDA Forward Compatibility mode ENABLED.
Using CUDA 13.2 driver version 595.58.03 with kernel driver version 580.142.
See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.
NOTE: The SHMEM allocation limit is set to the default of 64MB. This may be
insufficient for vLLM. NVIDIA recommends the use of the following flags:
docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 ...
>root@1a57b9d56843:/workspace# vllm serve Qwen/Qwen3-8B
...
(APIServer pid=142) INFO: Started server process [142]
(APIServer pid=142) INFO: Waiting for application startup.
(APIServer pid=142) INFO: Application startup complete.
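If you would rather check this from a script than eyeball the banner, here is a minimal Python sketch. It pulls the three version numbers out of the "Using CUDA ..." line and treats a userspace driver newer than the kernel driver as forward compatibility being in use. The helper names (`as_tuple`, `parse_banner`) are made up for this example, and it assumes the banner wording stays exactly as in the output above.

```python
import re

# Banner line copied verbatim from the container output above.
BANNER = "Using CUDA 13.2 driver version 595.58.03 with kernel driver version 580.142."

# A version is one or more dot-separated integers, e.g. 595.58.03.
VERSION = r"\d+(?:\.\d+)*"
PATTERN = re.compile(
    rf"Using CUDA (?P<cuda>{VERSION}) driver version (?P<user>{VERSION}) "
    rf"with kernel driver version (?P<kernel>{VERSION})"
)


def as_tuple(version):
    """Turn '595.58.03' into (595, 58, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def parse_banner(line):
    """Return the versions found in the banner line, or None if it does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    user, kernel = match.group("user"), match.group("kernel")
    return {
        "cuda": match.group("cuda"),
        "user_driver": user,
        "kernel_driver": kernel,
        # Assumption for this sketch: a userspace driver newer than the
        # kernel driver means the compat libraries are doing the work.
        "forward_compat": as_tuple(user) > as_tuple(kernel),
    }


if __name__ == "__main__":
    print(parse_banner(BANNER))
```

Comparing tuples of integers instead of raw strings avoids surprises such as "580.142" sorting after "580.65" alphabetically when it should sort after it numerically for a different reason, or "9.0" sorting after "13.2".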
Hello. I have been testing the 3.6-35B-A3B-FP8 for the last couple of days with a modified recipe in spark-vllm-docker. The speed is good, but I am running into a strange problem with tool-call reliability. After several successful tool calls on the same path, the model often types the absolute path of the project wrong. I tried adjusting some settings in the recipe, but unfortunately the problem persists. I am using the qwen3.5 recipe as the base for the 3.6 modification. My daily driver is currently qwen3-coder-next-fp8, and it runs stably with similar recipe settings. Has anyone seen similar problems with 3.6-35B-A3B-FP8?
Check the post "Bfloat16 Quality = Speed?" by @whpthomas. The first post has a section called "Qwen3.5 Tool Call Fix". Follow it to the letter (I know it says Qwen3.5, but it works perfectly) and add it to your recipe.
That should improve tool calling, or fix it completely.
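To measure whether a recipe change actually helps, it can be useful to count how often the model drifts off the project path. Below is a rough Python sketch that scans tool calls in the OpenAI-style shape (`function.name` plus a JSON `arguments` string) and flags any path argument that is not inside the project root. The function names, the argument keys `path` and `file_path`, and the example paths are all hypothetical; swap in whatever your tools actually use.

```python
import json
from pathlib import PurePosixPath


def path_is_under_root(path, root):
    """True if `path` is absolute and inside `root`. Pure string check, no disk access."""
    p = PurePosixPath(path)
    r = PurePosixPath(root)
    # PurePosixPath does not collapse "..", so reject it rather than risk
    # accepting a path that escapes the root.
    if not p.is_absolute() or ".." in p.parts:
        return False
    return p == r or r in p.parents


def find_bad_paths(tool_calls, root, path_keys=("path", "file_path")):
    """Return (tool name, offending value) pairs for calls that leave `root`."""
    bad = []
    for call in tool_calls:
        name = call["function"]["name"]
        args = call["function"]["arguments"]
        if isinstance(args, str):
            try:
                args = json.loads(args)
            except json.JSONDecodeError:
                # Malformed arguments are also a reliability problem; record them.
                bad.append((name, args))
                continue
        for key in path_keys:  # hypothetical argument names
            value = args.get(key)
            if isinstance(value, str) and not path_is_under_root(value, root):
                bad.append((name, value))
    return bad


if __name__ == "__main__":
    # Hypothetical calls: the second one has a typo in the project directory.
    calls = [
        {"function": {"name": "read_file",
                      "arguments": json.dumps({"path": "/home/user/project/src/main.py"})}},
        {"function": {"name": "read_file",
                      "arguments": json.dumps({"path": "/home/user/projcet/src/main.py"})}},
    ]
    print(find_bad_paths(calls, "/home/user/project"))
```

Running this over a logged session before and after applying the fix gives a simple count to compare, instead of relying on impressions.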