I'd like to learn how to use the latest vLLM on DGX Spark

The nvcr.io/nvidia/vllm Docker image fails to load the Qwen3-Next-80B-A3B-Instruct-FP8 model on a DGX Spark machine. Installing vLLM from the GitHub repository results in an AssertionError: VLLM_USE_PRECOMPILED is only supported for CUDA builds.

(zh) qichen@spark-2b79:~/zh/vllm$ VLLM_USE_PRECOMPILED=1 uv pip install --editable .
Using Python 3.12.12 environment at: /home/qichen/miniconda3/envs/zh
  × Failed to build `vllm @ file:///home/qichen/zh/vllm`
  ├─▶ The build backend returned an error
  ╰─▶ Call to `setuptools.build_meta.build_editable` failed (exit status: 1)

      [stderr]
      /home/qichen/.cache/uv/builds-v0/.tmpYuxtb2/lib/python3.12/site-packages/torch/_subclasses/functional_tensor.py:279: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at
      /pytorch/torch/csrc/utils/tensor_numpy.cpp:84.)
        cpu = _conversion_method_template(device=torch.device("cpu"))
      Traceback (most recent call last):
        File "<string>", line 14, in <module>
        File "/home/qichen/.cache/uv/builds-v0/.tmpYuxtb2/lib/python3.12/site-packages/setuptools/build_meta.py", line 473, in get_requires_for_build_editable
          return self.get_requires_for_build_wheel(config_settings)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/home/qichen/.cache/uv/builds-v0/.tmpYuxtb2/lib/python3.12/site-packages/setuptools/build_meta.py", line 331, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=[])
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/home/qichen/.cache/uv/builds-v0/.tmpYuxtb2/lib/python3.12/site-packages/setuptools/build_meta.py", line 301, in _get_build_requires
          self.run_setup()
        File "/home/qichen/.cache/uv/builds-v0/.tmpYuxtb2/lib/python3.12/site-packages/setuptools/build_meta.py", line 317, in run_setup
          exec(code, locals())
        File "<string>", line 661, in <module>
      AssertionError: VLLM_USE_PRECOMPILED is only supported for CUDA builds

      hint: This usually indicates a problem with the package or the build environment.
(zh) qichen@spark-2b79:~/zh/vllm$ python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}'); print(f'CUDA version: {torch.version.cuda}')"
CUDA available: True
CUDA version: 12.8

When I use uv pip install vllm, I encounter compilation errors. Is this caused by the Spark system? Can I install a stock version of Ubuntu on Spark?

      [stderr]
      /home/qichen/.cache/uv/builds-v0/.tmpJYCmuM/lib/python3.12/site-packages/torch/_subclasses/functional_tensor.py:276: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at
      /pytorch/torch/csrc/utils/tensor_numpy.cpp:81.)
        cpu = _conversion_method_template(device=torch.device("cpu"))
      /home/qichen/.cache/uv/builds-v0/.tmpJYCmuM/lib/python3.12/site-packages/setuptools_scm/_integration/version_inference.py:51: UserWarning: version of None already set
        warnings.warn(self.message)
      listing git files failed - pretending there aren't any
      CMake Error at CMakeLists.txt:14 (project):
        Running

         '/home/qichen/.cache/uv/builds-v0/.tmpjVsjf2/bin/ninja' '--version'

        failed with:

         no such file or directory


      Traceback (most recent call last):
        File "<string>", line 11, in <module>
        File "/home/qichen/.cache/uv/builds-v0/.tmpJYCmuM/lib/python3.12/site-packages/setuptools/build_meta.py", line 432, in build_wheel
          return _build(['bdist_wheel'])
                 ^^^^^^^^^^^^^^^^^^^^^^^
        File "/home/qichen/.cache/uv/builds-v0/.tmpJYCmuM/lib/python3.12/site-packages/setuptools/build_meta.py", line 423, in _build
          return self._build_with_temp_dir(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/home/qichen/.cache/uv/builds-v0/.tmpJYCmuM/lib/python3.12/site-packages/setuptools/build_meta.py", line 404, in _build_with_temp_dir
          self.run_setup()
        File "/home/qichen/.cache/uv/builds-v0/.tmpJYCmuM/lib/python3.12/site-packages/setuptools/build_meta.py", line 317, in run_setup
          exec(code, locals())
        File "<string>", line 673, in <module>
        File "/home/qichen/.cache/uv/builds-v0/.tmpJYCmuM/lib/python3.12/site-packages/setuptools/__init__.py", line 117, in setup
          return distutils.core.setup(**attrs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/home/qichen/.cache/uv/builds-v0/.tmpJYCmuM/lib/python3.12/site-packages/setuptools/_distutils/core.py", line 186, in setup
          return run_commands(dist)
                 ^^^^^^^^^^^^^^^^^^
        File "/home/qichen/.cache/uv/builds-v0/.tmpJYCmuM/lib/python3.12/site-packages/setuptools/_distutils/core.py", line 202, in run_commands
          dist.run_commands()
        File "/home/qichen/.cache/uv/builds-v0/.tmpJYCmuM/lib/python3.12/site-packages/setuptools/_distutils/dist.py", line 1002, in run_commands
          self.run_command(cmd)
        File "/home/qichen/.cache/uv/builds-v0/.tmpJYCmuM/lib/python3.12/site-packages/setuptools/dist.py", line 1104, in run_command
          super().run_command(command)
        File "/home/qichen/.cache/uv/builds-v0/.tmpJYCmuM/lib/python3.12/site-packages/setuptools/_distutils/dist.py", line 1021, in run_command
          cmd_obj.run()
        File "/home/qichen/.cache/uv/builds-v0/.tmpJYCmuM/lib/python3.12/site-packages/setuptools/command/bdist_wheel.py", line 370, in run
          self.run_command("build")
        File "/home/qichen/.cache/uv/builds-v0/.tmpJYCmuM/lib/python3.12/site-packages/setuptools/_distutils/cmd.py", line 357, in run_command
          self.distribution.run_command(command)
        File "/home/qichen/.cache/uv/builds-v0/.tmpJYCmuM/lib/python3.12/site-packages/setuptools/dist.py", line 1104, in run_command
          super().run_command(command)
        File "/home/qichen/.cache/uv/builds-v0/.tmpJYCmuM/lib/python3.12/site-packages/setuptools/_distutils/dist.py", line 1021, in run_command
          cmd_obj.run()
        File "/home/qichen/.cache/uv/builds-v0/.tmpJYCmuM/lib/python3.12/site-packages/setuptools/_distutils/command/build.py", line 135, in run
          self.run_command(cmd_name)
        File "/home/qichen/.cache/uv/builds-v0/.tmpJYCmuM/lib/python3.12/site-packages/setuptools/_distutils/cmd.py", line 357, in run_command
          self.distribution.run_command(command)
        File "/home/qichen/.cache/uv/builds-v0/.tmpJYCmuM/lib/python3.12/site-packages/setuptools/dist.py", line 1104, in run_command
          super().run_command(command)
        File "/home/qichen/.cache/uv/builds-v0/.tmpJYCmuM/lib/python3.12/site-packages/setuptools/_distutils/dist.py", line 1021, in run_command
          cmd_obj.run()
        File "<string>", line 270, in run
        File "/home/qichen/.cache/uv/builds-v0/.tmpJYCmuM/lib/python3.12/site-packages/setuptools/command/build_ext.py", line 99, in run
          _build_ext.run(self)
        File "/home/qichen/.cache/uv/builds-v0/.tmpJYCmuM/lib/python3.12/site-packages/setuptools/_distutils/command/build_ext.py", line 368, in run
          self.build_extensions()
        File "<string>", line 232, in build_extensions
        File "<string>", line 210, in configure
        File "/home/qichen/miniconda3/envs/zh/lib/python3.12/subprocess.py", line 413, in check_call
          raise CalledProcessError(retcode, cmd)
      subprocess.CalledProcessError: Command '['cmake', '/home/qichen/.cache/uv/sdists-v9/index/e367fd55faf540ee/vllm/0.10.1.1/yK_7JcUAK-RK1qQ5Mssg4/src', '-G',
      'Ninja', '-DCMAKE_BUILD_TYPE=RelWithDebInfo', '-DVLLM_TARGET_DEVICE=cuda', '-DVLLM_PYTHON_EXECUTABLE=/home/qichen/.cache/uv/builds-v0/.tmpJYCmuM/bin/python',
      '-DVLLM_PYTHON_PATH=/home/qichen/miniconda3/envs/zh/lib/python312.zip:/home/qichen/miniconda3/envs/zh/lib/python3.12:/home/qichen/miniconda3/envs/zh/lib/python3.12/lib-dynload:/home/qichen/.cache/uv/builds-v0/.tmpJYCmuM/lib/python3.12/site-packages:/home/qichen/.cache/uv/builds-v0/.tmpJYCmuM/lib/python3.12/site-packages/setuptools/_vendor',
      '-DFETCHCONTENT_BASE_DIR=/home/qichen/.cache/uv/sdists-v9/index/e367fd55faf540ee/vllm/0.10.1.1/yK_7JcUAK-RK1qQ5Mssg4/src/.deps', '-DNVCC_THREADS=1', '-DCMAKE_JOB_POOL_COMPILE:STRING=compile',
      '-DCMAKE_JOB_POOLS:STRING=compile=20', '-DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc']' returned non-zero exit status 1.

      hint: This usually indicates a problem with the package or the build environment.
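The CMake failure in the log above (`'ninja' '--version'` … `no such file or directory`) usually just means one of the build tools isn't visible to the build environment. A minimal sketch to check what's missing before retrying — the tool list is my assumption based on vLLM's CMake/CUDA build, not an exhaustive requirement list:

```python
import shutil

# Tools a CMake + Ninja + nvcc build typically invokes; a missing
# "ninja" produces exactly the "no such file or directory" error above.
required = ["cmake", "ninja", "gcc", "g++", "nvcc"]
missing = [tool for tool in required if shutil.which(tool) is None]
print("missing build tools:", missing or "none")
```

If `ninja` shows up as missing, installing it into the build environment (e.g. `uv pip install ninja cmake`) before rebuilding may help; installing `numpy` there as well should silence the warning at the top of the log.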

Please check Run VLLM in Spark - #21 by johnny_nv


As far as I can tell, the official vLLM does not support the DGX Spark. I've opened an issue requesting that support be added.

In the meantime, NVIDIA has released its own vLLM container, which runs vLLM v0.10.2.

Hi, please check out our playbook on how to run vLLM on Spark

That container is outdated and can't run the Qwen3-Next model the OP wants to use. It's better to follow the other thread on compiling vLLM from source.


By the way, I was able to run nvidia/Nemotron-Nano-12B-v2-VL-BF16 on the Spark using vLLM.


I have already tried that tutorial and that container, but they cannot run the new model architecture. Both that container and the sglang container (lmsysorg/sglang:spark) run into the same issue: the Triton kernels do not support models with fp8e4nv precision.

Hello, I'd like to ask: are you using the vLLM Docker image or building from source?

Building from source.

Did you only have to build vLLM from source, or PyTorch with CUDA as well? It seems like aarch64-compatible PyTorch isn't available.
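For what it's worth, a quick way to confirm which platform the installer is resolving wheels for (on DGX Spark this should report an aarch64 target) — a generic sketch, not specific to vLLM or PyTorch:

```python
import platform
import sysconfig

# The machine architecture and wheel platform tag determine which
# prebuilt PyTorch/vLLM wheels, if any, pip or uv can install.
print("machine:", platform.machine())          # "aarch64" on DGX Spark
print("platform tag:", sysconfig.get_platform())
```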