Please fix the official nvidia/vllm docker container

Is it possible to update the official nvidia/vllm Docker container? The latest image appears to be pinned to vLLM v0.11, which is quite outdated. I opened an issue on the vLLM GitHub repo but didn’t get a response, so I’m posting here in hopes someone from NVIDIA sees this and can help get it fixed. Thanks!

The NVIDIA containers for vLLM are updated monthly, sometimes with minor updates in between.

But you won’t see v0.14.0 anytime soon. It seems NVIDIA is more interested in stability than in the latest features. I assume each new vLLM release gets thoroughly tested, since it has to run not only on a GB10 but across all relevant GPUs, especially on their big iron.

You could go with the official vLLM images, but the best way to unlock the full potential of your Spark(s) is currently the solution provided by eugr, as he also tries to incorporate changes and patches before they hit the official vLLM build.

As you can see here:

If you build these containers with the --use-wheels switch, the build is quite fast, which saves time and nerves. It also lets you use the latest transformers if needed, as in this example for a bleeding-edge model like GLM 4.7 Flash, which the current vLLM release does not yet support.
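If you want to check what a given image actually ships before deciding whether to rebuild, something like the following could be run with the Python interpreter inside the container. This is only a sketch: the helper names `installed_version` and `report` are made up for illustration, and the package list (vllm, transformers, torch) is my assumption about what is worth checking. It relies only on the standard library, so it should work regardless of which vLLM version is installed.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def report(packages=("vllm", "transformers", "torch")):
    """Map each package name to its installed version (None when missing)."""
    # The default package list is an assumption; adjust it to what you care about.
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    for name, ver in report().items():
        print(f"{name}: {ver or 'not installed'}")
```

Comparing the output from the NVIDIA image with the output from a self-built one shows quickly whether the rebuild picked up a newer vLLM or transformers.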