Currently, the NVIDIA vLLM container available for Jetson AGX Thor is based on vLLM 0.11.x, which does not support speculative decoding (EAGLE-3) for vision-language models, in particular the Qwen family.
vLLM 0.12.x adds speculative decoding support for vision models, which our use case requires.
Could you please clarify:
- Is there an official NVIDIA container release planned for Jetson AGX Thor that includes vLLM 0.12.x?
- If so, what is the expected timeline for this release?
- In the meantime, what is the recommended and supported approach to upgrade from vLLM 0.11.x to vLLM 0.12.x on Jetson AGX Thor?