How to build a specific vLLM version (0.11.1) on Jetson Orin AGX (CUDA 12.6 / JetPack 6.2 r36.4.3)?

Hi NVIDIA team,

I am using a Jetson Orin AGX 64GB with the following environment:

  • Device: Jetson Orin AGX 64GB
  • JetPack: 6.2 (L4T r36.4.3)
  • CUDA: 12.6
  • OS: Ubuntu 22.04 (aarch64)

I am trying to run the DeepSeek-OCR project on this device, which relies on vLLM.
However, I am having difficulty finding a compatible vLLM version for this platform.

Here is what I have tried so far:

  1. vLLM prebuilt wheels from https://pypi.jetson-ai-lab.io/
  • I previously downloaded vLLM 0.8.5 (which matches the DeepSeek-OCR test environment),
    but I encounter runtime errors when running the OCR pipeline.
  • Unfortunately, newer versions such as vLLM 0.11.x are not available as prebuilt wheels
    for JetPack 6.2 / CUDA 12.6.
  2. NVIDIA vLLM container images

I tested the official image:

From the available tags, I found that:

  • vLLM 0.14.0 (r36.4-tegra-aarch64-cu126) can start successfully,
    and the model server launches correctly.

However, when integrating with DeepSeek-OCR, I encounter the following error:

ImportError: cannot import name 'SamplingMetadata' from 'vllm.model_executor'

This appears to be a breaking API change between vLLM 0.11.x and 0.14.x,
and DeepSeek-OCR currently depends on the older interface.
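One way to fail fast on this mismatch is to probe for the legacy symbol before starting the OCR pipeline. This is a minimal sketch: the module and attribute names come from the error above, but the idea of gating DeepSeek-OCR startup on it is my own assumption, not something either project does.

```python
import importlib

def module_exposes(module_name, attr):
    """Return True if `module_name` imports cleanly and exposes `attr`."""
    try:
        mod = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(mod, attr)

def legacy_vllm_interface_present():
    # DeepSeek-OCR imports SamplingMetadata from vllm.model_executor,
    # which newer vLLM releases no longer expose.
    return module_exposes("vllm.model_executor", "SamplingMetadata")
```

Calling `legacy_vllm_interface_present()` at startup would turn the late ImportError into an early, explicit version check.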


My main question is:

Is there an official or recommended way to build a specific vLLM version
(e.g. vLLM 0.11.1) from source on Jetson Orin AGX with CUDA 12.6?

Specifically, I would like to know:

  • Whether vLLM 0.11.x is expected to be buildable on JetPack 6 (r36.4)
  • Any patches, build flags, or CMake options required for aarch64 / Jetson
  • Whether NVIDIA plans to provide additional prebuilt vLLM wheels or images
    for intermediate versions like 0.11.x

Any guidance would be greatly appreciated.
Thank you very much for your support.

Best regards

*** Please note that this reply is generated by LLM automatically ***

The question concerns building a specific vLLM version (0.11.1) from source on a Jetson Orin AGX 64GB with CUDA 12.6. The user cannot find a compatible prebuilt vLLM for this platform and is seeking guidance on building from source.

To address this question, we need to consider the following points:

  1. vLLM version compatibility: The user is trying to run the DeepSeek-OCR project, whose test environment uses vLLM 0.8.5. The prebuilt 0.8.5 wheel installs on the Jetson Orin AGX 64GB but fails at runtime, and newer versions such as 0.11.x are not available as prebuilt wheels for JetPack 6.2 and CUDA 12.6.
  2. Building vLLM from source: The user is asking whether it is possible to build vLLM 0.11.1 from source on the Jetson Orin AGX 64GB device. This requires considering the build dependencies, patches, and CMake options required for aarch64 architecture.
  3. Official support and recommendations: The user is seeking guidance on whether NVIDIA plans to provide additional prebuilt vLLM wheels or images for intermediate versions like 0.11.x.

To answer these questions, we can provide the following information:

  • Building vLLM from source: Yes, it is possible to build vLLM from source on the Jetson Orin AGX 64GB, but it requires attention to the build dependencies, patches, and CMake options needed for the aarch64 architecture. The vLLM GitHub repository documents the general build procedure and dependencies.
  • vLLM version compatibility: The user can try building vLLM 0.11.1 from source, but even that version may not match the interface DeepSeek-OCR expects; the DeepSeek-OCR code may need modification to work with a newer vLLM.
  • Official support and recommendations: NVIDIA provides prebuilt vLLM wheels and images for various platforms, including JetPack 6.2 and CUDA 12.6, but availability depends on the specific vLLM version and platform. The NVIDIA website lists the latest prebuilt vLLM versions and images.

In summary, the user can try building vLLM 0.11.1 from source on the Jetson Orin AGX 64GB, keeping the potential compatibility issues with DeepSeek-OCR in mind, and can check the NVIDIA website for prebuilt vLLM versions and images for the platform.

Comparison with other options:

  • vLLM prebuilt wheels: The prebuilt wheels tried so far either fail at runtime on this device (0.8.5) or are not published for this platform (0.11.x).
  • vLLM container images: The official container images start correctly, but their vLLM version (0.14.x) breaks the API that DeepSeek-OCR depends on.
  • Building vLLM from source: A viable option, but it requires attention to the build dependencies, patches, and CMake options needed for aarch64.

Recommendations:

  • The user can try building vLLM 0.11.1 from source on the Jetson Orin AGX 64GB, following the build instructions and dependencies in the vLLM GitHub repository.
  • The user should expect compatibility issues with DeepSeek-OCR and may need to modify its code to work with a newer vLLM version.
  • The user can check the NVIDIA website for prebuilt vLLM versions and images for the platform, keeping in mind that availability depends on the specific vLLM version.
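As a small guard against the version mismatches discussed above, the pipeline could verify the installed vLLM version at startup before importing DeepSeek-OCR. A minimal sketch; the expected version prefix is an assumption based on the project's stated test environment:

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version_matches(pkg, prefix):
    """True if `pkg` is installed and its version string starts with `prefix`."""
    try:
        return version(pkg).startswith(prefix)
    except PackageNotFoundError:
        return False

# e.g. DeepSeek-OCR was tested against vLLM 0.8.5 (assumed prefix):
# assert installed_version_matches("vllm", "0.8.5"), "unsupported vLLM version"
```

This only checks the version string; it does not prove API compatibility, but it makes a mismatched environment fail with a clear message instead of a deep ImportError.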


Is there any stable tutorial for building vLLM locally? I didn't find any instructions…


Hi,

You can find the building environment in the link below:

Thanks.

I've tried this approach for a while. I used jetson-containers with the image tag

0.8.6-r36.4-cu128-24.04

and tried DeepSeek-OCR. It threw the following error: DeepseekOCRMultiModalProcessor._cached_apply_hf_processor()
got an unexpected keyword argument 'return_mm_hashes'.
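For illustration, this error can be reproduced in isolation: newer vLLM versions call the processor hook with extra keyword arguments (here `return_mm_hashes`) that the older DeepSeek-OCR override does not declare. The classes below are simplified stand-ins, not the real implementations; absorbing the argument with `**kwargs` only silences the TypeError and does not implement the new argument's semantics.

```python
class LegacyProcessor:
    # Stand-in for a DeepSeek-OCR override written against vLLM 0.8.x.
    def _cached_apply_hf_processor(self, prompt, mm_data):
        return {"prompt": prompt, "mm": mm_data}

class ForwardCompatProcessor(LegacyProcessor):
    # Accepts and ignores keyword arguments added by newer vLLM versions
    # (e.g. `return_mm_hashes`); a real fix must also honor their meaning.
    def _cached_apply_hf_processor(self, prompt, mm_data, **kwargs):
        return super()._cached_apply_hf_processor(prompt, mm_data)

# Newer vLLM invokes the hook roughly like this (sketch):
# proc._cached_apply_hf_processor("<image>", {...}, return_mm_hashes=True)
```

The TypeError comes from the first class receiving a keyword it never declared, which is exactly the caller/callee version skew between vLLM 0.8.x and newer releases.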

Then I used the following command to look up which versions I can build:

cd jetson-containers
./build.sh vllm --list

I found that only 0.14.0 can be built. I've already tried that whl package before and it failed.


Namespace(packages=['vllm'], name='', base='', multiple=False, build_flags='', build_args='', use_proxy=False, package_dirs=[''], list_packages=True, show_packages=False, skip_packages=[''], skip_errors=False, skip_tests=[''], test_only=[''], simulate=False, push='', no_github_api=False, log_dir=None, log_level=None, log_colors=None, log_status=None, debug=False, verbose=False, version=False)

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ L4T_VERSION   36.4.3  β”‚ JETPACK_VERSION  6.2   β”‚
β”‚ CUDA_VERSION  12.6    β”‚ PYTHON_VERSION   3.10  β”‚
β”‚ SYSTEM_ARCH   aarch64 β”‚ LSB_RELEASE      22.04 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

$ jetson-containers vllm --list

[14:13:10] Saved versions to cache: {'transformers': '5.0.0'} 
vllm:0.14.0

Is there any way to successfully run DeepSeek-OCR locally on the Jetson AGX Orin?

Hi,

Could you check the minimum vLLM version required by your model and build it from source directly?
The repo shared above should be able to build different vLLM versions for Orin.

Thanks.

The minimum requirement for DeepSeek-OCR may be CUDA 11.8 + vLLM 0.8.5 (this is the experimental environment the project team used). I've tried the whl packages downloaded from the jp6/cu126 index, but they didn't work for me. Next I will try to use the Dockerfile in jetson-containers to build vllm:v0.8.5.post1. Thanks.

Hi,

We have upgraded the vLLM package to 0.14.x now.
Could you give it a try?

More, could you share what kind of error you meet when using the vLLM?

Thanks.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.