How to build a specific vLLM version (0.11.1) on Jetson Orin AGX (CUDA 12.6 / JetPack 6.2 r36.4.3)?

Hi NVIDIA team,

I am using a Jetson Orin AGX 64GB with the following environment:

  • Device: Jetson Orin AGX 64GB
  • JetPack: 6.2 (L4T r36.4.3)
  • CUDA: 12.6
  • OS: Ubuntu 22.04 (aarch64)

I am trying to run the DeepSeek-OCR project on this device, which relies on vLLM.
However, I am having difficulty finding a compatible vLLM version for this platform.

Here is what I have tried so far:

  1. vLLM prebuilt wheels from https://pypi.jetson-ai-lab.io/
  • I previously downloaded vLLM 0.8.5 (which matches the DeepSeek-OCR test environment),
    but I encounter runtime errors when running the OCR pipeline.
  • Unfortunately, newer versions such as vLLM 0.11.x are not available as prebuilt wheels
    for JetPack 6.2 / CUDA 12.6.
  2. NVIDIA vLLM container images

I tested the official image:

From the available tags, I found that:

  • vLLM 0.14.0 (r36.4-tegra-aarch64-cu126) can start successfully,
    and the model server launches correctly.

However, when integrating with DeepSeek-OCR, I encounter the following error:

ImportError: cannot import name 'SamplingMetadata' from 'vllm.model_executor'

This appears to be a breaking API change between vLLM 0.11.x and 0.14.x,
and DeepSeek-OCR currently depends on the older interface.
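One way to fail fast on this mismatch is to probe for the legacy symbol before starting the OCR pipeline. This is a minimal sketch: the module and attribute names come from the error above, but the idea of gating DeepSeek-OCR startup on it is my own assumption, not something either project does.

```python
import importlib

def module_exposes(module_name, attr):
    """Return True if `module_name` imports cleanly and exposes `attr`."""
    try:
        mod = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(mod, attr)

def legacy_vllm_interface_present():
    # DeepSeek-OCR imports SamplingMetadata from vllm.model_executor,
    # which newer vLLM releases no longer expose.
    return module_exposes("vllm.model_executor", "SamplingMetadata")
```

Calling `legacy_vllm_interface_present()` at startup would turn the late ImportError into an early, explicit version check.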


My main question is:

Is there an official or recommended way to build a specific vLLM version
(e.g. vLLM 0.11.1) from source on Jetson Orin AGX with CUDA 12.6?

Specifically, I would like to know:

  • Whether vLLM 0.11.x is expected to be buildable on JetPack 6 (r36.4)
  • Any patches, build flags, or CMake options required for aarch64 / Jetson
  • Whether NVIDIA plans to provide additional prebuilt vLLM wheels or images
    for intermediate versions like 0.11.x

Any guidance would be greatly appreciated.
Thank you very much for your support.

Best regards

*** Please note that this reply is generated by LLM automatically ***

The question concerns building a specific vLLM version (0.11.1) from source on a Jetson Orin AGX 64GB with CUDA 12.6. The user cannot find a compatible prebuilt vLLM for this platform and is seeking guidance on building from source.

To address this question, we need to consider the following points:

  1. vLLM version compatibility: The user is trying to run the DeepSeek-OCR project, whose test environment uses vLLM 0.8.5. The prebuilt 0.8.5 wheel installs on the Jetson Orin AGX 64GB but fails at runtime, and newer versions such as 0.11.x are not available as prebuilt wheels for JetPack 6.2 and CUDA 12.6.
  2. Building vLLM from source: The user is asking whether it is possible to build vLLM 0.11.1 from source on the Jetson Orin AGX 64GB device. This requires considering the build dependencies, patches, and CMake options required for aarch64 architecture.
  3. Official support and recommendations: The user is seeking guidance on whether NVIDIA plans to provide additional prebuilt vLLM wheels or images for intermediate versions like 0.11.x.

To answer these questions, we can provide the following information:

  • Building vLLM from source: Yes, it is possible to build vLLM from source on the Jetson Orin AGX 64GB, but it requires attention to the build dependencies, patches, and CMake options needed for the aarch64 architecture. The vLLM GitHub repository documents the general build procedure and dependencies.
  • vLLM version compatibility: The user can try building vLLM 0.11.1 from source, but even that version may not match the interface DeepSeek-OCR expects; the DeepSeek-OCR code may need modification to work with a newer vLLM.
  • Official support and recommendations: NVIDIA provides prebuilt vLLM wheels and images for various platforms, including JetPack 6.2 and CUDA 12.6, but availability depends on the specific vLLM version and platform. The NVIDIA website lists the latest prebuilt vLLM versions and images.

In summary, the user can try building vLLM 0.11.1 from source on the Jetson Orin AGX 64GB, keeping the potential compatibility issues with DeepSeek-OCR in mind, and can check the NVIDIA website for prebuilt vLLM versions and images for the platform.

Comparison with other options:

  • vLLM prebuilt wheels: The prebuilt wheels tried so far either fail at runtime on this device (0.8.5) or are not published for this platform (0.11.x).
  • vLLM container images: The official container images start correctly, but their vLLM version (0.14.x) breaks the API that DeepSeek-OCR depends on.
  • Building vLLM from source: A viable option, but it requires attention to the build dependencies, patches, and CMake options needed for aarch64.

Recommendations:

  • The user can try building vLLM 0.11.1 from source on the Jetson Orin AGX 64GB, following the build instructions and dependencies in the vLLM GitHub repository.
  • The user should expect compatibility issues with DeepSeek-OCR and may need to modify its code to work with a newer vLLM version.
  • The user can check the NVIDIA website for prebuilt vLLM versions and images for the platform, keeping in mind that availability depends on the specific vLLM version.
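As a small guard against the version mismatches discussed above, the pipeline could verify the installed vLLM version at startup before importing DeepSeek-OCR. A minimal sketch; the expected version prefix is an assumption based on the project's stated test environment:

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version_matches(pkg, prefix):
    """True if `pkg` is installed and its version string starts with `prefix`."""
    try:
        return version(pkg).startswith(prefix)
    except PackageNotFoundError:
        return False

# e.g. DeepSeek-OCR was tested against vLLM 0.8.5 (assumed prefix):
# assert installed_version_matches("vllm", "0.8.5"), "unsupported vLLM version"
```

This only checks the version string; it does not prove API compatibility, but it makes a mismatched environment fail with a clear message instead of a deep ImportError.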


Is there any stable tutorial for building vLLM locally? I didn't find any instructions…


Hi,

You can find the building environment in the link below:

Thanks.

I've tried this approach for a while. I used jetson-containers with the image tag

0.8.6-r36.4-cu128-24.04

and tried DeepSeek-OCR. It threw the following error: DeepseekOCRMultiModalProcessor._cached_apply_hf_processor()
got an unexpected keyword argument 'return_mm_hashes'.
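For illustration, this error can be reproduced in isolation: newer vLLM versions call the processor hook with extra keyword arguments (here `return_mm_hashes`) that the older DeepSeek-OCR override does not declare. The classes below are simplified stand-ins, not the real implementations; absorbing the argument with `**kwargs` only silences the TypeError and does not implement the new argument's semantics.

```python
class LegacyProcessor:
    # Stand-in for a DeepSeek-OCR override written against vLLM 0.8.x.
    def _cached_apply_hf_processor(self, prompt, mm_data):
        return {"prompt": prompt, "mm": mm_data}

class ForwardCompatProcessor(LegacyProcessor):
    # Accepts and ignores keyword arguments added by newer vLLM versions
    # (e.g. `return_mm_hashes`); a real fix must also honor their meaning.
    def _cached_apply_hf_processor(self, prompt, mm_data, **kwargs):
        return super()._cached_apply_hf_processor(prompt, mm_data)

# Newer vLLM invokes the hook roughly like this (sketch):
# proc._cached_apply_hf_processor("<image>", {...}, return_mm_hashes=True)
```

The TypeError comes from the first class receiving a keyword it never declared, which is exactly the caller/callee version skew between vLLM 0.8.x and newer releases.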

Then I used the following command to look up which versions I can build:

cd jetson-containers
./build.sh vllm --list

I found that only 0.14.0 can be built. I've already tried that whl package before and it failed.


Namespace(packages=['vllm'], name='', base='', multiple=False, build_flags='', build_args='', use_proxy=False, package_dirs=[''], list_packages=True, show_packages=False, skip_packages=[''], skip_errors=False, skip_tests=[''], test_only=[''], simulate=False, push='', no_github_api=False, log_dir=None, log_level=None, log_colors=None, log_status=None, debug=False, verbose=False, version=False)

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ L4T_VERSION   36.4.3  β”‚ JETPACK_VERSION  6.2   β”‚
β”‚ CUDA_VERSION  12.6    β”‚ PYTHON_VERSION   3.10  β”‚
β”‚ SYSTEM_ARCH   aarch64 β”‚ LSB_RELEASE      22.04 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

$ jetson-containers vllm --list

[14:13:10] Saved versions to cache: {'transformers': '5.0.0'} 
vllm:0.14.0

Is there any way to successfully run DeepSeek-OCR locally on the Jetson AGX Orin?

Hi,

Could you check the minimum vLLM version required by your model and build it from source directly?
The repo shared above should be able to build different vLLM versions for Orin.

Thanks.

The minimum requirement for DeepSeek-OCR may be CUDA 11.8 + vLLM 0.8.5 (this is the experimental environment the project team used). I've tried the whl packages downloaded from the jp6/cu126 index, but they didn't work for me. Next I will try to use the Dockerfile in jetson-containers to build vllm:v0.8.5.post1. Thanks.

Hi,

We have upgraded the vLLM package to 0.14.x now.
Could you give it a try?

More, could you share what kind of error you meet when using the vLLM?

Thanks.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.