root@jetson:/workspace# pip install vllm
Collecting vllm
Downloading vllm-0.5.0.post1.tar.gz (743 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 743.2/743.2 kB 12.6 MB/s eta 0:00:00
Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [16 lines of output]
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
main()
File "/usr/local/lib/python3.8/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "/usr/local/lib/python3.8/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
return hook(config_settings)
File "/tmp/pip-build-env-d1bct981/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 325, in get_requires_for_build_wheel
return self._get_build_requires(config_settings, requirements=['wheel'])
File "/tmp/pip-build-env-d1bct981/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 295, in _get_build_requires
self.run_setup()
File "/tmp/pip-build-env-d1bct981/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 311, in run_setup
exec(code, locals())
File "<string>", line 415, in <module>
File "<string>", line 341, in get_vllm_version
RuntimeError: Unknown runtime environment
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
However, when I open a python3 prompt and verify CUDA availability,
root@jetson:/workspace# python3
Python 3.8.10 (default, Nov 14 2022, 12:59:47)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.version.cuda)
11.4
Whereas it prints 11.4 when I check it within a Python prompt (as shown in the original post).
So, to be clear, the problem is not that the CUDA version doesn't meet the requirement, but that torch doesn't correctly recognize the CUDA version during the installation.
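For reference, this is the kind of one-liner I'm using to confirm that the installed PyTorch itself is CUDA-enabled at runtime (the is_available / device-name calls are just extra sanity checks on top of what I showed above):
python3 -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
python3 -c "import torch; print(torch.cuda.get_device_name(0))"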
My guess is that vLLM’s requirements.txt or pyproject.toml uninstalls the built-in version of PyTorch (that was built with CUDA enabled) in favor of a different version of PyTorch from PyPI (that wasn’t built with CUDA). Or perhaps it explicitly needs to be run with pip3 instead of pip. Regardless, it is for reasons like this that I use jetson-containers to make sure that the right versions get installed, stay installed, and are continuously tested for CUDA functionality and performance.
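If you want to rule out pip's build isolation (the build runs in a temporary environment that may pull its own torch, which would explain why your CUDA-enabled torch isn't seen during the install), one untested thing to try is building against the torch you already have installed:
# make sure the usual build dependencies are present, then skip the isolated build env
pip3 install packaging setuptools wheel ninja
pip3 install vllm --no-build-isolation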
I had previously tried to get vLLM building on JetPack to no avail - and IMO it is more geared for batching and server/cloud. The inferencing APIs/containers we have managed to get working through jetson-containers, like MLC, have near-optimum performance on Jetson - see Benchmarks - NVIDIA Jetson AI Lab
That said, if you do manage to get it working, I would be happy to add it to the jetson-containers build system and redistribute the images for everyone to use.
@dusty_nv
using pip3 didn’t help.
I don’t think vLLM is uninstalling the existing PyTorch, as I’m not seeing any log messages related to that, and the PyTorch version remains the same.
Although it’d be great if I could get vLLM working, I’m also interested in using the MLC container you mentioned. I found this repo. However, the image there seems to have a pretty old version of MLC, and I cannot follow the instructions from MLC, nor can I find the instructions for that version.
Is there a reference usage that I can follow for that specific version?
I’m using the image dustynv/mlc:51fb0f4-builder-r35.4.1. Do the r36 images have an up-to-date version of MLC? If so, could you also provide a guide to upgrade my system from r35 to r36?
@pcha the dustynv/mlc:0.1.1-r36.3.0 container is a more recent version of MLC, after they transitioned from the mlc_llm.build workflow to the mlc_llm convert_weight builder - note that in my test script for the MLC container, I still support both methods:
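Roughly, the two flows look like this (just a sketch from memory - the model paths and quantization names below are placeholders, and exact flags differ between MLC versions, so check --help inside the container):
# newer flow (what dustynv/mlc:0.1.1-r36.3.0 uses): convert weights, generate config, then compile
mlc_llm convert_weight ./models/Llama-2-7b-chat-hf --quantization q4f16_1 -o ./dist/llama-q4f16_1
mlc_llm gen_config ./models/Llama-2-7b-chat-hf --quantization q4f16_1 --conv-template llama-2 -o ./dist/llama-q4f16_1
mlc_llm compile ./dist/llama-q4f16_1/mlc-chat-config.json --device cuda -o ./dist/llama-q4f16_1/llama-cuda.so

# older flow (builds like 51fb0f4): the single mlc_llm.build entrypoint
python3 -m mlc_llm.build --model Llama-2-7b-chat-hf --quantization q4f16_1 --target cuda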
BTW those 0.1 and 0.1.1 versions were numbers I made up, because the MLC project is basically unversioned and I needed a better way of tracking it than GitHub SHAs. You can also use jetson-containers to build more recent MLC; however, I apply patches to each build (which you can find under the patches directory). These patches are mostly to enable Orin’s sm87 in the dependencies for all the CUDA kernels that get compiled.
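For context, the patches basically amount to making sure sm_87 is in the target architecture list wherever the dependencies compile CUDA kernels - illustrative examples only, not the literal diffs (those are in the patches directory mentioned above):
export TORCH_CUDA_ARCH_LIST="8.7"            # for PyTorch-extension style builds
cmake .. -DCMAKE_CUDA_ARCHITECTURES=87       # for CMake-based builds, e.g. the TVM/MLC kernels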
At some point months back it stopped building on older Python and JetPack 5, so going forward I only build the newer MLC versions for JetPack 6. You could attempt to apt-upgrade nvidia-jetpack on your device, but I would just re-flash it fresh with SDK Manager and get a clean start with it (after backing up your work).
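Either way, it's worth double-checking which L4T release the device is actually on before deciding - on a stock install:
cat /etc/nv_tegra_release               # prints something like: # R35 (release), REVISION: 4.1, ...
dpkg-query --show nvidia-l4t-core       # the package version also reflects the L4T release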