How to deploy vLLM without Docker?

I’m trying to deploy a TTS model on a Jetson Orin 16GB. The TTS model is customized and partially uses vLLM for speedup. Because the required runtime environment is very complex, is there a way to install vLLM into my own Python environment instead of using your Docker image, so that I can control the dependencies?

I have verified that the TTS model runs on my DGX Spark, but I’m stuck moving the environment to the Jetson Orin because I can’t find a vLLM version built for this platform. I tried the tutorials at this link, but got the error “No module named ‘nvidia.cu12’”. I also tried installing vLLM directly from the official PyPI site, but got the error “ImportError: /home/nvidia/miniconda3/envs/vllm-runtime/lib/python3.10/site-packages/vllm/_C.abi3.so: undefined symbol: _ZN3c1013MessageLoggerC1EPKciib”.
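The undefined symbol `_ZN3c1013MessageLoggerC1EPKciib` is a mangled C++ name from PyTorch’s `c10` library; it demangles to `c10::MessageLogger::MessageLogger(char const*, int, int, bool)`. An error like this typically means vLLM’s compiled extension was built against a different torch than the one installed. The sketch below is a minimal diagnostic under that assumption: it reads only package metadata (so it runs even when `import vllm` fails) and prints the installed torch and vllm versions next to the torch requirement vllm declares. The helper names are hypothetical.

```python
"""Report which torch vLLM declares versus which torch is installed."""
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def torch_pins(requires):
    """Pick the requirement strings for 'torch' out of a Requires-Dist list."""
    pins = []
    for req in requires or []:
        # Drop environment markers such as "; extra == 'x'" before parsing the name.
        name = req.split(";")[0].strip()
        for sep in ("==", ">=", "<=", "~=", "!=", ">", "<", " ", "[", "("):
            name = name.split(sep)[0]
        if name.strip().lower() == "torch":
            pins.append(req.strip())
    return pins


def report():
    torch_ver = installed_version("torch")
    vllm_ver = installed_version("vllm")
    print(f"torch installed: {torch_ver}")
    print(f"vllm installed:  {vllm_ver}")
    if vllm_ver is not None:
        # metadata.requires() returns None when a package declares no dependencies.
        for pin in torch_pins(metadata.requires("vllm")):
            print(f"vllm requires:   {pin}")


if __name__ == "__main__":
    report()
```

If the installed torch does not satisfy the printed requirement, the extension and libtorch were most likely built separately, which is consistent with (though not proof of) the undefined-symbol error.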

Here is my Jetson Orin environment:
```
Package: nvidia-jetpack
Status: install ok installed
Priority: standard
Section: metapackages
Installed-Size: 194
Maintainer: NVIDIA Corporation
Architecture: arm64
Source: nvidia-jetpack (6.2.1)
Version: 6.2.1+b38
Depends: nvidia-jetpack-runtime (= 6.2.1+b38), nvidia-jetpack-dev (= 6.2.1+b38)
Description: NVIDIA Jetpack Meta Package
Description-md5: ad1462289bdbc54909ae109d1d32c0a8
Homepage: Jetson - Embedded AI Computing Platform | NVIDIA Developer
```
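The listing above looks like `dpkg -s nvidia-jetpack` output. As a small sketch, the JetPack version can be pulled out of that text and mapped to a wheel index. The only mapping assumed here is the one stated in this thread: the JetPack 6.2 package index at `https://pypi.jetson-ai-lab.io/jp6/cu126`. Other versions deliberately return `None` instead of a guessed URL, and the function names are hypothetical.

```python
"""Extract the JetPack version from `dpkg -s nvidia-jetpack` style output."""
import re

SAMPLE = """\
Package: nvidia-jetpack
Status: install ok installed
Architecture: arm64
Source: nvidia-jetpack (6.2.1)
Version: 6.2.1+b38
"""


def jetpack_version(text):
    """Return (major, minor, patch) from the 'Version:' field, or None."""
    match = re.search(r"^Version:\s*(\d+)\.(\d+)(?:\.(\d+))?", text, re.MULTILINE)
    if match is None:
        return None
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def jetson_index_url(version):
    """Map a JetPack version to a wheel index URL, or None if unknown.

    Assumption: only the JetPack 6.2 / CUDA 12.6 index quoted in this thread
    is known; older or newer releases are not guessed.
    """
    if version is not None and version[0] == 6 and version[:2] >= (6, 2):
        return "https://pypi.jetson-ai-lab.io/jp6/cu126"
    return None


if __name__ == "__main__":
    ver = jetpack_version(SAMPLE)
    print("JetPack:", ver)
    print("index:  ", jetson_index_url(ver))
```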

Hi,

The package from our server is built with JetPack 6.2 and it should work:

Could you try it outside of the virtual environment to see if it can work?
Thanks.

It’s not feasible, as other robotic algorithms also run on this device. We need to isolate these environments.

vLLM is a large application that wants a lot of RAM; it can easily use up all of the memory on a Jetson Thor.

Have you taken a look at https://github.com/NVIDIA/TensorRT-Edge-LLM to see if it could work for your use case?

That repo has plenty of documentation to help you get started.

pip is not very strict about dependencies, and perhaps that’s why it isn’t working. Unless there is a strict dependency list that guarantees the packages are EXACTLY the same as the ones on your official test device, my best option is to build from source myself. Is there a vLLM GitHub repo with clear guidance for Jetson Orin users?
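One way to find out where two environments diverge is to compare their `pip freeze` output package by package. The sketch below assumes two saved snapshots (for example from a known-good JetPack 6.2 environment and the failing conda env); note that a snapshot from the DGX Spark is a different platform, so its torch and CUDA pins would not be expected to carry over. Names are normalized roughly as PEP 503 does, and entries without `==` (such as `pkg @ file://...`) are skipped. The function names are hypothetical.

```python
"""Compare two `pip freeze` snapshots to find packages that differ."""
import re


def parse_freeze(text):
    """Turn 'name==version' lines into {normalized_name: version}."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip blanks, comments, and unpinned or URL entries
        name, version = line.split("==", 1)
        # Normalize runs of '-', '_' and '.' to '-' and lowercase the name.
        pins[re.sub(r"[-_.]+", "-", name.strip()).lower()] = version.strip()
    return pins


def diff_freeze(reference, candidate):
    """Return {name: (reference_version, candidate_version)} for each mismatch.

    None on either side means the package is missing from that snapshot.
    """
    ref, cand = parse_freeze(reference), parse_freeze(candidate)
    diffs = {}
    for name in sorted(ref.keys() | cand.keys()):
        if ref.get(name) != cand.get(name):
            diffs[name] = (ref.get(name), cand.get(name))
    return diffs
```

A snapshot that is known to work could also be fed back to pip as a constraints file (`pip install -c constraints.txt ...`), which pins versions without forcing every listed package to be installed.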

I have already measured the RAM usage on the DGX Spark: it is only 5 GB. It will work as long as the vLLM environment is ready.
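One caveat on comparing the 5 GB figure across devices: vLLM reserves memory up front as a fraction of total device memory through its `gpu_memory_utilization` setting (default 0.9), so the same fraction reserves a very different absolute amount on a 16 GB Orin than on a DGX Spark. On Jetson the GPU shares physical RAM with the CPU, so that reservation also competes with the robotics processes. The arithmetic below is a sketch, assuming a hypothetical 6 GB budget for vLLM; the helper name is hypothetical too.

```python
"""Work out a vLLM gpu_memory_utilization value for a fixed memory budget.

Assumption: on Jetson the memory vLLM sees is the shared 16 GB that the
rest of the robotics stack also uses.
"""


def utilization_for_budget(budget_gb, total_gb):
    """Fraction of total device memory that keeps vLLM inside budget_gb."""
    if not 0 < budget_gb <= total_gb:
        raise ValueError("budget must be positive and no larger than total memory")
    return round(budget_gb / total_gb, 3)


if __name__ == "__main__":
    total = 16.0   # Jetson Orin 16 GB, shared by CPU and GPU
    default = 0.9  # vLLM's default gpu_memory_utilization
    print(f"default reservation: {total * default:.1f} GB")
    frac = utilization_for_budget(6.0, total)  # hypothetical 6 GB budget
    print(f"gpu_memory_utilization={frac}")
    # The value would then be passed to vLLM, e.g.
    #   LLM(model=..., gpu_memory_utilization=frac)
```

With the default of 0.9, vLLM would try to claim about 14.4 GB of the Orin’s 16 GB, which leaves little for anything else on the device.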

Which CUDA version is on your Orin, 12 or 13? The Quickstart at https://vllm.ai/ has a selector so you can choose the wheel for the correct CUDA version.

You should install uv, as that is the installation method vllm.ai uses:
```
curl -LsSf https://astral.sh/uv/install.sh | sh

uv venv --python 3.12 --seed --managed-python
source .venv/bin/activate
```

Then go to https://vllm.ai/releases, pick your vLLM version, and install it.
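Whichever wheel is picked, its filename tags have to fit the interpreter in the venv. That matters here because the failing env is Python 3.10 (per the path in the error) while the venv command above uses Python 3.12. The sketch below follows the standard wheel filename layout `{name}-{version}(-{build})?-{python}-{abi}-{platform}.whl` and treats `abi3` wheels (vLLM’s extension is `_C.abi3.so`) as usable on the tagged CPython minor version and newer. The function names are hypothetical.

```python
"""Check whether a wheel's filename tags fit the current interpreter and CPU."""
import platform
import sys


def parse_wheel_name(filename):
    """Split a wheel filename into its tag fields (tag sets split on '.')."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:
        name, version, _build, py, abi, plat = parts
    elif len(parts) == 5:
        name, version, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected wheel name: {filename}")
    return {"name": name, "version": version, "python": py.split("."),
            "abi": abi.split("."), "platform": plat.split(".")}


def python_ok(tags, abis, major, minor):
    """True if any Python tag matches; abi3 wheels also accept newer minors."""
    for tag in tags:
        if tag in (f"py{major}", f"py{major}{minor}"):
            return True
        prefix = f"cp{major}"
        if tag.startswith(prefix) and tag[len(prefix):].isdigit():
            tag_minor = int(tag[len(prefix):])
            if tag_minor == minor or ("abi3" in abis and tag_minor <= minor):
                return True
    return False


def platform_ok(plats, machine):
    """True for pure-Python wheels or platform tags built for this CPU."""
    return any(p == "any" or p.endswith(machine) for p in plats)


def check_wheel(filename):
    """Return (python_ok, platform_ok) for the running interpreter."""
    info = parse_wheel_name(filename)
    v = sys.version_info
    return (python_ok(info["python"], info["abi"], v.major, v.minor),
            platform_ok(info["platform"], platform.machine()))


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(path, check_wheel(path))
```

On a Jetson, `platform.machine()` reports `aarch64`, so an `x86_64` wheel or a Python tag for a different minor version would show up as `False` here.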


Or

```
git clone https://github.com/vllm-project/vllm.git
cd vllm
git checkout <your-chosen-branch-or-tag>

python use_existing_torch.py
uv pip install . --torch-backend=auto
```
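For a from-source build on the Orin itself, two environment variables are commonly set before the install step: `TORCH_CUDA_ARCH_LIST`, so kernels are compiled only for the Orin GPU (compute capability 8.7), and `MAX_JOBS`, to cap parallel compile jobs so the build does not exhaust 16 GB. Treat both as assumptions to check against the vLLM build docs for the chosen tag; those docs also describe `--no-build-isolation` when reusing an existing torch, which may matter here. The sketch only prepares the variables; `max_jobs=4` and the helper names are hypothetical.

```python
"""Prepare environment variables for a from-source vLLM build on Jetson Orin.

Assumptions: TORCH_CUDA_ARCH_LIST restricts CUDA kernels to the target GPU,
and MAX_JOBS limits parallel compile jobs; verify both for your vLLM tag.
"""
import os


def arch_list(capability):
    """Format a (major, minor) compute capability as TORCH_CUDA_ARCH_LIST expects."""
    major, minor = capability
    return f"{major}.{minor}"


def build_env(capability, max_jobs=4, base=None):
    """Return a copy of the environment with the build variables set."""
    env = dict(os.environ if base is None else base)
    env["TORCH_CUDA_ARCH_LIST"] = arch_list(capability)
    env["MAX_JOBS"] = str(max_jobs)  # hypothetical default; lower it if the build runs out of memory
    return env


if __name__ == "__main__":
    try:
        import torch  # the JetPack torch already present in the environment
        cap = torch.cuda.get_device_capability(0)
    except Exception:
        cap = (8, 7)  # Jetson Orin's compute capability, used as a fallback
    env = build_env(cap)
    print("TORCH_CUDA_ARCH_LIST =", env["TORCH_CUDA_ARCH_LIST"])
    print("MAX_JOBS =", env["MAX_JOBS"])
```

The printed values can be exported in the shell before running the install command, or `env` can be passed to `subprocess.run(..., env=env)`.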

Hi,

Have you tried the command below?

```
pip install vllm --index-url https://pypi.jetson-ai-lab.io/jp6/cu126
```

It should install all the dependencies automatically.
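After any of these installs, a quick smoke test shows whether the environment is consistent. The sketch below imports `torch`, then `vllm._C` (the compiled extension named in the original undefined-symbol error), then `vllm`, and reports each failure by exception type instead of stopping at the first one. The helper name is hypothetical.

```python
"""Smoke-test an install: import each module and report failures."""
import importlib


def try_imports(module_names):
    """Return {module: "ok" or "<ExceptionType>: message"} for each module."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = "ok"
        except Exception as exc:  # ImportError from a bad .so, OSError, etc.
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    for name, status in try_imports(["torch", "vllm._C", "vllm"]).items():
        print(f"{name:10s} {status}")
    try:
        import torch
        print("CUDA available:", torch.cuda.is_available())
    except ImportError:
        pass
```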

Thanks

Yes and no. I tried this command, but it failed because of a poor Internet connection, so I downloaded the offline package through a browser on another machine. Are you suggesting that’s why the installation failed?
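A single downloaded wheel does not carry its dependencies. When it is installed, pip resolves anything missing from its configured index, which is PyPI by default, so a torch requirement could be satisfied by a build not made for Jetson. That is one plausible route to the undefined-symbol error, though nothing in this thread confirms it. The sketch below lists what a downloaded wheel declares, assuming the standard layout where `*.dist-info/METADATA` holds one `Requires-Dist:` header per dependency; the helper name is hypothetical.

```python
"""List the dependencies declared inside a downloaded wheel."""
import sys
import zipfile
from email.parser import Parser


def wheel_requirements(wheel):
    """Return the Requires-Dist entries from a wheel path or file object."""
    with zipfile.ZipFile(wheel) as whl:
        # The top-level *.dist-info/METADATA file holds the package metadata.
        meta_name = next(
            (n for n in whl.namelist()
             if n.endswith(".dist-info/METADATA") and n.count("/") == 1),
            None,
        )
        if meta_name is None:
            raise ValueError("no .dist-info/METADATA found in wheel")
        text = whl.read(meta_name).decode("utf-8")
    headers = Parser().parsestr(text, headersonly=True)
    return headers.get_all("Requires-Dist") or []


if __name__ == "__main__":
    for req in wheel_requirements(sys.argv[1]):
        print(req)
```

For a fully offline install, the usual pip pattern is to run `pip download` with the same `--index-url` on a connected machine (adding `--platform linux_aarch64 --python-version 3.10 --only-binary=:all:` when that machine is not a Jetson), copy the whole directory over, and install with `pip install --no-index --find-links <dir> vllm` so nothing is fetched from PyPI.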