root@jetson:/workspace# pip install vllm
Collecting vllm
Downloading vllm-0.5.0.post1.tar.gz (743 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 743.2/743.2 kB 12.6 MB/s eta 0:00:00
Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [16 lines of output]
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
main()
File "/usr/local/lib/python3.8/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "/usr/local/lib/python3.8/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
return hook(config_settings)
File "/tmp/pip-build-env-d1bct981/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 325, in get_requires_for_build_wheel
return self._get_build_requires(config_settings, requirements=['wheel'])
File "/tmp/pip-build-env-d1bct981/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 295, in _get_build_requires
self.run_setup()
File "/tmp/pip-build-env-d1bct981/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 311, in run_setup
exec(code, locals())
File "<string>", line 415, in <module>
File "<string>", line 341, in get_vllm_version
RuntimeError: Unknown runtime environment
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
However, when I open a python3 prompt and verify CUDA availability,
root@jetson:/workspace# python3
Python 3.8.10 (default, Nov 14 2022, 12:59:47)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.version.cuda)
11.4
Whereas it prints 11.4 when I check it within a Python prompt (as shown in the original post).
So, to be clear, the problem is not that the CUDA version doesn't meet the requirement, but that torch doesn't correctly recognize the CUDA version during the installation.
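For reference, this is the kind of one-liner I'm using to confirm that the installed PyTorch itself is CUDA-enabled at runtime (the is_available / device-name calls are just extra sanity checks on top of what I showed above):
python3 -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
python3 -c "import torch; print(torch.cuda.get_device_name(0))"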
My guess is that vLLM’s requirements.txt or pyproject.toml uninstalls the built-in version of PyTorch (that was built with CUDA enabled) in favor of a different version of PyTorch from PyPI (that wasn’t built with CUDA). Or perhaps it explicitly needs to be run with pip3 instead of pip. Regardless, it is for reasons like this that I use jetson-containers to make sure that the right versions get installed, stay installed, and are continuously tested for CUDA functionality and performance.
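If you want to rule out pip's build isolation (the build runs in a temporary environment that may pull its own torch, which would explain why your CUDA-enabled torch isn't seen during the install), one untested thing to try is building against the torch you already have installed:
# make sure the usual build dependencies are present, then skip the isolated build env
pip3 install packaging setuptools wheel ninja
pip3 install vllm --no-build-isolation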
I had previously tried to get vLLM building on JetPack to no avail - and IMO it is more geared for batching and server/cloud. The inferencing APIs/containers we have managed to get working through jetson-containers, like MLC, have near-optimum performance on Jetson - see Benchmarks - NVIDIA Jetson AI Lab
That said, if you do manage to get it working, I would be happy to add it to the jetson-containers build system and redistribute the images for everyone to use.
@dusty_nv
using pip3 didn’t help.
I don’t think vLLM is uninstalling the existing PyTorch, as I’m not seeing any log messages related to that, and the PyTorch version remains the same.
Although it’d be great if I could get vLLM working, I’m also interested in using the MLC container you mentioned. I found this repo. However, the image there seems to have a pretty old version of MLC, and I cannot follow the instructions from MLC, nor can I find the instructions for that version.
Is there a reference usage that I can follow for that specific version?
I’m using the image dustynv/mlc:51fb0f4-builder-r35.4.1. Do the r36 images have an up-to-date version of MLC? If so, could you also provide a guide to upgrade my system from r35 to r36?
@pcha the dustynv/mlc:0.1.1-r36.3.0 container is a more recent version of MLC, after they transitioned from the mlc_llm.build workflow to the mlc_llm convert_weight builder - note that in my test script for the MLC container, I still support both methods:
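Roughly, the two flows look like this (just a sketch from memory - the model paths and quantization names below are placeholders, and exact flags differ between MLC versions, so check --help inside the container):
# newer flow (what dustynv/mlc:0.1.1-r36.3.0 uses): convert weights, generate config, then compile
mlc_llm convert_weight ./models/Llama-2-7b-chat-hf --quantization q4f16_1 -o ./dist/llama-q4f16_1
mlc_llm gen_config ./models/Llama-2-7b-chat-hf --quantization q4f16_1 --conv-template llama-2 -o ./dist/llama-q4f16_1
mlc_llm compile ./dist/llama-q4f16_1/mlc-chat-config.json --device cuda -o ./dist/llama-q4f16_1/llama-cuda.so

# older flow (builds like 51fb0f4): the single mlc_llm.build entrypoint
python3 -m mlc_llm.build --model Llama-2-7b-chat-hf --quantization q4f16_1 --target cuda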
BTW those 0.1 and 0.1.1 versions were numbers I made up, because the MLC project is basically unversioned and I needed a better way of tracking it than GitHub SHAs. You can also use jetson-containers to build more recent MLC; however, I apply patches to each build (which you can find under the patches directory). These patches are mostly to enable Orin’s sm87 in the dependencies for all the CUDA kernels that get compiled.
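For context, the patches basically amount to making sure sm_87 is in the target architecture list wherever the dependencies compile CUDA kernels - illustrative examples only, not the literal diffs (those are in the patches directory mentioned above):
export TORCH_CUDA_ARCH_LIST="8.7"            # for PyTorch-extension style builds
cmake .. -DCMAKE_CUDA_ARCHITECTURES=87       # for CMake-based builds, e.g. the TVM/MLC kernels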
At some point months back it stopped building on older Python and JetPack 5, so going forward I only build the newer MLC versions for JetPack 6. You could attempt to apt-upgrade nvidia-jetpack on your device, but I would just re-flash it fresh with SDK Manager and get a clean start with it (after backing up your work).
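Either way, it's worth double-checking which L4T release the device is actually on before deciding - on a stock install:
cat /etc/nv_tegra_release               # prints something like: # R35 (release), REVISION: 4.1, ...
dpkg-query --show nvidia-l4t-core       # the package version also reflects the L4T release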