Gen AI Benchmarking: LLMs and VLMs on Jetson

When I try to pull and launch the vLLM container, it shows “No CUDA-capable device is detected (CUDA_ERROR_NO_DEVICE) cuInit()=100”:

sudo docker run --rm -it --network host --shm-size=16g --ulimit memlock=-1 --ulimit stack=67108864 --runtime=nvidia --name=vllm nvcr.io/nvidia/vllm:25.09-py3

==========

== vLLM ==

NVIDIA Release 25.09 (build 214638690)
vLLM Version 0.10.1.1+381074ae
...
ERROR: This container was built for NVIDIA Driver Release 580.82 or later, but
version 540.4.0 was detected and compatibility mode is UNAVAILABLE.

   [[No CUDA-capable device is detected (CUDA_ERROR_NO_DEVICE) cuInit()=100]]

Hi there @twshen2000, welcome to the NVIDIA developer forums.

Which Jetson device are you trying to use here? Then I can refer you to the correct category on the forums.

Thanks!

Jetson AGX Orin

dpkg -l | grep nvidia-jetpack
ii nvidia-jetpack 6.2.1+b38 arm64 NVIDIA Jetpack Meta Package
ii nvidia-jetpack-dev 6.2.1+b38 arm64 NVIDIA Jetpack dev Meta Package
ii nvidia-jetpack-runtime 6.2.1+b38 arm64 NVIDIA Jetpack runtime Meta Package
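For reference, the L4T release can also be confirmed directly; `/etc/nv_tegra_release` is the standard file on JetPack installs (a sketch, assuming a stock JetPack image):

```shell
# Print the L4T release the board is running
# (JetPack 6.x corresponds to L4T r36.x)
cat /etc/nv_tegra_release
```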

I tried the container dustynv/vllm:0.8.6-r36.4-cu128-24.04. It works for “RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w4a16”, but fails for “RedHatAI/Llama-3.2-1B-Instruct-quantized.w8a8” and “RedHatAI/DeepSeek-R1-Distill-Qwen-1.5B-quantized.w8a8”.

Hi,

The container nvcr.io/nvidia/vllm:25.09-py3 is built for the SBSA driver.

Orin uses the nvgpu driver, so please try the vLLM container from jetson-containers instead.
What error do you get when you run the w8a8 models?
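A minimal sketch of that workflow, assuming Docker with the NVIDIA runtime is already set up; the helper scripts come from the dusty-nv/jetson-containers repository, and the exact image tag that `autotag` resolves depends on your L4T release:

```shell
# Clone the jetson-containers repo and install its helper scripts
git clone https://github.com/dusty-nv/jetson-containers
bash jetson-containers/install.sh

# autotag picks a vLLM image matching the board's L4T version (e.g. r36.x for JetPack 6)
jetson-containers run $(autotag vllm)
```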

Thanks.