Running Qwen3-VL on the Jetson AGX Orin Developer Kit

Hi,

I’m looking for a way to run Qwen3-VL on a Jetson AGX Orin Developer Kit.

My plan is to train the model on a much more powerful machine and then transfer the trained model. After that, I would like to use one of NVIDIA’s tools to quantize the model, ideally using a benchmark/calibration dataset so that the quantization causes minimal accuracy degradation.

Once the model is quantized, I want to deploy it to the edge device (Jetson AGX Orin). I would also like to understand which containers I need to install, how the deployment process works, and how to run a model that I have trained myself.

I would appreciate some basic guidance on the overall workflow and the required components, because I’m getting a bit lost with the amount of information available online.

Thanks!

Omer.

*** Please note that this reply is generated by LLM automatically ***

I’d be happy to help you with your question about running Qwen3-VL on a Jetson AGX Orin Developer Kit.

Overview of the Workflow

To deploy a trained model on a Jetson AGX Orin Developer Kit, you’ll need to follow these general steps:

  1. Train your model: Train your Qwen3-VL model on a more powerful machine using your preferred deep learning framework.
  2. Quantize the model: Use NVIDIA’s tools to quantize the trained model, which reduces the model’s size and memory footprint and typically speeds up inference on the edge device. You can use a benchmark/calibration dataset to calibrate the quantization parameters and minimize accuracy degradation.
  3. Deploy the model: Deploy the quantized model to the Jetson AGX Orin Developer Kit using NVIDIA’s deployment tools.
  4. Run the model: Run the deployed model on the Jetson AGX Orin Developer Kit using NVIDIA’s inference tools.
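
To give a feel for why the quantization step matters on a memory-constrained device, here is a rough back-of-the-envelope estimate of weight memory. It is only a sketch: the parameter counts are the nominal 4B and 8B figures, and it ignores the KV cache, activations, and the per-group scales that a real quantized checkpoint also stores.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


# Nominal parameter counts (assumption: real checkpoints differ slightly).
for name, params in [("4B", 4e9), ("8B", 8e9)]:
    fp16 = weight_memory_gib(params, 16)
    int4 = weight_memory_gib(params, 4)
    print(f"{name}: FP16 ~{fp16:.1f} GiB, 4-bit ~{int4:.1f} GiB")
```

Going from 16-bit to 4-bit weights cuts the weight footprint to roughly a quarter, which leaves more of the device's shared memory for the KV cache and image inputs.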

Required Components

To deploy a trained model on a Jetson AGX Orin Developer Kit, you’ll need the following components:

  1. Jetson AGX Orin Developer Kit: This is the edge device on which you’ll deploy and run your model.
  2. NVIDIA JetPack: This is a software development kit (SDK) that provides a comprehensive set of tools and libraries for developing and deploying AI applications on NVIDIA’s Jetson platform.
  3. TensorRT: This is a software development kit (SDK) that provides a set of tools and libraries for optimizing and deploying deep learning models on NVIDIA’s GPUs, including the Jetson AGX Orin.
  4. Docker: This is a containerization platform that allows you to package your model and its dependencies into a container that can be easily deployed and run on the Jetson AGX Orin Developer Kit.

Quantization and Deployment

To quantize and deploy your model, you can use the following tools:

  1. TensorRT’s Quantization Tool: This tool allows you to quantize your trained model using a benchmark/calibration dataset.
  2. TensorRT’s Model Optimizer: This tool allows you to optimize your quantized model for deployment on the Jetson AGX Orin Developer Kit.
  3. NVIDIA’s Deep Learning SDK (DLSDK): This SDK provides a set of tools and libraries for deploying and running deep learning models on NVIDIA’s GPUs, including the Jetson AGX Orin.
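
To illustrate what a calibration dataset is used for, here is a toy sketch in plain Python. It is not any NVIDIA tool's API; the function names and the candidate percentiles are made up for illustration. It picks a clipping threshold for symmetric quantization by checking, on calibration samples, which threshold gives the lowest reconstruction error.

```python
import random


def quantize_dequantize(values, clip, num_bits=8):
    """Symmetric uniform quantization to num_bits, then back to float."""
    qmax = 2 ** (num_bits - 1) - 1           # e.g. 127 for 8 bits, 7 for 4 bits
    scale = clip / qmax
    out = []
    for v in values:
        q = round(v / scale)
        q = max(-qmax, min(qmax, q))         # clamp to the integer range
        out.append(q * scale)
    return out


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def calibrate_clip(calibration_values, candidates=(1.0, 0.999, 0.99, 0.95), num_bits=8):
    """Try several percentile-based clipping thresholds and keep the one
    with the lowest error on the calibration data."""
    magnitudes = sorted(abs(v) for v in calibration_values)
    best_clip, best_err = None, float("inf")
    for frac in candidates:
        idx = min(len(magnitudes) - 1, int(frac * (len(magnitudes) - 1)))
        clip = magnitudes[idx]
        if clip == 0:
            continue
        err = mse(calibration_values,
                  quantize_dequantize(calibration_values, clip, num_bits))
        if err < best_err:
            best_clip, best_err = clip, err
    return best_clip, best_err


# Synthetic "calibration data": mostly small values plus two large outliers.
random.seed(0)
calib = [random.gauss(0, 1) for _ in range(5000)] + [40.0, -35.0]

clip, err = calibrate_clip(calib, num_bits=4)
naive_clip = max(abs(v) for v in calib)
naive_err = mse(calib, quantize_dequantize(calib, naive_clip, num_bits=4))
print(f"calibrated clip={clip:.2f}, MSE={err:.4f}; max-abs clip MSE={naive_err:.4f}")
```

Real quantization toolkits use more sophisticated methods (AWQ, for example, rescales weights based on activation statistics), but the idea is the same: representative data is used to choose quantization parameters that lose as little accuracy as possible.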

Containers and Deployment

To deploy your model, you’ll need to create a Docker container that includes your model, its dependencies, and the necessary NVIDIA tools and libraries. You can use the following containers:

  1. NVIDIA’s TensorRT Container: This container provides a pre-built environment for running TensorRT on the Jetson AGX Orin Developer Kit.
  2. NVIDIA’s DLSDK Container: This container provides a pre-built environment for running the DLSDK on the Jetson AGX Orin Developer Kit.
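
As a rough sketch of what launching such a container on the Jetson could look like, the snippet below only assembles and prints a `docker run` command. The image name and the model paths are placeholders (assumptions), not real NVIDIA images; `--runtime nvidia` is the flag that exposes the Jetson GPU to the container.

```python
import shlex


def build_docker_command(image, host_model_dir, container_model_dir="/models"):
    """Assemble a `docker run` command that mounts a host model directory."""
    return [
        "docker", "run", "--rm", "-it",
        "--runtime", "nvidia",                 # use the NVIDIA container runtime
        "--network", "host",                   # share the host network stack
        "-v", f"{host_model_dir}:{container_model_dir}",
        image,
    ]


# Placeholder image and path, for illustration only.
cmd = build_docker_command("example/qwen3-vl-runtime:latest", "/home/user/models")
print(shlex.join(cmd))
```

Mounting the model directory as a volume keeps the (large) weights outside the image, so a retrained or re-quantized model can be swapped in without rebuilding the container.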

Running the Model

To run your deployed model, you can use the following tools:

  1. TensorRT’s Inference Engine: This engine allows you to run your deployed model on the Jetson AGX Orin Developer Kit.
  2. NVIDIA’s DLSDK’s Inference Engine: This engine allows you to run your deployed model on the Jetson AGX Orin Developer Kit.

Comparison of Containers and Deployment Options

Here’s a comparison of the containers and deployment options:

| Container/Deployment Option | Description | Advantages | Disadvantages |
|---|---|---|---|
| NVIDIA’s TensorRT Container | Pre-built environment for running TensorRT on the Jetson AGX Orin Developer Kit | Easy to use, optimized for TensorRT | Limited flexibility |
| NVIDIA’s DLSDK Container | Pre-built environment for running the DLSDK on the Jetson AGX Orin Developer Kit | Easy to use, optimized for DLSDK | Limited flexibility |
| Custom Docker Container | Custom container that includes your model, its dependencies, and the necessary NVIDIA tools and libraries | High flexibility, customizable | Requires more effort to set up |

Conclusion

In conclusion, deploying a trained Qwen3-VL model on a Jetson AGX Orin Developer Kit follows the workflow outlined above: train the model, quantize it, deploy it, and run it. This relies on NVIDIA’s tools and libraries, including TensorRT and the DLSDK, and on a Docker container that bundles your model, its dependencies, and the necessary NVIDIA tools and libraries. You can start from a pre-built container, such as NVIDIA’s TensorRT Container or DLSDK Container, or build a custom Docker container.


Hi,

We have tested Qwen3-VL (both 4B and 8B) on the AGX Orin with AWQ-4bit quantization.
You can try to deploy it first:

The model is shared by @cyankiwi on Hugging Face.
Based on the information below, the model was quantized with llmcompressor:

Thanks.
Ple