When I use Docker to deploy the Qwen 2.5 VL 3B model following the instructions on a 64GB Jetson AGX Orin,
I only get about 30 tokens/s when testing with vlm-bench.py.
My configuration is:
vLLM with quantization=w4a16, max concurrency=8, input seq len=2048, and output seq len=128.
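In case it helps reproduce my numbers, here is a minimal offline-benchmark sketch against the plain vLLM Python API that approximates those settings. It's only an approximation: I'm assuming w4a16 corresponds to a 4-bit-weight / 16-bit-activation checkpoint (e.g. an AWQ-style export, which the `model=` argument below stands in for), and that "max concurrency = 8" maps to vLLM's `max_num_seqs`:

```python
import time
from vllm import LLM, SamplingParams

# Hypothetical model name; the real run presumably points at a local
# w4a16 (4-bit weight, 16-bit activation) quantized export.
llm = LLM(
    model="Qwen/Qwen2.5-VL-3B-Instruct",
    max_model_len=2048 + 128,   # input seq len + output seq len from the config above
    max_num_seqs=8,             # assuming "max concurrency = 8" maps to this knob
    gpu_memory_utilization=0.9,
)

# Force fixed-length 128-token outputs so throughput numbers are comparable.
params = SamplingParams(max_tokens=128, ignore_eos=True)

# The benchmark config uses ~2048-token inputs; a short prompt is used here
# only to keep the sketch self-contained.
prompts = ["Describe the scene in detail."] * 8  # 8 concurrent requests

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{generated / elapsed:.1f} output tokens/s")
```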
But the officially reported benchmark is 216 tokens/s, which is far higher than my result.
I want to know whether the official model is optimized with TensorRT-LLM or only deployed through the vLLM engine. Why are the results so different, and how should I optimize my setup?
My PyTorch version is: 2.3.0 + CUDA 12.4
JetPack version is: 6.2.1+b38
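For completeness, this is roughly how I checked those versions (a quick sanity-check sketch; I'm assuming a standard JetPack install where /etc/nv_tegra_release is the L4T release file):

```python
import torch

print("PyTorch:", torch.__version__)              # expect 2.3.0
print("CUDA (torch build):", torch.version.cuda)  # expect 12.4
print("CUDA available:", torch.cuda.is_available())

# L4T / JetPack release string (standard location on Jetson devices)
with open("/etc/nv_tegra_release") as f:
    print(f.read().strip())
```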