Hosting FP8 models with the official 25.09 vLLM container consistently produces garbled output

Hi!

Since NVIDIA released the official vLLM image and the Thor benchmark results (Jetson Benchmarks | NVIDIA Developer), I have been testing vLLM-compatible models on this platform.
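For context, my test loop looks roughly like the sketch below. The container tag, cache mount, flags, and model name are illustrative placeholders rather than the exact values I used:

```
# Pull the vLLM container (tag inferred from the 25.10-py3 naming scheme).
docker pull nvcr.io/nvidia/vllm:25.09-py3

# Serve a model behind the OpenAI-compatible API on port 8000.
# "org/some-fp8-model" is a placeholder, not one of the models from this post.
docker run --rm -it --runtime nvidia --ipc=host -p 8000:8000 \
  -v "$HOME/.cache/huggingface:/root/.cache/huggingface" \
  nvcr.io/nvidia/vllm:25.09-py3 \
  vllm serve org/some-fp8-model --max-model-len 8192

# In a second shell (e.g. docker exec into the same container), benchmark the
# running server; see `vllm bench serve --help` for the exact flag set.
vllm bench serve \
  --model org/some-fp8-model \
  --dataset-name random \
  --num-prompts 100
```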

I found two FP8 models that run fine under vllm bench but return garbled output when served:

Their full-precision (BF16?) counterparts do not exhibit this issue:

Besides,

Environment:

Hi,

There is a newer 25.10 vLLM container; could you give it a try as well?

nvcr.io/nvidia/vllm:25.10-py3
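In case it helps anyone following along, switching images is just a re-pull and re-serve; the model name below is a placeholder carried over from the sketch in the first post:

```
docker pull nvcr.io/nvidia/vllm:25.10-py3

# Confirm which vLLM build the image carries.
docker run --rm nvcr.io/nvidia/vllm:25.10-py3 \
  python3 -c "import vllm; print(vllm.__version__)"

# Then re-serve the same FP8 model from the newer image.
docker run --rm -it --runtime nvidia --ipc=host -p 8000:8000 \
  -v "$HOME/.cache/huggingface:/root/.cache/huggingface" \
  nvcr.io/nvidia/vllm:25.10-py3 \
  vllm serve org/some-fp8-model
```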

Thanks.

Upgrading the vLLM image to 25.10-py3 still doesn’t resolve the issue. :(

Hi,

We use the RedHatAI models for benchmarking.
Could you try the model below to see whether it works correctly on Thor?

Thanks.

Thanks! I randomly picked two RedHatAI models, listed below, and they work well on Thor.

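For anyone who lands on this thread later, this is roughly how I tell clean output from garbled output once a server is up; the model id and prompt are placeholders, not the exact RedHatAI checkpoints mentioned above:

```
# Ask the OpenAI-compatible endpoint for a short, deterministic completion.
curl -s http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "org/some-fp8-model",
        "prompt": "The capital of France is",
        "max_tokens": 16,
        "temperature": 0
      }'
# A healthy model answers with something like " Paris"; the broken FP8 runs
# returned random tokens / mojibake instead.
```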