Hosting FP8 models with the official 25.09 vLLM container consistently produces garbled output

Hi!

Since NVIDIA released the official vLLM image and the Thor benchmark results (Jetson Benchmarks | NVIDIA Developer), I have been testing vLLM-compatible models on this platform.
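For context, my test loop looks roughly like the sketch below. The container tag, cache mount, flags, and model name are illustrative placeholders rather than the exact values I used:

```
# Pull the vLLM container (tag inferred from the 25.10-py3 naming scheme).
docker pull nvcr.io/nvidia/vllm:25.09-py3

# Serve a model behind the OpenAI-compatible API on port 8000.
# "org/some-fp8-model" is a placeholder, not one of the models from this post.
docker run --rm -it --runtime nvidia --ipc=host -p 8000:8000 \
  -v "$HOME/.cache/huggingface:/root/.cache/huggingface" \
  nvcr.io/nvidia/vllm:25.09-py3 \
  vllm serve org/some-fp8-model --max-model-len 8192

# In a second shell (e.g. docker exec into the same container), benchmark the
# running server; see `vllm bench serve --help` for the exact flag set.
vllm bench serve \
  --model org/some-fp8-model \
  --dataset-name random \
  --num-prompts 100
```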

I found two FP8 models that run fine under vllm bench but return garbled output when served:

Their full-precision (BF16?) counterparts do not exhibit this issue:

Besides,

Environment:

Hi,

There is a newer 25.10 vLLM container; could you give it a try as well?

nvcr.io/nvidia/vllm:25.10-py3
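In case it helps anyone following along, switching images is just a re-pull and re-serve; the model name below is a placeholder carried over from the sketch in the first post:

```
docker pull nvcr.io/nvidia/vllm:25.10-py3

# Confirm which vLLM build the image carries.
docker run --rm nvcr.io/nvidia/vllm:25.10-py3 \
  python3 -c "import vllm; print(vllm.__version__)"

# Then re-serve the same FP8 model from the newer image.
docker run --rm -it --runtime nvidia --ipc=host -p 8000:8000 \
  -v "$HOME/.cache/huggingface:/root/.cache/huggingface" \
  nvcr.io/nvidia/vllm:25.10-py3 \
  vllm serve org/some-fp8-model
```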

Thanks.

Upgrading the vLLM image to 25.10-py3 still doesn’t resolve the issue. :(

Hi,

We use the RedHatAI models for benchmarking.
Could you try the model below to see whether it works correctly on Thor?

Thanks.

Thanks! I randomly picked two RedHatAI models, listed below, and they work well on Thor.

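For anyone who lands on this thread later, this is roughly how I tell clean output from garbled output once a server is up; the model id and prompt are placeholders, not the exact RedHatAI checkpoints mentioned above:

```
# Ask the OpenAI-compatible endpoint for a short, deterministic completion.
curl -s http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "org/some-fp8-model",
        "prompt": "The capital of France is",
        "max_tokens": 16,
        "temperature": 0
      }'
# A healthy model answers with something like " Paris"; the broken FP8 runs
# returned random tokens / mojibake instead.
```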