Issue with running gpt-oss-120b in vLLM

First, I downloaded the thor_vllm_container image from here: Link

Then I ran the container:

sudo docker run --runtime=nvidia \
--gpus all \
-it \
--rm \
--network=host \
--ipc=host \
--ulimit memlock=-1 \
--ulimit stack=67108864 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
thor_vllm_container:25.08-py3-base

Once inside the container
vllm serve openai/gpt-oss-120b

But this is not working; I get a lot of errors. Any suggestions?

Hi,

Thanks for reporting this.

We also see some errors when running the gpt-oss-120b model on the vLLM container.
Will check it further and provide more info to you later.

Thanks

Thanks!

Hi,

There is a compatibility issue, and we will check it further.

But the model can work correctly with ollama.
You can try it if ollama is an option for you:

$ sudo docker run -it --rm --runtime nvidia ghcr.io/nvidia-ai-iot/ollama:r38.2.arm64-sbsa-cu130-24.04
# ollama run gpt-oss:120b
>>> Which number is larger, 9.11 or 9.8?
Thinking...
The user asks: Which number is larger, 9.11 or 9.8? Straightforward: 9.8 > 9.11. But note that 9.11 could be interpreted as a date (September 11) but here 
it's numeric. So answer: 9.8 is larger. Could add explanation about decimal comparison. Provide clear answer.
...done thinking.

9.8 is the larger number.  

When comparing decimals, you look at the digits from left to right:

- Both numbers start with a 9 in the units place.
- In the tenths place, 9.8 has an 8, while 9.11 has a 1.

Since 8 > 1, the number 9.8 is greater than 9.11.

>>> Send a message (/? for help)
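If you prefer calling ollama over HTTP rather than the interactive prompt, ollama also exposes a REST API on its default port 11434. A minimal sketch, assuming the same `gpt-oss:120b` model pulled above and the default port:

```shell
# Hedged sketch: query the ollama HTTP API (default port 11434).
# "stream": false returns a single JSON object instead of streamed chunks.
curl http://localhost:11434/api/generate -d '{
  "model": "gpt-oss:120b",
  "prompt": "Which number is larger, 9.11 or 9.8?",
  "stream": false
}'
```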

Thanks.

Yes, I tried with ollama, but I think vLLM would be better to get more tokens per second.

Hi! Is there any tutorial that can help me build vLLM from source myself? I’d like to test the model Qwen3-Next-80B-A3B-Instruct-AWQ-4bit. However, the version of vLLM in this container is not compatible. I think I could use the PyTorch container (nvcr.io/nvidia/pytorch 25.08-py3 or 25.09-py3) to build vLLM from source, but that is a lot of trouble for me.

Try it this way. I tested with Qwen2.5-VL-7B-Instruct-quantized.w4a16 and it’s working, but you can try with Qwen3…

1) First, download the image from this link

2) Run the container

sudo docker run --runtime=nvidia \
  --gpus all \
  -it \
  --rm \
  --network=host \
  --ipc=host \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  thor_vllm_container:25.08-py3-base

3) Inside the container

python -m vllm.entrypoints.openai.api_server \
  --model RedHatAI/Qwen2.5-VL-7B-Instruct-quantized.w4a16 \
  --quantization compressed-tensors \
  --host 0.0.0.0 \
  --gpu-memory-utilization 0.24
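Once the server is up, you can sanity-check it with a minimal Python client against the OpenAI-compatible endpoint. This is just a sketch; the host, port, and model name below match the command above, so adjust them if yours differ:

```python
# Minimal client sketch for the OpenAI-compatible server started above.
# Host, port, and model name are assumptions; adjust to your setup.
import json
import urllib.request


def build_chat_request(model: str, prompt: str,
                       base_url: str = "http://0.0.0.0:8000"):
    """Assemble the URL and JSON body for a /v1/chat/completions call."""
    url = f"{base_url}/v1/chat/completions"
    body = {"model": model,
            "messages": [{"role": "user", "content": prompt}]}
    return url, json.dumps(body).encode("utf-8")


def chat(model: str, prompt: str) -> str:
    """POST the request and return the assistant's reply text."""
    url, data = build_chat_request(model, prompt)
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        out = json.loads(resp.read())
    return out["choices"][0]["message"]["content"]


# Example (requires the server from step 3 to be running):
# print(chat("RedHatAI/Qwen2.5-VL-7B-Instruct-quantized.w4a16",
#            "Describe Jetson Thor in one sentence."))
```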

Is the version of vLLM in this container 0.9.2+4ef1e343.nv25.8.post1.cu130? I’ve tried it before, but this Qwen3 model requires building vLLM from the newest source code (vllm-0.11.0rc2) for it to work. Btw, thank you so much.


Hi,

gpt-oss-120b and gpt-oss-20b depend on vLLM 0.10.1: openai/gpt-oss-120b · Hugging Face
However, the tritonserver:25.08-vllm-python-py3 container uses vLLM v0.9.2, which doesn’t support the gpt-oss model yet.

# vllm -v
INFO 10-01 06:07:10 [__init__.py:244] Automatically detected platform cuda.
0.9.2+4ef1e343.nv25.8.post1.cu130

Please wait for our new vLLM container release or use ollama as a temporary solution.
Thanks.

Hi,

Thanks for your patience.

Below are the steps to run gpt-oss-20b with our new vLLM container.
(Tested with the 20b model, but 120b is expected to work as well.)

For the gpt-oss models, you will need the workaround (WAR) for Harmony encoding mentioned in this link.

$ sudo docker run -it --rm nvcr.io/nvidia/vllm:25.09-py3
# mkdir /etc/encodings
# wget https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken -O /etc/encodings/cl100k_base.tiktoken
# wget https://openaipublic.blob.core.windows.net/encodings/o200k_base.tiktoken -O /etc/encodings/o200k_base.tiktoken
# export TIKTOKEN_ENCODINGS_BASE=/etc/encodings
# vllm serve openai/gpt-oss-20b
...
(EngineCore_0 pid=1222) INFO 10-02 05:50:38 [gpu_model_runner.py:1953] Starting to load model openai/gpt-oss-20b...
(EngineCore_0 pid=1222) INFO 10-02 05:50:38 [gpu_model_runner.py:1985] Loading model from scratch...
(EngineCore_0 pid=1222) INFO 10-02 05:50:38 [cuda.py:323] Using Triton backend on V1 engine.
(EngineCore_0 pid=1222) INFO 10-02 05:50:38 [triton_attn.py:257] Using vllm unified attention for TritonAttentionImpl
(EngineCore_0 pid=1222) INFO 10-02 05:50:40 [weight_utils.py:296] Using model weights format ['*.safetensors']
Loading safetensors checkpoint shards:   0% Completed | 0/3 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:  33% Completed | 1/3 [00:01<00:03,  1.51s/it]
Loading safetensors checkpoint shards:  67% Completed | 2/3 [00:03<00:01,  1.69s/it]
Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:05<00:00,  1.74s/it]
Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:05<00:00,  1.71s/it]
(EngineCore_0 pid=1222)
(EngineCore_0 pid=1222) INFO 10-02 05:50:46 [default_loader.py:262] Loading weights took 5.27 seconds
(EngineCore_0 pid=1222) WARNING 10-02 05:50:46 [marlin_utils_fp4.py:196] Your GPU does not have native support for FP4 computation but FP4 quantization is being used. Weight-only FP4 compression will be used leveraging the Marlin kernel. This may degrade performance for compute-heavy workloads.
(EngineCore_0 pid=1222) INFO 10-02 05:50:48 [gpu_model_runner.py:2007] Model loading took 13.7193 GiB and 9.283907 seconds
(EngineCore_0 pid=1222) INFO 10-02 05:50:52 [backends.py:548] Using cache directory: /root/.cache/vllm/torch_compile_cache/ac91ec61b3/rank_0_0/backbone for vLLM's torch.compile
(EngineCore_0 pid=1222) INFO 10-02 05:50:52 [backends.py:559] Dynamo bytecode transform time: 3.40 s
(EngineCore_0 pid=1222) INFO 10-02 05:50:54 [backends.py:161] Directly load the compiled graph(s) for dynamic shape from the cache, took 1.612 s
(EngineCore_0 pid=1222) INFO 10-02 05:50:54 [monitor.py:34] torch.compile takes 3.40 s in total
(EngineCore_0 pid=1222) INFO 10-02 05:50:56 [gpu_worker.py:276] Available KV cache memory: 94.61 GiB
(EngineCore_0 pid=1222) INFO 10-02 05:50:56 [kv_cache_utils.py:1013] GPU KV cache size: 2,066,752 tokens
(EngineCore_0 pid=1222) INFO 10-02 05:50:56 [kv_cache_utils.py:1017] Maximum concurrency for 131,072 tokens per request: 31.02x
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 100%|████████████████████| 83/83 [00:11<00:00,  7.41it/s]
(EngineCore_0 pid=1222) INFO 10-02 05:51:10 [gpu_model_runner.py:2708] Graph capturing finished in 12 secs, took 0.96 GiB
(EngineCore_0 pid=1222) INFO 10-02 05:51:10 [core.py:214] init engine (profile, create kv cache, warmup model) took 21.89 seconds
(APIServer pid=1150) INFO 10-02 05:51:13 [loggers.py:142] Engine 000: vllm cache_config_info with initialization after num_gpu_blocks is: 258345
(APIServer pid=1150) INFO 10-02 05:51:13 [api_server.py:1611] Supported_tasks: ['generate']
(APIServer pid=1150) WARNING 10-02 05:51:14 [serving_responses.py:137] For gpt-oss, we ignore --enable-auto-tool-choice and always enable tool use.
(APIServer pid=1150) INFO 10-02 05:51:15 [api_server.py:1880] Starting vLLM API server 0 on http://0.0.0.0:8000
...

Thanks.

Thanks for the solution! I’ll check it out and let you know

Thanks, I tried that and it works on my Thor for gpt-oss-20b as well as for gpt-oss-120b.

The context seems to be quite small and I see no parameter controlling this. How can it be adjusted?

Can you share the way you start gpt-oss on Thor?

In my case, I can start vLLM, but I keep getting an empty string as output.

Here’s one way to run it. Note that I saved o200k_base.tiktoken to ~/.cache/huggingface/hub/harmony/ beforehand:

docker run --name vllm --rm -it --network host \
  --runtime=nvidia --gpus all --ipc=host \
  --ulimit memlock=-1 --ulimit stack=67108864 --shm-size=16g \
  -e VLLM_USE_V1=1 -e VLLM_WORKER_MULTIPROC=0 \
  -e TIKTOKEN_ENCODINGS_BASE="/root/.cache/huggingface/hub/harmony" \
  -e TIKTOKEN_RS_CACHE_DIR="/root/.cache/huggingface/hub/harmony" \
  -v "$HOME/.cache:/root/.cache" \
  nvcr.io/nvidia/vllm:25.09-py3 \
  python3 -m vllm.entrypoints.openai.api_server \
    --model "openai/gpt-oss-20b" \
    --host 0.0.0.0 \
    --port 8000 \
    --tensor-parallel-size 1 \
    --max-model-len 512 \
    --max-num-seqs 2 \
    --gpu-memory-utilization 0.25 \
    --kv-cache-dtype=auto

From a second terminal on Thor, run something like the following:

curl -X 'POST' 'http://127.0.0.1:8000/v1/chat/completions' -H 'accept: application/json' -H 'Content-Type: application/json' -d '{
"model": "openai/gpt-oss-20b",
"messages": [{"role":"user", "content": "What are Chihuahuas famous for?"}]
}' | jq

Hi,

You can run the model with the command below:

vllm serve openai/gpt-oss-20b

Thanks.


I did exactly what @AastaLLL suggested in his post. The system was set up according to the quickstart guide for Thor. You can start with the 20b version of the model and then try the 120b version. Both worked, but my question about how to adjust the context size is still open…

thanks for your reply!

Compared with the log you provided, there is an additional line in my environment:

[rank0]:W1009 06:44:02.915000 244 torch/_inductor/utils.py:1545] [0/0] Not enough SMs to use max_autotune_gemm mode

You can try “--max-seq-len 32000 --max-model-len 32000” on vllm serve.
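For the context size specifically, the flag vLLM documents is --max-model-len (with --max-num-seqs bounding concurrent sequences so the KV cache fits in memory). A sketch for the 20b model; the values here are assumptions sized for Thor, not tested settings:

```shell
# Hedged sketch: serve with a 32k-token context window.
# --max-model-len sets the context length; --max-num-seqs caps
# how many sequences run concurrently against the KV cache.
vllm serve openai/gpt-oss-20b \
  --max-model-len 32000 \
  --max-num-seqs 4
```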

My mistake~

I was trying to read the content during thinking mode, so it is always None until the thinking is finished.
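For what it’s worth, when streaming from a reasoning model, each chunk typically carries the thinking in a separate reasoning_content field while content stays None, so a client that only reads content sees nothing until the final answer starts. A hedged sketch of collecting both fields (the exact field name depends on your vLLM version and reasoning-parser settings):

```python
# Hedged sketch: split streamed deltas from a reasoning model into the
# thinking text and the final answer. Each delta dict is assumed to carry
# either `reasoning_content` (thinking) or `content` (answer); the other
# field is None, as observed in the thread above.
def split_stream(deltas):
    """deltas: iterable of dicts shaped like the `delta` objects in
    streamed chat.completion.chunk events. Returns (thinking, answer)."""
    thinking, answer = [], []
    for d in deltas:
        if d.get("reasoning_content"):
            thinking.append(d["reasoning_content"])
        if d.get("content"):
            answer.append(d["content"])
    return "".join(thinking), "".join(answer)


# Example with the shape such chunks typically have:
chunks = [
    {"reasoning_content": "User asks 2+2. ", "content": None},
    {"reasoning_content": "Answer is 4.", "content": None},
    {"reasoning_content": None, "content": "4"},
]
print(split_stream(chunks))  # ('User asks 2+2. Answer is 4.', '4')
```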

Hi,

Just want to double-confirm that you can get the expected results after the thinking is finished, right?
Thanks.