gpt-oss-20b not working on RTX 4090 with 24 GB VRAM

Ubuntu 24.04, NVIDIA-SMI 575.64.03, CUDA Version: 12.9

I'm trying to run gpt-oss-20b (which works under Ollama), and I get the following error:

INFO 2025-09-08 17:33:58.150 gpu_model_runner.py:1856] Loading model from scratch...
INFO 2025-09-08 17:33:58.195 cuda.py:290] Using Flash Attention backend on V1 engine.
ERROR 2025-09-08 17:33:58.431 core.py:680] EngineCore failed to start.
Traceback (most recent call last):
  File "/opt/nim/llm/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 671, in run_engine_core
    engine_core = EngineCoreProc(*args, **kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

<More error puke omitted for brevity…>

By contrast, running the following Llama 3.1 model instead:

nvcr.io/nim/nvidia/llama3.1-nemotron-nano-4b-v1.1:latest

works fine.
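
For anyone wanting to reproduce the "works fine" check: something like the sketch below, aimed at the container's OpenAI-compatible API, is what I mean. This is a minimal example, not my exact test; port 8000 assumes the usual -p 8000:8000 mapping, and the model name is read from /v1/models rather than hard-coded.

# Minimal sanity check against a running NIM container's
# OpenAI-compatible API. Assumes the container port is published
# as -p 8000:8000; adjust the base URL otherwise.
import requests

base = "http://localhost:8000/v1"

# Ask the server which model it serves instead of hard-coding the id.
model = requests.get(f"{base}/models", timeout=10).json()["data"][0]["id"]

resp = requests.post(
    f"{base}/chat/completions",
    json={
        "model": model,
        "messages": [{"role": "user", "content": "Say hello."}],
        "max_tokens": 32,
    },
    timeout=60,
)
print(resp.status_code)
print(resp.json()["choices"][0]["message"]["content"])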

Is it a model size issue? I certainly can't tell from the error dump.
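
In case the answer hinges on free memory: here is a minimal check of what the driver reports before the container starts (plain PyTorch, nothing NIM-specific), to rule out another process already holding VRAM.

# Print free vs. total VRAM on the current CUDA device, as reported
# by the driver, before launching the NIM container.
import torch

free, total = torch.cuda.mem_get_info()  # both values in bytes
print(f"free: {free / 2**30:.1f} GiB / total: {total / 2**30:.1f} GiB")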