gpt-oss-20b not working on RTX 4090 with 24 GB VRAM

Ubuntu 24.04, NVIDIA-SMI 575.64.03, CUDA Version: 12.9

I'm trying to run gpt-oss-20b (which works under Ollama), and I get the following error:

INFO 2025-09-08 17:33:58.150 gpu_model_runner.py:1856] Loading model from scratch...
INFO 2025-09-08 17:33:58.195 cuda.py:290] Using Flash Attention backend on V1 engine.
ERROR 2025-09-08 17:33:58.431 core.py:680] EngineCore failed to start.
Traceback (most recent call last):
  File "/opt/nim/llm/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 671, in run_engine_core
    engine_core = EngineCoreProc(*args, **kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

<More error puke omitted for brevity…>

By contrast, running the following Llama 3.1 model instead:

nvcr.io/nim/nvidia/llama3.1-nemotron-nano-4b-v1.1:latest

works fine.
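
For anyone wanting to reproduce the "works fine" check: something like the sketch below, aimed at the container's OpenAI-compatible API, is what I mean. This is a minimal example, not my exact test; port 8000 assumes the usual -p 8000:8000 mapping, and the model name is read from /v1/models rather than hard-coded.

# Minimal sanity check against a running NIM container's
# OpenAI-compatible API. Assumes the container port is published
# as -p 8000:8000; adjust the base URL otherwise.
import requests

base = "http://localhost:8000/v1"

# Ask the server which model it serves instead of hard-coding the id.
model = requests.get(f"{base}/models", timeout=10).json()["data"][0]["id"]

resp = requests.post(
    f"{base}/chat/completions",
    json={
        "model": model,
        "messages": [{"role": "user", "content": "Say hello."}],
        "max_tokens": 32,
    },
    timeout=60,
)
print(resp.status_code)
print(resp.json()["choices"][0]["message"]["content"])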

Is it a model size issue? I certainly can't tell from the error dump.
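
In case the answer hinges on free memory: here is a minimal check of what the driver reports before the container starts (plain PyTorch, nothing NIM-specific), to rule out another process already holding VRAM.

# Print free vs. total VRAM on the current CUDA device, as reported
# by the driver, before launching the NIM container.
import torch

free, total = torch.cuda.mem_get_info()  # both values in bytes
print(f"free: {free / 2**30:.1f} GiB / total: {total / 2**30:.1f} GiB")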