Well, when I build vLLM on top of the PyTorch NGC image (25.11-py3), it fails to load gpt-oss-120b with this error:
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] EngineCore failed to start.
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] Traceback (most recent call last):
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_config_module.py", line 356, in __getattr__
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] config = self._config[name]
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] ~~~~~~~~~~~~^^^^^^
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] KeyError: 'assume_32bit_indexing'
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866]
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] The above exception was the direct cause of the following exception:
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866]
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] Traceback (most recent call last):
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 857, in run_engine_core
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 637, in __init__
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] super().__init__(
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 109, in __init__
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches(
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 240, in _initialize_kv_caches
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] available_gpu_memory = self.model_executor.determine_available_memory()
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 126, in determine_available_memory
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] return self.collective_rpc("determine_available_memory")
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/uniproc_executor.py", line 75, in collective_rpc
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] result = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/serial_utils.py", line 461, in run_method
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] return func(*args, **kwargs)
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 121, in decorate_context
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] return func(*args, **kwargs)
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 328, in determine_available_memory
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] self.model_runner.profile_run()
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4544, in profile_run
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] hidden_states, last_hidden_states = self._dummy_run(
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] ^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 121, in decorate_context
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] return func(*args, **kwargs)
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4268, in _dummy_run
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] outputs = self.model(
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] ^^^^^^^^^^^
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 220, in __call__
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] return self.runnable(*args, **kwargs)
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1777, in _wrapped_call_impl
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1788, in _call_impl
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/gpt_oss.py", line 722, in forward
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] return self.model(input_ids, positions, intermediate_tensors, inputs_embeds)
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 504, in __call__
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] with (
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_config_module.py", line 666, in __enter__
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] prior[key] = config.__getattr__(key)
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] ^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_config_module.py", line 388, in __getattr__
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] raise AttributeError(f"{self.__name__}.{name} does not exist") from e
(EngineCore_DP0 pid=13022) ERROR 12-18 05:30:40 [core.py:866] AttributeError: torch._inductor.config.assume_32bit_indexing does not exist
This doesn't happen with PyTorch installed from the cu130 wheels.
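To check whether a given PyTorch build defines the Inductor option that vLLM tries to patch, something like the following could be run in each environment. This is only a sketch, assuming the missing `assume_32bit_indexing` key is the sole incompatibility; the helper name `has_inductor_option` is hypothetical. It relies on the behavior visible in the traceback: an unknown key makes `torch._inductor.config` raise `AttributeError`, so `hasattr()` returns `False`.

```python
import importlib


def has_inductor_option(name: str) -> bool:
    """Hypothetical helper: True if torch._inductor.config defines `name`.

    torch's config module raises AttributeError for unknown keys (see the
    traceback above), so hasattr() reports a missing option as False.
    """
    try:
        inductor_config = importlib.import_module("torch._inductor.config")
    except ImportError:
        # torch (or its Inductor backend) is not importable here
        return False
    return hasattr(inductor_config, name)
```

Calling `print(has_inductor_option("assume_32bit_indexing"))` in both setups would be expected to print `False` inside the NGC 25.11 container and `True` with the cu130 wheel, if the missing option is indeed what breaks vLLM's compilation step.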