I followed the "Install and Use vLLM for Inference" playbook, but I am not able to get it to work on two Sparks. I can run vLLM on the main Spark node, but as soon as I try to run across both nodes over NCCL, it fails. I keep getting messages that my versions of PyTorch and Triton do not support SM 12.1.
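For context on the "SM 12.1" wording: the GPU reports compute capability 12.1, and the failing `ptxas` call in the log is invoked with `--gpu-name=sm_121a`. The sketch below shows one way that name can be formed. It assumes, based on my reading rather than a guarantee, that Triton's NVIDIA backend joins major and minor into one number and appends an `a` (arch-specific) suffix for capability 9.0 and newer. This matches the `sm_121a` in the log. `ptxas_gpu_name` is a hypothetical helper, not a Triton API.

```python
def ptxas_gpu_name(major: int, minor: int) -> str:
    """Build a ptxas --gpu-name value from a CUDA compute capability.

    Assumption: capabilities >= 9.0 get the arch-specific "a" suffix,
    which is consistent with the sm_121a seen in the failing log.
    """
    capability = major * 10 + minor
    suffix = "a" if capability >= 90 else ""
    return f"sm_{capability}{suffix}"


# The Spark GPU reports compute capability 12.1 ("SM 12.1").
print(ptxas_gpu_name(12, 1))
print(ptxas_gpu_name(8, 0))
```

A `ptxas` that does not know this architecture will reject that value. Presumably this is why the log shows `Value 'sm_121a' is not defined for option 'gpu-name'` coming from the `ptxas` bundled under `triton/backends/nvidia/bin/`.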
I am using the Docker container specified in the playbook, and I have also tried the latest one. Both fail the same way.
NCCL itself is working fine.
I have turned the GPU memory utilization down, since it seems vLLM may not be recognizing all of the memory across both Sparks.
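On the memory question: as far as I understand vLLM, each tensor-parallel worker applies `--gpu-memory-utilization` to its own GPU. The two Sparks are not pooled into one combined budget. Below is a rough sketch of that arithmetic. It assumes, for illustration only, 128 GiB of usable memory per node, and it ignores anything vLLM reserves internally. `per_rank_budget_gib` is a hypothetical helper.

```python
def per_rank_budget_gib(device_total_gib: float, utilization: float) -> float:
    """Memory one tensor-parallel rank may claim on its own device.

    Sketch only: assumes the fraction applies to each device separately,
    not to the sum of memory across nodes.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return device_total_gib * utilization


# Hypothetical per-node figure; substitute what each Spark actually reports.
node_total_gib = 128.0
ranks = 2  # --tensor-parallel-size 2, one rank per Spark
budget = per_rank_budget_gib(node_total_gib, 0.4)
print(f"per rank: {budget:.1f} GiB, across {ranks} ranks: {budget * ranks:.1f} GiB")
```

Under this assumption, lowering the flag shrinks every rank's share equally. It does not change how much memory the other node contributes.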
Here is the log I get when trying to run just a small model across the two Sparks:
```
docker exec -it node vllm serve ibm-granite/granite-4.0-h-350m --tensor-parallel-size 2 --max_model_len 2048 --gpu-memory-utilization 0.4
```
```
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] EngineCore failed to start.
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] Traceback (most recent call last):
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 92, in __init__
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] self._initialize_kv_caches(vllm_config)
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 190, in _initialize_kv_caches
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] self.model_executor.determine_available_memory())
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 85, in determine_available_memory
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] return self.collective_rpc("determine_available_memory")
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 312, in collective_rpc
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] return self._run_workers(method, *args, **(kwargs or {}))
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/ray_distributed_executor.py", line 505, in _run_workers
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ray_worker_outputs = ray.get(ray_worker_outputs)
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] return fn(*args, **kwargs)
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] return func(*args, **kwargs)
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/ray/_private/worker.py", line 2882, in get
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/ray/_private/worker.py", line 968, in get_objects
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] raise value.as_instanceof_cause()
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ray.exceptions.RayTaskError: ray::RayWorkerWrapper.execute_method() (pid=504, ip=192.168.6.64, actor_id=3929d272544a9428368520e502000000, repr=<vllm.executor.ray_utils.RayWorkerWrapper object at 0xec8811f83f20>)
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker_base.py", line 276, in execute_method
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] raise e
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker_base.py", line 267, in execute_method
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] return run_method(self, method, args, kwargs)
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 3122, in run_method
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] return func(*args, **kwargs)
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] return func(*args, **kwargs)
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 263, in determine_available_memory
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] self.model_runner.profile_run()
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 3392, in profile_run
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] = self._dummy_run(self.max_num_tokens, is_profile=True)
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] return func(*args, **kwargs)
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 3152, in _dummy_run
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] outputs = self.model(
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 121, in __call__
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] return self.runnable(*args, **kwargs)
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/granitemoehybrid.py", line 599, in forward
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] hidden_states = self.model(input_ids, positions, intermediate_tensors,
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 310, in __call__
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] output = self.compiled_callable(*args, **kwargs)
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 749, in compile_wrapper
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 923, in _compile_fx_inner
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] raise InductorError(e, currentframe()).with_traceback(
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 907, in _compile_fx_inner
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] mb_compiled_graph = fx_codegen_and_compile(
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1578, in fx_codegen_and_compile
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1456, in codegen_and_compile
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] compiled_module = graph.compile_to_module()
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/graph.py", line 2293, in compile_to_module
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] return self._compile_to_module()
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/graph.py", line 2303, in _compile_to_module
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] mod = self._compile_to_module_lines(wrapper_code)
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/graph.py", line 2371, in _compile_to_module_lines
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] mod = PyCodeCache.load_by_key_path(
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/codecache.py", line 3296, in load_by_key_path
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/runtime/compile_tasks.py", line 31, in _reload_python_module
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] exec(code, mod.__dict__, mod.__dict__)
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File "/root/.cache/vllm/torch_compile_cache/040a8cdd5f/rank_0_0/inductor_cache/gh/cghg6cq6mdxkky2ahfca5bglg5bb6yk4xotw4fm5r4xfjmmkhuj7.py", line 67, in <module>
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0 = async_compile.triton('triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0', '''
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/async_compile.py", line 404, in triton
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] kernel.precompile(
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/runtime/triton_heuristics.py", line 408, in precompile
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] self._precompile_worker()
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/runtime/triton_heuristics.py", line 434, in _precompile_worker
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] raise NoTritonConfigsError(
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] torch._inductor.exc.InductorError: NoTritonConfigsError: No valid triton configs. PTXASError: PTXAS error: Internal Triton PTX codegen error
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ptxas stderr:
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ptxas fatal : Value 'sm_121a' is not defined for option 'gpu-name'
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708]
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] Repro command: /usr/local/lib/python3.12/dist-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_121a /tmp/tmpo34r9kip.ptx -o /tmp/tmpo34r9kip.ptx.o
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708]
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708]
(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
```
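The traceback is long, but the actionable line is the innermost `ptxas fatal` message. The sketch below pulls every rejected `--gpu-name` value out of a saved log. It assumes the log is plain text with the ASCII quotes that `ptxas` prints; a forum copy with curly quotes would not match. `rejected_gpu_names` is a hypothetical helper, not part of vLLM, PyTorch, or Triton.

```python
import re

# Matches lines such as:
#   ptxas fatal : Value 'sm_121a' is not defined for option 'gpu-name'
# Expects plain ASCII single quotes, as ptxas itself prints them.
PTXAS_FATAL = re.compile(
    r"ptxas fatal\s*:\s*Value '([^']+)' is not defined for option 'gpu-name'"
)


def rejected_gpu_names(log_text: str) -> list[str]:
    """Return each GPU arch name that ptxas rejected, in first-seen order, without duplicates."""
    seen: list[str] = []
    for match in PTXAS_FATAL.finditer(log_text):
        name = match.group(1)
        if name not in seen:
            seen.append(name)
    return seen


sample = (
    "(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] ptxas stderr:\n"
    "(EngineCore_DP0 pid=1727) ERROR 10-29 17:03:56 [core.py:708] "
    "ptxas fatal : Value 'sm_121a' is not defined for option 'gpu-name'\n"
)
print(rejected_gpu_names(sample))
```

Deduplicating matters here because vLLM prints the same traceback twice: once through its logger and once from the `EngineCore_DP0` process itself.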