I pulled the following container: nvcr.io/nvidia/vllm:25.12-py3. The run command is as follows:
sudo docker run --platform linux/arm64 --runtime=nvidia -itd \
  --name qwen_vl_service \
  --gpus all \
  -p 8000:8000 \
  -v /home/cv/LM/qwen3:/model \
  -v /home/cv/AI/qWen3-VL-Server:/app \
  --ipc=host \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  --env TORCH_CUDA_ARCH_LIST="11.0a" \
  --env VLLM_WORKER_MULTIPROC_METHOD="fork" \
  --env TRITON_PTXAS_PATH="/usr/local/cuda/bin/ptxas" \
  --env PATH="/usr/local/cuda/bin:$PATH" \
  --env LD_LIBRARY_PATH="/usr/local/cuda/lib64:$LD_LIBRARY_PATH" \
  nvcr.io/nvidia/vllm:25.12-py3 \
  bash -c "
    vllm serve /model \
      --trust-remote-code \
      --dtype bfloat16 \
      --gpu-memory-utilization 0.55 \
      --enforce-eager \
      --disable-log-stats \
      --port 8000 \
      --skip-mm-profiling \
      --limit-mm-per-prompt '{\"image\":5,\"video\":0}'
  "
I have verified that vLLM loads the model and handles text-only inference normally, but it crashes on image inference. The core error is:
(APIServer pid=144) (EngineCore_DP0 pid=183) RuntimeError: GET was unable to find an engine to execute this computation.
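For reference, the crash is triggered by any OpenAI-style chat completion request that includes an image. A minimal sketch of the request body I send (the image URL here is a placeholder; the model name matches the served path from the command above):

```python
import json

# Minimal OpenAI-style chat payload that exercises the vision encoder.
# The image URL is a placeholder; any image (URL or base64 data: URI)
# goes through the same visual-tower code path and hits the same error.
payload = {
    "model": "/model",  # served_model_name, per the vllm serve command
    "messages": [{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": "https://example.com/test.jpg"}},
            {"type": "text", "text": "Describe this image."},
        ],
    }],
    "max_tokens": 256,
}
body = json.dumps(payload)
```

POSTing this body to http://localhost:8000/v1/chat/completions is accepted with HTTP 200, and the engine core then dies while encoding the image.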
Detailed startup and runtime logs are as follows:

cv@cv:~/AI/qWen3-VL-Server$ sudo docker logs -f qwen_vl_service
==========
== vLLM ==
NVIDIA Release 25.12 (build 245720122)
vLLM Version 0.11.1+9114fd76
Container image Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
GOVERNING TERMS: The software and materials are governed by the NVIDIA Software License Agreement
(found at https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-software-license-agreement/)
and the Product-Specific Terms for NVIDIA AI Products
(found at https://www.nvidia.com/en-us/agreements/enterprise-software/product-specific-terms-for-ai-products/).
WARNING: CUDA Minor Version Compatibility mode ENABLED.
Using driver version 580.00 which has support for CUDA 13.0. This container
was built with CUDA 13.1 and will be run in Minor Version Compatibility mode.
CUDA Forward Compatibility is preferred over Minor Version Compatibility for use
with this container but was unavailable:
[[System has unsupported display driver / cuda driver combination (CUDA_ERROR_SYSTEM_DRIVER_MISMATCH) cuInit()=803]]
See the CUDA Compatibility documentation for details.
/usr/local/lib/python3.12/dist-packages/torchvision/io/image.py:14: UserWarning: Failed to load image Python extension: 'Could not load this library: /usr/local/lib/python3.12/dist-packages/torchvision/image.so'If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
warn(
INFO 01-06 01:29:54 [scheduler.py:216] Chunked prefill is enabled with max_num_batched_tokens=2048.
(APIServer pid=144) INFO 01-06 01:29:54 [api_server.py:1980] vLLM API server version 0.11.1+9114fd76.nv25.12
(APIServer pid=144) INFO 01-06 01:29:54 [utils.py:253] non-default args: {'model_tag': '/model', 'model': '/model', 'trust_remote_code': True, 'dtype': 'bfloat16', 'enforce_eager': True, 'gpu_memory_utilization': 0.55, 'limit_mm_per_prompt': {'image': 5, 'video': 0}, 'skip_mm_profiling': True, 'disable_log_stats': True}
(APIServer pid=144) The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=144) INFO 01-06 01:29:59 [model.py:631] Resolved architecture: Qwen3VLForConditionalGeneration
(APIServer pid=144) INFO 01-06 01:29:59 [model.py:1745] Using max model len 262144
(APIServer pid=144) INFO 01-06 01:29:59 [scheduler.py:216] Chunked prefill is enabled with max_num_batched_tokens=2048.
(APIServer pid=144) INFO 01-06 01:29:59 [vllm.py:500] Cudagraph is disabled under eager mode
(APIServer pid=144) (EngineCore_DP0 pid=183) INFO 01-06 01:30:00 [core.py:93] Initializing a V1 LLM engine (v0.11.1+9114fd76.nv25.12) with config: model='/model', speculative_config=None, tokenizer='/model', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=262144, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=/model, enable_prefix_caching=True, enable_chunked_prefill=True, pooler_config=None, compilation_config={'level': None, 'mode': <CompilationMode.NONE: 0>, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['all'], 'splitting_ops': None, 'compile_mm_encoder': False, 'use_inductor': None, 'compile_sizes': [], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.NONE: 0>, 'cudagraph_num_of_warmups': 0, 'cudagraph_capture_sizes': [], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {}, 'max_cudagraph_capture_size': 0, 'local_cache_dir': None}
(APIServer pid=144) (EngineCore_DP0 pid=183) INFO 01-06 01:30:00 [parallel_state.py:1208] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://172.17.0.2:60861 backend=nccl
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
(APIServer pid=144) (EngineCore_DP0 pid=183) INFO 01-06 01:30:00 [parallel_state.py:1394] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
(APIServer pid=144) (EngineCore_DP0 pid=183) INFO 01-06 01:30:08 [gpu_model_runner.py:3255] Starting to load model /model…
(APIServer pid=144) (EngineCore_DP0 pid=183) /usr/local/lib/python3.12/dist-packages/torch/library.py:356: UserWarning: Warning only once for all operators, other operators may also be overridden.
(APIServer pid=144) (EngineCore_DP0 pid=183) Overriding a previously registered kernel for the same operator and the same dispatch key
(APIServer pid=144) (EngineCore_DP0 pid=183) operator: flash_attn::_flash_attn_backward(Tensor dout, Tensor q, Tensor k, Tensor v, Tensor out, Tensor softmax_lse, Tensor(a6!)? dq, Tensor(a7!)? dk, Tensor(a8!)? dv, float dropout_p, float softmax_scale, bool causal, SymInt window_size_left, SymInt window_size_right, float softcap, Tensor? alibi_slopes, bool deterministic, Tensor? rng_state=None) -> Tensor
(APIServer pid=144) (EngineCore_DP0 pid=183) registered at /usr/local/lib/python3.12/dist-packages/torch/_library/custom_ops.py:922
(APIServer pid=144) (EngineCore_DP0 pid=183) dispatch key: ADInplaceOrView
(APIServer pid=144) (EngineCore_DP0 pid=183) previous kernel: no debug info
(APIServer pid=144) (EngineCore_DP0 pid=183) new kernel: registered at /usr/local/lib/python3.12/dist-packages/torch/_library/custom_ops.py:922 (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/core/dispatch/OperatorEntry.cpp:208.)
(APIServer pid=144) (EngineCore_DP0 pid=183) self.m.impl(
(APIServer pid=144) (EngineCore_DP0 pid=183) WARNING 01-06 01:30:08 [qwen2_5_vl.py:366] Flash attention backend requires head_dim to be a multiple of 32, but got 72. Falling back to TORCH_SDPA backend.
[the warning above repeats identically 26 more times, once per vision layer; repeats omitted]
(APIServer pid=144) (EngineCore_DP0 pid=183) INFO 01-06 01:30:24 [cuda.py:418] Valid backends: ['FLASH_ATTN', 'FLASHINFER', 'TRITON_ATTN', 'FLEX_ATTENTION']
(APIServer pid=144) (EngineCore_DP0 pid=183) INFO 01-06 01:30:24 [cuda.py:427] Using FLASH_ATTN backend.
Loading safetensors checkpoint shards: 0% Completed | 0/4 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 25% Completed | 1/4 [00:01<00:04, 1.59s/it]
Loading safetensors checkpoint shards: 50% Completed | 2/4 [00:03<00:03, 1.66s/it]
Loading safetensors checkpoint shards: 75% Completed | 3/4 [00:04<00:01, 1.37s/it]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:05<00:00, 1.47s/it]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:05<00:00, 1.49s/it]
(APIServer pid=144) (EngineCore_DP0 pid=183)
(APIServer pid=144) (EngineCore_DP0 pid=183) INFO 01-06 01:30:31 [default_loader.py:314] Loading weights took 6.10 seconds
(APIServer pid=144) (EngineCore_DP0 pid=183) INFO 01-06 01:30:31 [gpu_model_runner.py:3334] Model loading took 16.6397 GiB memory and 22.955288 seconds
(APIServer pid=144) (EngineCore_DP0 pid=183) INFO 01-06 01:30:32 [gpu_model_runner.py:4067] Skipping memory profiling for multimodal encoder and encoder cache.
(APIServer pid=144) (EngineCore_DP0 pid=183) INFO 01-06 01:30:34 [gpu_worker.py:359] Available KV cache memory: 46.32 GiB
(APIServer pid=144) (EngineCore_DP0 pid=183) INFO 01-06 01:30:34 [kv_cache_utils.py:1229] GPU KV cache size: 337,312 tokens
(APIServer pid=144) (EngineCore_DP0 pid=183) INFO 01-06 01:30:34 [kv_cache_utils.py:1234] Maximum concurrency for 262,144 tokens per request: 1.29x
(APIServer pid=144) (EngineCore_DP0 pid=183) 2026-01-06 01:30:35,902 - INFO - autotuner.py:256 - flashinfer.jit: [Autotuner]: Autotuning process starts …
(APIServer pid=144) (EngineCore_DP0 pid=183) 2026-01-06 01:30:35,930 - INFO - autotuner.py:262 - flashinfer.jit: [Autotuner]: Autotuning process ends
(APIServer pid=144) (EngineCore_DP0 pid=183) INFO 01-06 01:30:36 [core.py:250] init engine (profile, create kv cache, warmup model) took 4.60 seconds
(APIServer pid=144) (EngineCore_DP0 pid=183) INFO 01-06 01:30:41 [vllm.py:500] Cudagraph is disabled under eager mode
(APIServer pid=144) INFO 01-06 01:30:41 [api_server.py:1728] Supported tasks: ['generate']
(APIServer pid=144) WARNING 01-06 01:30:41 [model.py:1568] Default sampling parameters have been overridden by the model's Hugging Face generation config recommended from the model creator. If this is not intended, please relaunch vLLM instance with --generation-config vllm.
(APIServer pid=144) INFO 01-06 01:30:41 [serving_responses.py:154] Using default chat sampling params from model: {'temperature': 0.7, 'top_k': 20, 'top_p': 0.8}
(APIServer pid=144) INFO 01-06 01:30:41 [serving_chat.py:131] Using default chat sampling params from model: {'temperature': 0.7, 'top_k': 20, 'top_p': 0.8}
(APIServer pid=144) INFO 01-06 01:30:41 [serving_completion.py:73] Using default completion sampling params from model: {'temperature': 0.7, 'top_k': 20, 'top_p': 0.8}
(APIServer pid=144) INFO 01-06 01:30:41 [serving_chat.py:131] Using default chat sampling params from model: {'temperature': 0.7, 'top_k': 20, 'top_p': 0.8}
(APIServer pid=144) INFO 01-06 01:30:41 [api_server.py:2055] Starting vLLM API server 0 on http://0.0.0.0:8000
(APIServer pid=144) INFO 01-06 01:30:41 [launcher.py:38] Available routes are:
(APIServer pid=144) INFO 01-06 01:30:41 [launcher.py:46] Route: /openapi.json, Methods: HEAD, GET
(APIServer pid=144) INFO 01-06 01:30:41 [launcher.py:46] Route: /docs, Methods: HEAD, GET
(APIServer pid=144) INFO 01-06 01:30:41 [launcher.py:46] Route: /docs/oauth2-redirect, Methods: HEAD, GET
(APIServer pid=144) INFO 01-06 01:30:41 [launcher.py:46] Route: /redoc, Methods: HEAD, GET
(APIServer pid=144) INFO 01-06 01:30:41 [launcher.py:46] Route: /health, Methods: GET
(APIServer pid=144) INFO 01-06 01:30:41 [launcher.py:46] Route: /load, Methods: GET
(APIServer pid=144) INFO 01-06 01:30:41 [launcher.py:46] Route: /tokenize, Methods: POST
(APIServer pid=144) INFO 01-06 01:30:41 [launcher.py:46] Route: /detokenize, Methods: POST
(APIServer pid=144) INFO 01-06 01:30:41 [launcher.py:46] Route: /v1/models, Methods: GET
(APIServer pid=144) INFO 01-06 01:30:41 [launcher.py:46] Route: /version, Methods: GET
(APIServer pid=144) INFO 01-06 01:30:41 [launcher.py:46] Route: /v1/responses, Methods: POST
(APIServer pid=144) INFO 01-06 01:30:41 [launcher.py:46] Route: /v1/responses/{response_id}, Methods: GET
(APIServer pid=144) INFO 01-06 01:30:41 [launcher.py:46] Route: /v1/responses/{response_id}/cancel, Methods: POST
(APIServer pid=144) INFO 01-06 01:30:41 [launcher.py:46] Route: /v1/messages, Methods: POST
(APIServer pid=144) INFO 01-06 01:30:41 [launcher.py:46] Route: /v1/chat/completions, Methods: POST
(APIServer pid=144) INFO 01-06 01:30:41 [launcher.py:46] Route: /v1/completions, Methods: POST
(APIServer pid=144) INFO 01-06 01:30:41 [launcher.py:46] Route: /v1/embeddings, Methods: POST
(APIServer pid=144) INFO 01-06 01:30:41 [launcher.py:46] Route: /pooling, Methods: POST
(APIServer pid=144) INFO 01-06 01:30:41 [launcher.py:46] Route: /classify, Methods: POST
(APIServer pid=144) INFO 01-06 01:30:41 [launcher.py:46] Route: /score, Methods: POST
(APIServer pid=144) INFO 01-06 01:30:41 [launcher.py:46] Route: /v1/score, Methods: POST
(APIServer pid=144) INFO 01-06 01:30:41 [launcher.py:46] Route: /v1/audio/transcriptions, Methods: POST
(APIServer pid=144) INFO 01-06 01:30:41 [launcher.py:46] Route: /v1/audio/translations, Methods: POST
(APIServer pid=144) INFO 01-06 01:30:41 [launcher.py:46] Route: /rerank, Methods: POST
(APIServer pid=144) INFO 01-06 01:30:41 [launcher.py:46] Route: /v1/rerank, Methods: POST
(APIServer pid=144) INFO 01-06 01:30:41 [launcher.py:46] Route: /v2/rerank, Methods: POST
(APIServer pid=144) INFO 01-06 01:30:41 [launcher.py:46] Route: /scale_elastic_ep, Methods: POST
(APIServer pid=144) INFO 01-06 01:30:41 [launcher.py:46] Route: /is_scaling_elastic_ep, Methods: POST
(APIServer pid=144) INFO 01-06 01:30:41 [launcher.py:46] Route: /inference/v1/generate, Methods: POST
(APIServer pid=144) INFO 01-06 01:30:41 [launcher.py:46] Route: /ping, Methods: GET
(APIServer pid=144) INFO 01-06 01:30:41 [launcher.py:46] Route: /ping, Methods: POST
(APIServer pid=144) INFO 01-06 01:30:41 [launcher.py:46] Route: /invocations, Methods: POST
(APIServer pid=144) INFO 01-06 01:30:41 [launcher.py:46] Route: /metrics, Methods: GET
(APIServer pid=144) INFO: Started server process [144]
(APIServer pid=144) INFO: Waiting for application startup.
(APIServer pid=144) INFO: Application startup complete.
(APIServer pid=144) INFO 01-06 01:30:50 [chat_utils.py:557] Detected the chat template content format to be 'openai'. You can set --chat-template-content-format to override this.
(APIServer pid=144) INFO: 10.19.120.171:51819 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [dump_input.py:72] Dumping input data for V1 LLM engine (v0.11.1+9114fd76.nv25.12) with config: model='/model', speculative_config=None, tokenizer='/model', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=262144, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=/model, enable_prefix_caching=True, enable_chunked_prefill=True, pooler_config=None, compilation_config={'level': None, 'mode': <CompilationMode.NONE: 0>, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['all'], 'splitting_ops': None, 'compile_mm_encoder': False, 'use_inductor': None, 'compile_sizes': [], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.NONE: 0>, 'cudagraph_num_of_warmups': 0, 'cudagraph_capture_sizes': [], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {}, 'max_cudagraph_capture_size': 0, 'local_cache_dir': None},
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [dump_input.py:79] Dumping scheduler output for model execution: SchedulerOutput(scheduled_new_reqs=[NewRequestData(req_id=chatcmpl-d3947879b8574b228701fbba28997aaa,prompt_token_ids_len=4773,mm_features=[MultiModalFeatureSpec(data={'image_grid_thw': MultiModalFieldElem(modality='image', key='image_grid_thw', data=tensor([ 1, 244, 78]), field=MultiModalBatchedField()), 'pixel_values': MultiModalFieldElem(modality='image', key='pixel_values', data=tensor([[-0.9062, -0.9062, -0.9062, ..., 0.6484, 0.6484, 0.6484],
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [dump_input.py:79] [-0.9062, -0.9062, -0.9062, ..., 0.6328, 0.6328, 0.6328],
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [dump_input.py:79] [-0.8984, -0.9141, -0.8984, ..., 0.6328, 0.6484, 0.6328],
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [dump_input.py:79] ...,
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [dump_input.py:79] [-0.5938, -0.5938, -0.5938, ..., 0.6016, 0.6016, 0.6016],
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [dump_input.py:79] [-0.5938, -0.5938, -0.5938, ..., 0.6016, 0.6016, 0.6016],
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [dump_input.py:79] [-0.5938, -0.5938, -0.5938, ..., 0.6016, 0.6016, 0.6016]],
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [dump_input.py:79] dtype=torch.bfloat16), field=MultiModalFlatField(slices=[[slice(0, 19032, None)]], dim=0))}, modality='image', identifier='cd8838b8a6eef80b2311a56f147b1a3afe3a998f8089fac62e4e3824ab88db15', mm_position=PlaceholderRange(offset=9, length=4758, is_embed=None))],sampling_params=SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.7, top_p=0.8, top_k=20, min_p=0.0, seed=None, stop=[], stop_token_ids=[151643], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=256, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, structured_outputs=None, extra_args=None),block_ids=([6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133],),num_computed_tokens=0,lora_request=None,prompt_embeds_shape=None)], scheduled_cached_reqs=CachedRequestData(req_ids=[], resumed_req_ids=[], new_token_ids=[], all_token_ids={}, new_block_ids=[], num_computed_tokens=[], num_output_tokens=[]), num_scheduled_tokens={chatcmpl-d3947879b8574b228701fbba28997aaa: 2048}, total_num_scheduled_tokens=2048, scheduled_spec_decode_tokens={}, scheduled_encoder_inputs={chatcmpl-d3947879b8574b228701fbba28997aaa: [0]}, num_common_prefix_blocks=[128], finished_req_ids=[], free_encoder_mm_hashes=[], pending_structured_output_tokens=false, kv_connector_metadata=null, ec_connector_metadata=null)
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] EngineCore encountered a fatal error.
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] Traceback (most recent call last):
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 835, in run_engine_core
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] engine_core.run_busy_loop()
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 862, in run_busy_loop
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] self._process_engine_step()
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 891, in _process_engine_step
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] outputs, model_executed = self.step_fn()
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] ^^^^^^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 342, in step
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] model_output = future.result()
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] ^^^^^^^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] return self.__get_result()
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] ^^^^^^^^^^^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] raise self._exception
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/uniproc_executor.py", line 79, in collective_rpc
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] result = run_method(self.driver_worker, method, args, kwargs)
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/serial_utils.py", line 479, in run_method
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] return func(*args, **kwargs)
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/worker_base.py", line 367, in execute_model
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] return self.worker.execute_model(scheduler_output, *args, **kwargs)
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] return func(*args, **kwargs)
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 563, in execute_model
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] output = self.model_runner.execute_model(
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] return func(*args, **kwargs)
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2753, in execute_model
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] ) = self._preprocess(
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] ^^^^^^^^^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2328, in _preprocess
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] self._execute_mm_encoder(scheduler_output)
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1975, in _execute_mm_encoder
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] curr_group_outputs = model.embed_multimodal(**mm_kwargs_group)
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_vl.py", line 1496, in embed_multimodal
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] image_embeddings = self._process_image_input(multimodal_input)
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_vl.py", line 1375, in _process_image_input
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] image_embeds = self.visual(pixel_values, grid_thw=grid_thw)
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1783, in _wrapped_call_impl
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] return self._call_impl(*args, **kwargs)
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1794, in _call_impl
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] return forward_call(*args, **kwargs)
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_vl.py", line 540, in forward
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] hidden_states = self.patch_embed(hidden_states)
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1783, in _wrapped_call_impl
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] return self._call_impl(*args, **kwargs)
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1794, in _call_impl
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] return forward_call(*args, **kwargs)
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] File “/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_vl.py”, line 151, in forward
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] x = self.proj(x).view(L, self.hidden_size)
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] ^^^^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] File “/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py”, line 1783, in _wrapped_call_impl
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] return self._call_impl(*args, **kwargs)
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] File “/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py”, line 1794, in _call_impl
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] return forward_call(*args, **kwargs)
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] File “/usr/local/lib/python3.12/dist-packages/vllm/model_executor/custom_op.py”, line 46, in forward
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] return self._forward_method(*args, **kwargs)
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] File “/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/conv.py”, line 236, in forward_cuda
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] return self._forward_conv(x)
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] File “/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/conv.py”, line 210, in _forward_conv
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] x = F.conv3d(
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] ^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) ERROR 01-06 01:31:06 [core.py:844] RuntimeError: GET was unable to find an engine to execute this computation
(APIServer pid=144) (EngineCore_DP0 pid=183) Process EngineCore_DP0:
(APIServer pid=144) (EngineCore_DP0 pid=183) Traceback (most recent call last):
(APIServer pid=144) ERROR 01-06 01:31:06 [async_llm.py:525] AsyncLLM output_handler failed.
(APIServer pid=144) ERROR 01-06 01:31:06 [async_llm.py:525] Traceback (most recent call last):
(APIServer pid=144) ERROR 01-06 01:31:06 [async_llm.py:525] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 477, in output_handler
(APIServer pid=144) ERROR 01-06 01:31:06 [async_llm.py:525] outputs = await engine_core.get_output_async()
(APIServer pid=144) ERROR 01-06 01:31:06 [async_llm.py:525] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=144) ERROR 01-06 01:31:06 [async_llm.py:525] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 883, in get_output_async
(APIServer pid=144) ERROR 01-06 01:31:06 [async_llm.py:525] raise self._format_exception(outputs) from None
(APIServer pid=144) ERROR 01-06 01:31:06 [async_llm.py:525] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
(APIServer pid=144) (EngineCore_DP0 pid=183) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(APIServer pid=144) (EngineCore_DP0 pid=183) self.run()
(APIServer pid=144) (EngineCore_DP0 pid=183) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
(APIServer pid=144) (EngineCore_DP0 pid=183) self._target(*self._args, **self._kwargs)
(APIServer pid=144) (EngineCore_DP0 pid=183) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 846, in run_engine_core
(APIServer pid=144) (EngineCore_DP0 pid=183) raise e
(APIServer pid=144) (EngineCore_DP0 pid=183) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 835, in run_engine_core
(APIServer pid=144) (EngineCore_DP0 pid=183) engine_core.run_busy_loop()
(APIServer pid=144) (EngineCore_DP0 pid=183) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 862, in run_busy_loop
(APIServer pid=144) (EngineCore_DP0 pid=183) self._process_engine_step()
(APIServer pid=144) (EngineCore_DP0 pid=183) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 891, in _process_engine_step
(APIServer pid=144) (EngineCore_DP0 pid=183) outputs, model_executed = self.step_fn()
(APIServer pid=144) (EngineCore_DP0 pid=183) ^^^^^^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 342, in step
(APIServer pid=144) (EngineCore_DP0 pid=183) model_output = future.result()
(APIServer pid=144) (EngineCore_DP0 pid=183) ^^^^^^^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
(APIServer pid=144) (EngineCore_DP0 pid=183) return self.__get_result()
(APIServer pid=144) (EngineCore_DP0 pid=183) ^^^^^^^^^^^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
(APIServer pid=144) (EngineCore_DP0 pid=183) raise self._exception
(APIServer pid=144) (EngineCore_DP0 pid=183) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/uniproc_executor.py", line 79, in collective_rpc
(APIServer pid=144) (EngineCore_DP0 pid=183) result = run_method(self.driver_worker, method, args, kwargs)
(APIServer pid=144) (EngineCore_DP0 pid=183) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/serial_utils.py", line 479, in run_method
(APIServer pid=144) (EngineCore_DP0 pid=183) return func(*args, **kwargs)
(APIServer pid=144) (EngineCore_DP0 pid=183) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/worker_base.py", line 367, in execute_model
(APIServer pid=144) (EngineCore_DP0 pid=183) return self.worker.execute_model(scheduler_output, *args, **kwargs)
(APIServer pid=144) (EngineCore_DP0 pid=183) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(APIServer pid=144) (EngineCore_DP0 pid=183) return func(*args, **kwargs)
(APIServer pid=144) (EngineCore_DP0 pid=183) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 563, in execute_model
(APIServer pid=144) (EngineCore_DP0 pid=183) output = self.model_runner.execute_model(
(APIServer pid=144) (EngineCore_DP0 pid=183) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(APIServer pid=144) (EngineCore_DP0 pid=183) return func(*args, **kwargs)
(APIServer pid=144) (EngineCore_DP0 pid=183) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2753, in execute_model
(APIServer pid=144) (EngineCore_DP0 pid=183) ) = self._preprocess(
(APIServer pid=144) (EngineCore_DP0 pid=183) ^^^^^^^^^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2328, in _preprocess
(APIServer pid=144) (EngineCore_DP0 pid=183) self._execute_mm_encoder(scheduler_output)
(APIServer pid=144) (EngineCore_DP0 pid=183) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1975, in _execute_mm_encoder
(APIServer pid=144) (EngineCore_DP0 pid=183) curr_group_outputs = model.embed_multimodal(**mm_kwargs_group)
(APIServer pid=144) (EngineCore_DP0 pid=183) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_vl.py", line 1496, in embed_multimodal
(APIServer pid=144) (EngineCore_DP0 pid=183) image_embeddings = self._process_image_input(multimodal_input)
(APIServer pid=144) (EngineCore_DP0 pid=183) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_vl.py", line 1375, in _process_image_input
(APIServer pid=144) (EngineCore_DP0 pid=183) image_embeds = self.visual(pixel_values, grid_thw=grid_thw)
(APIServer pid=144) (EngineCore_DP0 pid=183) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1783, in _wrapped_call_impl
(APIServer pid=144) (EngineCore_DP0 pid=183) return self._call_impl(*args, **kwargs)
(APIServer pid=144) (EngineCore_DP0 pid=183) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1794, in _call_impl
(APIServer pid=144) (EngineCore_DP0 pid=183) return forward_call(*args, **kwargs)
(APIServer pid=144) (EngineCore_DP0 pid=183) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_vl.py", line 540, in forward
(APIServer pid=144) (EngineCore_DP0 pid=183) hidden_states = self.patch_embed(hidden_states)
(APIServer pid=144) (EngineCore_DP0 pid=183) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1783, in _wrapped_call_impl
(APIServer pid=144) (EngineCore_DP0 pid=183) return self._call_impl(*args, **kwargs)
(APIServer pid=144) (EngineCore_DP0 pid=183) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1794, in _call_impl
(APIServer pid=144) (EngineCore_DP0 pid=183) return forward_call(*args, **kwargs)
(APIServer pid=144) (EngineCore_DP0 pid=183) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_vl.py", line 151, in forward
(APIServer pid=144) (EngineCore_DP0 pid=183) x = self.proj(x).view(L, self.hidden_size)
(APIServer pid=144) (EngineCore_DP0 pid=183) ^^^^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1783, in _wrapped_call_impl
(APIServer pid=144) (EngineCore_DP0 pid=183) return self._call_impl(*args, **kwargs)
(APIServer pid=144) (EngineCore_DP0 pid=183) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1794, in _call_impl
(APIServer pid=144) (EngineCore_DP0 pid=183) return forward_call(*args, **kwargs)
(APIServer pid=144) (EngineCore_DP0 pid=183) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/custom_op.py", line 46, in forward
(APIServer pid=144) (EngineCore_DP0 pid=183) return self._forward_method(*args, **kwargs)
(APIServer pid=144) (EngineCore_DP0 pid=183) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/conv.py", line 236, in forward_cuda
(APIServer pid=144) (EngineCore_DP0 pid=183) return self._forward_conv(x)
(APIServer pid=144) (EngineCore_DP0 pid=183) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/conv.py", line 210, in _forward_conv
(APIServer pid=144) (EngineCore_DP0 pid=183) x = F.conv3d(
(APIServer pid=144) (EngineCore_DP0 pid=183) ^^^^^^^^^
(APIServer pid=144) (EngineCore_DP0 pid=183) RuntimeError: GET was unable to find an engine to execute this computation
(APIServer pid=144) INFO: 10.19.120.171:51823 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
(APIServer pid=144) INFO: Shutting down
(APIServer pid=144) INFO: Waiting for application shutdown.
(APIServer pid=144) INFO: Application shutdown complete.
(APIServer pid=144) INFO: Finished server process [144]
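
Since the crash bottoms out in `F.conv3d` inside the vision patch embed, it helps to check whether this is a cuDNN engine-selection failure (plausible here given the bfloat16 convolution and the CUDA minor-version-compatibility warning at startup) rather than a vLLM bug. The sketch below is a hedged, standalone reproducer to run with the container's own Python; the tensor shapes are illustrative stand-ins for a ViT-style patch embed, not the model's exact configuration, and `try_conv3d` is a name I made up for this test:

```python
# Standalone conv3d probe -- isolates the failing op from vLLM entirely.
# Assumption: run inside the same container, e.g.
#   sudo docker exec -it qwen_vl_service python3 conv3d_probe.py
import contextlib


def try_conv3d(dtype_name="bfloat16", use_cudnn=True):
    """Run a small 3D convolution on the GPU.

    Returns "ok" on success, "no-torch"/"no-cuda" when the environment
    cannot run the test, or the RuntimeError text on failure.
    """
    try:
        import torch
        import torch.nn.functional as F
    except ImportError:
        return "no-torch"
    if not torch.cuda.is_available():
        return "no-cuda"
    dtype = getattr(torch, dtype_name)
    # torch.backends.cudnn.flags lets us toggle cuDNN for this call only;
    # fall back to a no-op context if it is unavailable.
    ctx = (
        torch.backends.cudnn.flags(enabled=use_cudnn)
        if hasattr(torch.backends.cudnn, "flags")
        else contextlib.nullcontext()
    )
    try:
        with ctx:
            # Illustrative patch-embed-like shapes: 3 input channels,
            # temporal patch 2, spatial patch 16.
            x = torch.randn(1, 3, 2, 64, 64, device="cuda", dtype=dtype)
            w = torch.randn(32, 3, 2, 16, 16, device="cuda", dtype=dtype)
            F.conv3d(x, w, stride=(2, 16, 16))
            torch.cuda.synchronize()  # surface async CUDA errors here
        return "ok"
    except RuntimeError as exc:
        return str(exc)


if __name__ == "__main__":
    for name in ("bfloat16", "float32"):
        for cudnn_on in (True, False):
            label = "cudnn" if cudnn_on else "no-cudnn"
            print(f"{name:>8} {label:>8} -> {try_conv3d(name, cudnn_on)}")
```

If bfloat16 fails while float32 succeeds, or the cuDNN-off path succeeds where cuDNN fails, that points at the cuDNN engine lookup for bf16 conv3d on this driver/architecture combination under minor-version compatibility, which narrows the report considerably (and suggests workarounds such as running the vision encoder in float32).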