I am following the "Gemma 4 31B | Jetson AI Lab" procedure on an NVIDIA Jetson Thor, and engine core initialization is failing.
(LLM) olpeleri@olpeleri:~/LLM_playground$ docker run -it --rm --pull always --runtime=nvidia --network host -v $HOME/.cache/huggingface:/root/.cache/huggingface Package vllm · GitHub vllm serve nvidia/Gemma-4-31B-IT-NVFP4 --gpu-memory-utilization 0.8 --enable-auto-tool-choice --reasoning-parser gemma4 --tool-call-parser gemma4
gemma4-jetson-thor: Pulling from nvidia-ai-iot/vllm
Digest: sha256:570f9a5ffa89a772226abcc98c2d358a56ec3f755c97bc079c7f2396ffe62260
Status: Image is up to date for Package vllm · GitHub
(APIServer pid=1) INFO 05-06 03:04:10 [utils.py:299]
(APIServer pid=1) INFO 05-06 03:04:10 [utils.py:299] █ █ █▄ ▄█
(APIServer pid=1) INFO 05-06 03:04:10 [utils.py:299] ▄▄ ▄█ █ █ █ ▀▄▀ █ version 0.19.0
(APIServer pid=1) INFO 05-06 03:04:10 [utils.py:299] █▄█▀ █ █ █ █ model nvidia/Gemma-4-31B-IT-NVFP4
(APIServer pid=1) INFO 05-06 03:04:10 [utils.py:299] ▀▀ ▀▀▀▀▀ ▀▀▀▀▀ ▀ ▀
(APIServer pid=1) INFO 05-06 03:04:10 [utils.py:299]
(APIServer pid=1) INFO 05-06 03:04:10 [utils.py:233] non-default args: {‘model_tag’: ‘nvidia/Gemma-4-31B-IT-NVFP4’, ‘enable_auto_tool_choice’: True, ‘tool_call_parser’: ‘gemma4’, ‘model’: ‘nvidia/Gemma-4-31B-IT-NVFP4’, ‘reasoning_parser’: ‘gemma4’, ‘gpu_memory_utilization’: 0.8}
(APIServer pid=1) Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
config.json: 9.46kB [00:00, 11.4MB/s]
processor_config.json: 1.69kB [00:00, 3.49MB/s]
(APIServer pid=1) INFO 05-06 03:04:23 [model.py:549] Resolved architecture: Gemma4ForConditionalGeneration
(APIServer pid=1) INFO 05-06 03:04:23 [model.py:1678] Using max model len 262144
(APIServer pid=1) INFO 05-06 03:04:23 [cache.py:227] Using fp8 data type to store kv cache. It reduces the GPU memory footprint and boosts the performance. Meanwhile, it may cause accuracy drop without a proper scaling factor.
(APIServer pid=1) INFO 05-06 03:04:23 [config.py:104] Gemma4 model has heterogeneous head dimensions (head_dim=256, global_head_dim=512). Forcing TRITON_ATTN backend to prevent mixed-backend numerical divergence.
(APIServer pid=1) WARNING 05-06 03:04:23 [modelopt.py:998] Detected ModelOpt NVFP4 checkpoint. Please note that the format is experimental and could change in future.
(APIServer pid=1) INFO 05-06 03:04:23 [vllm.py:790] Asynchronous scheduling is enabled.
(APIServer pid=1) INFO 05-06 03:04:23 [compilation.py:290] Enabled custom fusions: act_quant
tokenizer_config.json: 2.09kB [00:00, 5.90MB/s]
tokenizer.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 32.2M/32.2M [00:02<00:00, 11.4MB/s]
chat_template.jinja: 16.9kB [00:00, 11.8MB/s]
generation_config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 208/208 [00:00<00:00, 903kB/s]
(EngineCore pid=122) INFO 05-06 03:04:42 [core.py:105] Initializing a V1 LLM engine (v0.19.0) with config: model=‘nvidia/Gemma-4-31B-IT-NVFP4’, speculative_config=None, tokenizer=‘nvidia/Gemma-4-31B-IT-NVFP4’, skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=262144, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, decode_context_parallel_size=1, dcp_comm_backend=ag_rs, disable_custom_all_reduce=False, quantization=modelopt_fp4, enforce_eager=False, enable_return_routed_experts=False, kv_cache_dtype=fp8_e4m3, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend=‘auto’, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser=‘gemma4’, reasoning_parser_plugin=‘’, enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, kv_cache_metrics=False, kv_cache_metrics_sample=0.01, cudagraph_metrics=False, enable_layerwise_nvtx_tracing=False, enable_mfu_metrics=False, enable_mm_processor_stats=False, enable_logging_iteration_details=False), seed=0, served_model_name=nvidia/Gemma-4-31B-IT-NVFP4, enable_prefix_caching=True, enable_chunked_prefill=True, pooler_config=None, compilation_config={‘mode’: <CompilationMode.VLLM_COMPILE: 3>, ‘debug_dump_path’: None, ‘cache_dir’: ‘’, ‘compile_cache_save_format’: ‘binary’, ‘backend’: ‘inductor’, ‘custom_ops’: [‘none’], ‘splitting_ops’: [‘vllm::unified_attention’, ‘vllm::unified_attention_with_output’, ‘vllm::unified_mla_attention’, ‘vllm::unified_mla_attention_with_output’, ‘vllm::mamba_mixer2’, ‘vllm::mamba_mixer’, ‘vllm::short_conv’, ‘vllm::linear_attention’, ‘vllm::plamo2_mamba_mixer’, ‘vllm::gdn_attention_core’, ‘vllm::olmo_hybrid_gdn_full_forward’, ‘vllm::kda_attention’, ‘vllm::sparse_attn_indexer’, 
‘vllm::rocm_aiter_sparse_attn_indexer’, ‘vllm::unified_kv_cache_update’, ‘vllm::unified_mla_kv_cache_update’], ‘compile_mm_encoder’: False, ‘cudagraph_mm_encoder’: False, ‘encoder_cudagraph_token_budgets’: , ‘encoder_cudagraph_max_images_per_batch’: 0, ‘compile_sizes’: , ‘compile_ranges_endpoints’: [2048], ‘inductor_compile_config’: {‘enable_auto_functionalized_v2’: False, ‘size_asserts’: False, ‘alignment_asserts’: False, ‘scalar_asserts’: False, ‘combo_kernels’: True, ‘benchmark_combo_kernel’: True}, ‘inductor_passes’: {}, ‘cudagraph_mode’: <CUDAGraphMode.FULL_AND_PIECEWISE: (2, 1)>, ‘cudagraph_num_of_warmups’: 1, ‘cudagraph_capture_sizes’: [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256, 272, 288, 304, 320, 336, 352, 368, 384, 400, 416, 432, 448, 464, 480, 496, 512], ‘cudagraph_copy_inputs’: False, ‘cudagraph_specialize_lora’: True, ‘use_inductor_graph_partition’: False, ‘pass_config’: {‘fuse_norm_quant’: False, ‘fuse_act_quant’: True, ‘fuse_attn_quant’: False, ‘enable_sp’: False, ‘fuse_gemm_comms’: False, ‘fuse_allreduce_rms’: False}, ‘max_cudagraph_capture_size’: 512, ‘dynamic_shapes_config’: {‘type’: <DynamicShapesType.BACKED: ‘backed’>, ‘evaluate_guards’: False, ‘assume_32_bit_indexing’: False}, ‘local_cache_dir’: None, ‘fast_moe_cold_start’: True, ‘static_all_moe_layers’: }
(EngineCore pid=122) Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
(EngineCore pid=122) INFO 05-06 03:04:45 [parallel_state.py:1400] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://10.48.66.216:42697 backend=nccl
(EngineCore pid=122) INFO 05-06 03:04:45 [parallel_state.py:1716] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank N/A, EPLB rank N/A
(EngineCore pid=122) ERROR 05-06 03:04:46 [core.py:1108] EngineCore failed to start.
(EngineCore pid=122) ERROR 05-06 03:04:46 [core.py:1108] Traceback (most recent call last):
(EngineCore pid=122) ERROR 05-06 03:04:46 [core.py:1108] File “/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py”, line 1082, in run_engine_core
(EngineCore pid=122) ERROR 05-06 03:04:46 [core.py:1108] engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=122) ERROR 05-06 03:04:46 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=122) ERROR 05-06 03:04:46 [core.py:1108] File “/opt/venv/lib/python3.12/site-packages/vllm/tracing/otel.py”, line 178, in sync_wrapper
(EngineCore pid=122) ERROR 05-06 03:04:46 [core.py:1108] return func(*args, **kwargs)
(EngineCore pid=122) ERROR 05-06 03:04:46 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=122) ERROR 05-06 03:04:46 [core.py:1108] File “/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py”, line 848, in __init__
(EngineCore pid=122) ERROR 05-06 03:04:46 [core.py:1108] super().__init__(
(EngineCore pid=122) ERROR 05-06 03:04:46 [core.py:1108] File “/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py”, line 114, in __init__
(EngineCore pid=122) ERROR 05-06 03:04:46 [core.py:1108] self.model_executor = executor_class(vllm_config)
(EngineCore pid=122) ERROR 05-06 03:04:46 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=122) ERROR 05-06 03:04:46 [core.py:1108] File “/opt/venv/lib/python3.12/site-packages/vllm/tracing/otel.py”, line 178, in sync_wrapper
(EngineCore pid=122) ERROR 05-06 03:04:46 [core.py:1108] return func(*args, **kwargs)
(EngineCore pid=122) ERROR 05-06 03:04:46 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=122) ERROR 05-06 03:04:46 [core.py:1108] File “/opt/venv/lib/python3.12/site-packages/vllm/v1/executor/abstract.py”, line 103, in __init__
(EngineCore pid=122) ERROR 05-06 03:04:46 [core.py:1108] self._init_executor()
(EngineCore pid=122) ERROR 05-06 03:04:46 [core.py:1108] File “/opt/venv/lib/python3.12/site-packages/vllm/v1/executor/uniproc_executor.py”, line 47, in _init_executor
(EngineCore pid=122) ERROR 05-06 03:04:46 [core.py:1108] self.driver_worker.init_device()
(EngineCore pid=122) ERROR 05-06 03:04:46 [core.py:1108] File “/opt/venv/lib/python3.12/site-packages/vllm/v1/worker/worker_base.py”, line 312, in init_device
(EngineCore pid=122) ERROR 05-06 03:04:46 [core.py:1108] self.worker.init_device() # type: ignore
(EngineCore pid=122) ERROR 05-06 03:04:46 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=122) ERROR 05-06 03:04:46 [core.py:1108] File “/opt/venv/lib/python3.12/site-packages/vllm/tracing/otel.py”, line 178, in sync_wrapper
(EngineCore pid=122) ERROR 05-06 03:04:46 [core.py:1108] return func(*args, **kwargs)
(EngineCore pid=122) ERROR 05-06 03:04:46 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=122) ERROR 05-06 03:04:46 [core.py:1108] File “/opt/venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py”, line 283, in init_device
(EngineCore pid=122) ERROR 05-06 03:04:46 [core.py:1108] self.requested_memory = request_memory(init_snapshot, self.cache_config)
(EngineCore pid=122) ERROR 05-06 03:04:46 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=122) ERROR 05-06 03:04:46 [core.py:1108] File “/opt/venv/lib/python3.12/site-packages/vllm/v1/worker/utils.py”, line 413, in request_memory
(EngineCore pid=122) ERROR 05-06 03:04:46 [core.py:1108] raise ValueError(
(EngineCore pid=122) ERROR 05-06 03:04:46 [core.py:1108] ValueError: Free memory on device cuda:0 (80.29/122.82 GiB) on startup is less than desired GPU memory utilization (0.8, 98.26 GiB). Decrease GPU memory utilization or reduce GPU memory used by other processes.
(EngineCore pid=122) Process EngineCore:
(EngineCore pid=122) Traceback (most recent call last):
(EngineCore pid=122) File “/root/.local/share/uv/python/cpython-3.12.13-linux-aarch64-gnu/lib/python3.12/multiprocessing/process.py”, line 314, in _bootstrap
(EngineCore pid=122) self.run()
(EngineCore pid=122) File “/root/.local/share/uv/python/cpython-3.12.13-linux-aarch64-gnu/lib/python3.12/multiprocessing/process.py”, line 108, in run
(EngineCore pid=122) self._target(*self._args, **self._kwargs)
(EngineCore pid=122) File “/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py”, line 1112, in run_engine_core
(EngineCore pid=122) raise e
(EngineCore pid=122) File “/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py”, line 1082, in run_engine_core
(EngineCore pid=122) engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=122) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=122) File “/opt/venv/lib/python3.12/site-packages/vllm/tracing/otel.py”, line 178, in sync_wrapper
(EngineCore pid=122) return func(*args, **kwargs)
(EngineCore pid=122) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=122) File “/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py”, line 848, in __init__
(EngineCore pid=122) super().__init__(
(EngineCore pid=122) File “/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py”, line 114, in __init__
(EngineCore pid=122) self.model_executor = executor_class(vllm_config)
(EngineCore pid=122) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=122) File “/opt/venv/lib/python3.12/site-packages/vllm/tracing/otel.py”, line 178, in sync_wrapper
(EngineCore pid=122) return func(*args, **kwargs)
(EngineCore pid=122) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=122) File “/opt/venv/lib/python3.12/site-packages/vllm/v1/executor/abstract.py”, line 103, in __init__
(EngineCore pid=122) self._init_executor()
(EngineCore pid=122) File “/opt/venv/lib/python3.12/site-packages/vllm/v1/executor/uniproc_executor.py”, line 47, in _init_executor
(EngineCore pid=122) self.driver_worker.init_device()
(EngineCore pid=122) File “/opt/venv/lib/python3.12/site-packages/vllm/v1/worker/worker_base.py”, line 312, in init_device
(EngineCore pid=122) self.worker.init_device() # type: ignore
(EngineCore pid=122) ^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=122) File “/opt/venv/lib/python3.12/site-packages/vllm/tracing/otel.py”, line 178, in sync_wrapper
(EngineCore pid=122) return func(*args, **kwargs)
(EngineCore pid=122) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=122) File “/opt/venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py”, line 283, in init_device
(EngineCore pid=122) self.requested_memory = request_memory(init_snapshot, self.cache_config)
(EngineCore pid=122) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=122) File “/opt/venv/lib/python3.12/site-packages/vllm/v1/worker/utils.py”, line 413, in request_memory
(EngineCore pid=122) raise ValueError(
(EngineCore pid=122) ValueError: Free memory on device cuda:0 (80.29/122.82 GiB) on startup is less than desired GPU memory utilization (0.8, 98.26 GiB). Decrease GPU memory utilization or reduce GPU memory used by other processes.
[rank0]:[W506 03:04:46.711161802 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see Redirecting… (function operator())
(APIServer pid=1) Traceback (most recent call last):
(APIServer pid=1) File “/opt/venv/bin/vllm”, line 10, in <module>
(APIServer pid=1) sys.exit(main())
(APIServer pid=1) ^^^^^^
(APIServer pid=1) File “/opt/venv/lib/python3.12/site-packages/vllm/entrypoints/cli/main.py”, line 75, in main
(APIServer pid=1) args.dispatch_function(args)
(APIServer pid=1) File “/opt/venv/lib/python3.12/site-packages/vllm/entrypoints/cli/serve.py”, line 122, in cmd
(APIServer pid=1) uvloop.run(run_server(args))
(APIServer pid=1) File “/opt/venv/lib/python3.12/site-packages/uvloop/__init__.py”, line 96, in run
(APIServer pid=1) return __asyncio.run(
(APIServer pid=1) ^^^^^^^^^^^^^^
(APIServer pid=1) File “/root/.local/share/uv/python/cpython-3.12.13-linux-aarch64-gnu/lib/python3.12/asyncio/runners.py”, line 195, in run
(APIServer pid=1) return runner.run(main)
(APIServer pid=1) ^^^^^^^^^^^^^^^^
(APIServer pid=1) File “/root/.local/share/uv/python/cpython-3.12.13-linux-aarch64-gnu/lib/python3.12/asyncio/runners.py”, line 118, in run
(APIServer pid=1) return self._loop.run_until_complete(task)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File “uvloop/loop.pyx”, line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=1) File “/opt/venv/lib/python3.12/site-packages/uvloop/__init__.py”, line 48, in wrapper
(APIServer pid=1) return await main
(APIServer pid=1) ^^^^^^^^^^
(APIServer pid=1) File “/opt/venv/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py”, line 670, in run_server
(APIServer pid=1) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=1) File “/opt/venv/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py”, line 684, in run_server_worker
(APIServer pid=1) async with build_async_engine_client(
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File “/root/.local/share/uv/python/cpython-3.12.13-linux-aarch64-gnu/lib/python3.12/contextlib.py”, line 210, in __aenter__
(APIServer pid=1) return await anext(self.gen)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File “/opt/venv/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py”, line 100, in build_async_engine_client
(APIServer pid=1) async with build_async_engine_client_from_engine_args(
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File “/root/.local/share/uv/python/cpython-3.12.13-linux-aarch64-gnu/lib/python3.12/contextlib.py”, line 210, in __aenter__
(APIServer pid=1) return await anext(self.gen)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File “/opt/venv/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py”, line 136, in build_async_engine_client_from_engine_args
(APIServer pid=1) async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File “/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py”, line 225, in from_vllm_config
(APIServer pid=1) return cls(
(APIServer pid=1) ^^^^
(APIServer pid=1) File “/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py”, line 154, in __init__
(APIServer pid=1) self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File “/opt/venv/lib/python3.12/site-packages/vllm/tracing/otel.py”, line 178, in sync_wrapper
(APIServer pid=1) return func(*args, **kwargs)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File “/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/core_client.py”, line 130, in make_async_mp_client
(APIServer pid=1) return AsyncMPClient(*client_args)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File “/opt/venv/lib/python3.12/site-packages/vllm/tracing/otel.py”, line 178, in sync_wrapper
(APIServer pid=1) return func(*args, **kwargs)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File “/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/core_client.py”, line 887, in __init__
(APIServer pid=1) super().__init__(
(APIServer pid=1) File “/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/core_client.py”, line 535, in __init__
(APIServer pid=1) with launch_core_engines(
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File “/root/.local/share/uv/python/cpython-3.12.13-linux-aarch64-gnu/lib/python3.12/contextlib.py”, line 144, in __exit__
(APIServer pid=1) next(self.gen)
(APIServer pid=1) File “/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/utils.py”, line 998, in launch_core_engines
(APIServer pid=1) wait_for_engine_startup(
(APIServer pid=1) File “/opt/venv/lib/python3.12/site-packages/vllm/v1/engine/utils.py”, line 1057, in wait_for_engine_startup
(APIServer pid=1) raise RuntimeError(
(APIServer pid=1) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
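
For reference, the root-cause ValueError in the log can be read as plain arithmetic. The sketch below only reuses the three numbers printed in that message (80.29 GiB free, 122.82 GiB total, utilization 0.8). The helper names are hypothetical, and the comparison rule is an assumption taken from the wording of the error, not from the vLLM source.

```python
# Numbers copied from the ValueError in the log above.
free_gib = 80.29            # free memory reported on cuda:0 at startup
total_gib = 122.82          # total memory reported on cuda:0
requested_fraction = 0.8    # value passed as --gpu-memory-utilization


def requested_gib(total: float, fraction: float) -> float:
    """Memory the engine asks for: a fraction of TOTAL device memory."""
    return total * fraction


def max_fraction(free: float, total: float) -> float:
    """Largest fraction of total memory that still fits in free memory."""
    return free / total


need = requested_gib(total_gib, requested_fraction)
limit = max_fraction(free_gib, total_gib)

# Assumed rule, per the error text: startup fails when the request
# exceeds the memory that is free at startup.
would_fail = need > free_gib

print(f"requested: {need:.2f} GiB, free: {free_gib:.2f} GiB, fails: {would_fail}")
print(f"largest fraction that fits: {limit:.3f}")
```

Under that reading, about 42.5 GiB of the 122.82 GiB was already in use when the container started, so either a utilization value below roughly 0.65 or freeing memory before launch would satisfy the check. On Jetson the GPU shares memory with the CPU, so other host processes probably count against the free figure.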
Any pointers to a solution?
Bonus question: has anyone tried Mistral 4 Small?