Hi all, is anyone running Gemma 4 locally on the Jetson Nano Super (JetPack 6.2.2)? This is the container I used:
docker run --rm -it --runtime nvidia --network host \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin
But when I try to serve a model from the Gemma 4 unsloth collection, I get this error message:
jcm@ubuntu:~$ docker run --rm -it --runtime nvidia --network host \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin
root@ubuntu:/# vllm serve unsloth/gemma-4-E4B-it-unsloth-bnb-4bit
/opt/venv/lib/python3.10/site-packages/transformers/utils/hub.py:110: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
warnings.warn(
(APIServer pid=22) INFO 04-04 21:42:58 [utils.py:293]
(APIServer pid=22) INFO 04-04 21:42:58 [utils.py:293] █ █ █▄ ▄█
(APIServer pid=22) INFO 04-04 21:42:58 [utils.py:293] ▄▄ ▄█ █ █ █ ▀▄▀ █ version 0.16.0rc2.dev479+g15d76f74e.d20260226
(APIServer pid=22) INFO 04-04 21:42:58 [utils.py:293] █▄█▀ █ █ █ █ model unsloth/gemma-4-E4B-it-unsloth-bnb-4bit
(APIServer pid=22) INFO 04-04 21:42:58 [utils.py:293] ▀▀ ▀▀▀▀▀ ▀▀▀▀▀ ▀ ▀
(APIServer pid=22) INFO 04-04 21:42:58 [utils.py:293]
(APIServer pid=22) INFO 04-04 21:42:58 [utils.py:229] non-default args: {'model_tag': 'unsloth/gemma-4-E4B-it-unsloth-bnb-4bit', 'model': 'unsloth/gemma-4-E4B-it-unsloth-bnb-4bit'}
config.json: 6.41kB [00:00, 6.01MB/s]
(APIServer pid=22) Traceback (most recent call last):
(APIServer pid=22)   File "/opt/venv/bin/vllm", line 10, in <module>
(APIServer pid=22)     sys.exit(main())
(APIServer pid=22)   File "/opt/venv/lib/python3.10/site-packages/vllm/entrypoints/cli/main.py", line 73, in main
(APIServer pid=22)     args.dispatch_function(args)
(APIServer pid=22)   File "/opt/venv/lib/python3.10/site-packages/vllm/entrypoints/cli/serve.py", line 112, in cmd
(APIServer pid=22)     uvloop.run(run_server(args))
(APIServer pid=22)   File "/opt/venv/lib/python3.10/site-packages/uvloop/__init__.py", line 69, in run
(APIServer pid=22)     return loop.run_until_complete(wrapper())
(APIServer pid=22)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=22)   File "/opt/venv/lib/python3.10/site-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=22)     return await main
(APIServer pid=22)   File "/opt/venv/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 471, in run_server
(APIServer pid=22)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=22)   File "/opt/venv/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 490, in run_server_worker
(APIServer pid=22)     async with build_async_engine_client(
(APIServer pid=22)   File "/root/.local/share/uv/python/cpython-3.10-linux-aarch64-gnu/lib/python3.10/contextlib.py", line 199, in __aenter__
(APIServer pid=22)     return await anext(self.gen)
(APIServer pid=22)   File "/opt/venv/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 96, in build_async_engine_client
(APIServer pid=22)     async with build_async_engine_client_from_engine_args(
(APIServer pid=22)   File "/root/.local/share/uv/python/cpython-3.10-linux-aarch64-gnu/lib/python3.10/contextlib.py", line 199, in __aenter__
(APIServer pid=22)     return await anext(self.gen)
(APIServer pid=22)   File "/opt/venv/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 122, in build_async_engine_client_from_engine_args
(APIServer pid=22)     vllm_config = engine_args.create_engine_config(usage_context=usage_context)
(APIServer pid=22)   File "/opt/venv/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 1431, in create_engine_config
(APIServer pid=22)     model_config = self.create_model_config()
(APIServer pid=22)   File "/opt/venv/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 1283, in create_model_config
(APIServer pid=22)     return ModelConfig(
(APIServer pid=22)   File "/opt/venv/lib/python3.10/site-packages/pydantic/_internal/_dataclasses.py", line 121, in __init__
(APIServer pid=22)     s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
(APIServer pid=22) pydantic_core._pydantic_core.ValidationError: 1 validation error for ModelConfig
(APIServer pid=22) Value error, The checkpoint you are trying to load has model type gemma4 but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
(APIServer pid=22)
(APIServer pid=22) You can update Transformers with the command pip install --upgrade transformers. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command pip install git+https://github.com/huggingface/transformers.git [type=value_error, input_value=ArgsKwargs((), {'model': …rocessor_plugin': None}), input_type=ArgsKwargs]
(APIServer pid=22) For further information visit Redirecting...
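For context, the failure happens before any weights are downloaded: vLLM reads the model's config.json, takes its model_type field ("gemma4" here), and asks the installed Transformers release to map that string to a config class; an unknown string raises exactly this ValidationError. A minimal stdlib-only sketch of that lookup (the set of known types below is an illustrative subset I made up, not the real Transformers registry):

```python
import json

# Illustrative config.json payload; the real file has many more fields.
config = json.loads('{"model_type": "gemma4"}')

# Hypothetical subset of model types an older Transformers release knows.
known_model_types = {"gemma", "gemma2", "llama", "mistral"}

model_type = config["model_type"]
if model_type in known_model_types:
    print(f"model type {model_type} is recognized")
else:
    # This branch corresponds to the "does not recognize this
    # architecture" error in the traceback above.
    print(f"model type {model_type} is not recognized")
```

So the usual first step, as the message itself suggests, is to upgrade Transformers inside the container (pip install --upgrade transformers, or from source if no release supports the checkpoint yet) before retrying vllm serve; whether the Jetson vLLM image tolerates a newer Transformers is a separate question.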