No luck with Gemma 4 on Jetson Nano Super

cm7900 · April 4, 2026, 9:43pm

Hi guys, is anyone running Gemma 4 locally on the Jetson Nano Super (JP 6.2.2)? This is the container I used:

docker run --rm -it --runtime nvidia --network host \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin

But when I try to serve the model (I used one from the "Gemma 4" unsloth collection), I get this error message:

jcm@ubuntu:~$ docker run --rm -it --runtime nvidia --network host \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin

root@ubuntu:/# vllm serve unsloth/gemma-4-E4B-it-unsloth-bnb-4bit
/opt/venv/lib/python3.10/site-packages/transformers/utils/hub.py:110: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
warnings.warn(
(APIServer pid=22) INFO 04-04 21:42:58 [utils.py:293]
(APIServer pid=22) INFO 04-04 21:42:58 [utils.py:293] █ █ █▄ ▄█
(APIServer pid=22) INFO 04-04 21:42:58 [utils.py:293] ▄▄ ▄█ █ █ █ ▀▄▀ █ version 0.16.0rc2.dev479+g15d76f74e.d20260226
(APIServer pid=22) INFO 04-04 21:42:58 [utils.py:293] █▄█▀ █ █ █ █ model unsloth/gemma-4-E4B-it-unsloth-bnb-4bit
(APIServer pid=22) INFO 04-04 21:42:58 [utils.py:293] ▀▀ ▀▀▀▀▀ ▀▀▀▀▀ ▀ ▀
(APIServer pid=22) INFO 04-04 21:42:58 [utils.py:293]
(APIServer pid=22) INFO 04-04 21:42:58 [utils.py:229] non-default args: {'model_tag': 'unsloth/gemma-4-E4B-it-unsloth-bnb-4bit', 'model': 'unsloth/gemma-4-E4B-it-unsloth-bnb-4bit'}
config.json: 6.41kB [00:00, 6.01MB/s]
(APIServer pid=22) Traceback (most recent call last):
(APIServer pid=22)   File "/opt/venv/bin/vllm", line 10, in <module>
(APIServer pid=22)     sys.exit(main())
(APIServer pid=22)   File "/opt/venv/lib/python3.10/site-packages/vllm/entrypoints/cli/main.py", line 73, in main
(APIServer pid=22)     args.dispatch_function(args)
(APIServer pid=22)   File "/opt/venv/lib/python3.10/site-packages/vllm/entrypoints/cli/serve.py", line 112, in cmd
(APIServer pid=22)     uvloop.run(run_server(args))
(APIServer pid=22)   File "/opt/venv/lib/python3.10/site-packages/uvloop/__init__.py", line 69, in run
(APIServer pid=22)     return loop.run_until_complete(wrapper())
(APIServer pid=22)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=22)   File "/opt/venv/lib/python3.10/site-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=22)     return await main
(APIServer pid=22)   File "/opt/venv/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 471, in run_server
(APIServer pid=22)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=22)   File "/opt/venv/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 490, in run_server_worker
(APIServer pid=22)     async with build_async_engine_client(
(APIServer pid=22)   File "/root/.local/share/uv/python/cpython-3.10-linux-aarch64-gnu/lib/python3.10/contextlib.py", line 199, in __aenter__
(APIServer pid=22)     return await anext(self.gen)
(APIServer pid=22)   File "/opt/venv/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 96, in build_async_engine_client
(APIServer pid=22)     async with build_async_engine_client_from_engine_args(
(APIServer pid=22)   File "/root/.local/share/uv/python/cpython-3.10-linux-aarch64-gnu/lib/python3.10/contextlib.py", line 199, in __aenter__
(APIServer pid=22)     return await anext(self.gen)
(APIServer pid=22)   File "/opt/venv/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 122, in build_async_engine_client_from_engine_args
(APIServer pid=22)     vllm_config = engine_args.create_engine_config(usage_context=usage_context)
(APIServer pid=22)   File "/opt/venv/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 1431, in create_engine_config
(APIServer pid=22)     model_config = self.create_model_config()
(APIServer pid=22)   File "/opt/venv/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 1283, in create_model_config
(APIServer pid=22)     return ModelConfig(
(APIServer pid=22)   File "/opt/venv/lib/python3.10/site-packages/pydantic/_internal/_dataclasses.py", line 121, in __init__
(APIServer pid=22)     s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
(APIServer pid=22) pydantic_core._pydantic_core.ValidationError: 1 validation error for ModelConfig
(APIServer pid=22)   Value error, The checkpoint you are trying to load has model type `gemma4` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
(APIServer pid=22)
(APIServer pid=22) You can update Transformers with the command `pip install --upgrade transformers`. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command `pip install git+https://github.com/huggingface/transformers.git` [type=value_error, input_value=ArgsKwargs((), {'model': ...rocessor_plugin': None}), input_type=ArgsKwargs]
(APIServer pid=22) For further information visit Redirecting...
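
The error above says the Transformers package in the container does not recognize the gemma4 model type. A minimal sketch for confirming that from a Python prompt inside the container follows. It assumes the message is raised when the model type is missing from Transformers' CONFIG_MAPPING registry, and the helper name transformers_status is hypothetical (it is not part of vLLM or Transformers).

```python
import importlib.metadata


def transformers_status(model_type="gemma4"):
    """Return (installed_version, recognized) for the local Transformers install.

    Hypothetical helper for illustration. Returns (None, False) when
    Transformers is not installed or cannot be imported.
    """
    try:
        version = importlib.metadata.version("transformers")
    except importlib.metadata.PackageNotFoundError:
        return None, False
    try:
        # Assumption: CONFIG_MAPPING is the registry of model_type strings
        # that the auto classes consult when loading a config.
        from transformers.models.auto.configuration_auto import CONFIG_MAPPING
    except ImportError:
        return None, False
    return version, model_type in CONFIG_MAPPING


if __name__ == "__main__":
    version, recognized = transformers_status("gemma4")
    print(f"transformers version: {version}, gemma4 recognized: {recognized}")
```

If this reports that gemma4 is not recognized, upgrading Transformers as the message suggests would address this particular check, but vLLM itself would still need its own Gemma 4 model implementation.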

eric.jswang · April 5, 2026, 12:47pm

I have exactly the same issue on a Jetson Nano Super with JetPack 6.2, and would appreciate any help. Thanks!

eric.jswang · April 6, 2026, 2:45am

By attaching to a container of the current image (ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin), I found that there are no gemma4 support files under the directory /opt/venv/lib/python3.10/site-packages/vllm/model_executor/models/. So it looks like gemma4 is, in fact, not yet supported in the current latest-jetson-orin image. The version of vLLM used in that image is 0.16.0rc2.dev479+g15d76f74e.d20260226.cu126.
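
For anyone who wants to repeat that check without browsing the directory by hand, here is a rough sketch that can be run inside the container. The function names are hypothetical, and it assumes a supported model would show up as a file whose name starts with "gemma4" (by analogy with how other models are laid out in that directory), which is not confirmed for Gemma 4.

```python
import importlib.util
from pathlib import Path


def vllm_models_dir():
    """Locate vllm/model_executor/models without importing vLLM.

    Returns None when vLLM is not installed in the current environment.
    """
    spec = importlib.util.find_spec("vllm")
    if spec is None or not spec.submodule_search_locations:
        return None
    package_root = Path(list(spec.submodule_search_locations)[0])
    return package_root / "model_executor" / "models"


def find_model_files(models_dir, prefix="gemma4"):
    """Return the sorted names of .py files in models_dir starting with prefix.

    The "gemma4" prefix is an assumption about the file naming.
    """
    models_dir = Path(models_dir)
    if not models_dir.is_dir():
        return []
    return sorted(p.name for p in models_dir.glob(f"{prefix}*.py"))


if __name__ == "__main__":
    models_dir = vllm_models_dir()
    if models_dir is None:
        print("vLLM is not installed in this environment")
    else:
        print(f"{models_dir}: {find_model_files(models_dir) or 'no gemma4 files'}")
```

An empty result would match what I saw in the current image.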
