I am new to the Jetson Orin Nano and this is my first time testing out this kit. I've followed the NanoLLM - NVIDIA Jetson AI Lab guide step by step, but I'm now stuck starting the NanoLLM CLI with this command:
jetson-containers run \
  --env HUGGINGFACE_TOKEN=hf_abc123def \
  $(autotag nano_llm) \
    python3 -m nano_llm.chat --api mlc \
      --model meta-llama/Meta-Llama-3-8B-Instruct \
      --prompt "Can you tell me a joke about llamas?"
docker run --runtime nvidia -it --rm --network host --shm-size=8g --volume /tmp/argus_socket:/tmp/argus_socket --volume /etc/enctune.conf:/etc/enctune.conf --volume /etc/nv_tegra_release:/etc/nv_tegra_release --volume /tmp/nv_jetson_model:/tmp/nv_jetson_model --volume /var/run/dbus:/var/run/dbus --volume /var/run/avahi-daemon/socket:/var/run/avahi-daemon/socket --volume /var/run/docker.sock:/var/run/docker.sock --volume /home/itadmin/data:/data -v /etc/localtime:/etc/localtime:ro -v /etc/timezone:/etc/timezone:ro --device /dev/snd -e PULSE_SERVER=unix:/run/user/1000/pulse/native -v /run/user/1000/pulse:/run/user/1000/pulse --device /dev/bus/usb --device /dev/i2c-0 --device /dev/i2c-1 --device /dev/i2c-2 --device /dev/i2c-4 --device /dev/i2c-5 --device /dev/i2c-7 --device /dev/i2c-9 --name jetson_container_20250210_092047 --env HUGGINGFACE_TOKEN=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 dustynv/nano_llm:r36.4.0 python3 -m nano_llm.chat --api mlc --model meta-llama/Meta-Llama-3-8B-Instruct --prompt 'Can you tell me a joke about llamas?'
/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:124: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
warnings.warn(
The token has not been saved to the git credentials helper. Pass add_to_git_credential=True in this function directly or --add-to-git-credential if using via huggingface-cli if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /data/models/huggingface/token
Login successful
/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1142: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
warnings.warn(
Fetching 13 files: 100%|███████████████████████| 13/13 [00:00<00:00, 663.16it/s]
Fetching 17 files: 100%|██████████████████████| 17/17 [00:00<00:00, 5800.31it/s]
09:21:06 | INFO | loading /data/models/huggingface/models--meta-llama--Meta-Llama-3-8B-Instruct/snapshots/5f0b02c75b57c5855da9ae460ce51323ea669d8a with MLC
09:21:10 | INFO | NumExpr defaulting to 6 threads.
09:21:10 | WARNING | AWQ not installed (requires JetPack 6 / L4T R36) - AWQ models will fail to initialize
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
['/data/models/mlc/dist/Meta-Llama-3-8B-Instruct/ctx8192/Meta-Llama-3-8B-Instruct-q4f16_ft/mlc-chat-config.json', '/data/models/mlc/dist/Meta-Llama-3-8B-Instruct/ctx8192/Meta-Llama-3-8B-Instruct-q4f16_ft/params/mlc-chat-config.json']
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/NanoLLM/nano_llm/chat/__main__.py", line 32, in <module>
    model = NanoLLM.from_pretrained(
  File "/opt/NanoLLM/nano_llm/nano_llm.py", line 91, in from_pretrained
    model = MLCModel(model_path, **kwargs)
  File "/opt/NanoLLM/nano_llm/models/mlc.py", line 60, in __init__
    quant = MLCModel.quantize(self.model_path, self.config, method=quantization, max_context_len=max_context_len, **kwargs)
  File "/opt/NanoLLM/nano_llm/models/mlc.py", line 260, in quantize
    os.symlink(model, model_path, target_is_directory=True)
FileNotFoundError: [Errno 2] No such file or directory: '/data/models/huggingface/models--meta-llama--Meta-Llama-3-8B-Instruct/snapshots/5f0b02c75b57c5855da9ae460ce51323ea669d8a' -> '/data/models/mlc/dist/models/Meta-Llama-3-8B-Instruct'
I would appreciate some help in getting this working. What should I do?
I also want to add that I noticed the data folder is not located in the jetson-containers directory as described in the guide. Somehow it ended up in /ssd/data with all the models downloaded. I am not sure if this is causing the issue.
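For reference, one way I could sanity-check this is to compare the host path that gets mounted at /data (the docker run output above shows /home/itadmin/data:/data) with the place the models actually ended up. The paths below are just taken from my own logs and may not match other setups:

# does the downloaded snapshot exist under the directory that is mounted as /data?
ls /home/itadmin/data/models/huggingface/models--meta-llama--Meta-Llama-3-8B-Instruct/snapshots/
# ...or did it land under the SSD data directory instead?
ls /ssd/data/models/huggingface/models--meta-llama--Meta-Llama-3-8B-Instruct/snapshots/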
I've disabled the graphical interface and non-essential services according to the article. I have also switched to a smaller model, "TinyLlama/TinyLlama-1.1B-Chat-v1.0", but the error persists:
jetson-containers run --env HUGGINGFACE_TOKEN=* $(autotag nano_llm) python3 -m nano_llm.chat --api mlc --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 --prompt "Can you tell me a joke about llamas?"
Namespace(packages=['nano_llm'], prefer=['local', 'registry', 'build'], disable=[''], user='dustynv', output='/tmp/autotag', quiet=False, verbose=False)
-- L4T_VERSION=36.4.3 JETPACK_VERSION=6.2 CUDA_VERSION=12.6
-- Finding compatible container image for ['nano_llm']
dustynv/nano_llm:r36.4.0
V4L2_DEVICES:
docker run --runtime nvidia -it --rm --network host --shm-size=8g --volume /tmp/argus_socket:/tmp/argus_socket --volume /etc/enctune.conf:/etc/enctune.conf --volume /etc/nv_tegra_release:/etc/nv_tegra_release --volume /tmp/nv_jetson_model:/tmp/nv_jetson_model --volume /var/run/dbus:/var/run/dbus --volume /var/run/avahi-daemon/socket:/var/run/avahi-daemon/socket --volume /var/run/docker.sock:/var/run/docker.sock --volume /home/itadmin/data:/data -v /etc/localtime:/etc/localtime:ro -v /etc/timezone:/etc/timezone:ro --device /dev/snd -e PULSE_SERVER=unix:/run/user/1000/pulse/native -v /run/user/1000/pulse:/run/user/1000/pulse --device /dev/bus/usb --device /dev/i2c-0 --device /dev/i2c-1 --device /dev/i2c-2 --device /dev/i2c-4 --device /dev/i2c-5 --device /dev/i2c-7 --device /dev/i2c-9 --name jetson_container_20250218_114302 --env HUGGINGFACE_TOKEN=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 dustynv/nano_llm:r36.4.0 python3 -m nano_llm.chat --api mlc --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 --prompt 'Can you tell me a joke about llamas?'
/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:124: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
warnings.warn(
The token has not been saved to the git credentials helper. Pass add_to_git_credential=True in this function directly or --add-to-git-credential if using via huggingface-cli if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /data/models/huggingface/token
Login successful
/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1142: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
warnings.warn(
generation_config.json: 100%|██████████████████| 124/124 [00:00<00:00, 45.0kB/s]
config.json: 100%|██████████████████████████████| 608/608 [00:00<00:00, 520kB/s]
README.md: 100%|███████████████████████████| 3.20k/3.20k [00:00<00:00, 5.33MB/s]
.gitattributes: 100%|██████████████████████| 1.52k/1.52k [00:00<00:00, 3.90MB/s]
eval_results.json: 100%|███████████████████████| 566/566 [00:00<00:00, 1.55MB/s]
tokenizer_config.json: 100%|███████████████| 1.29k/1.29k [00:00<00:00, 3.49MB/s]
special_tokens_map.json: 100%|█████████████████| 551/551 [00:00<00:00, 1.55MB/s]
tokenizer.model: 100%|███████████████████████| 500k/500k [00:00<00:00, 16.9MB/s]
tokenizer.json: 100%|██████████████████████| 1.84M/1.84M [00:00<00:00, 2.31MB/s]
Fetching 9 files: 100%|███████████████████████████| 9/9 [00:01<00:00, 5.27it/s]
model.safetensors: 100%|███████████████████| 2.20G/2.20G [00:57<00:00, 38.1MB/s]
Fetching 10 files: 100%|████████████████████████| 10/10 [00:58<00:00, 5.83s/it]
11:44:14 | INFO | loading /data/models/huggingface/models--TinyLlama--TinyLlama-1.1B-Chat-v1.0/snapshots/fe8a4ea1ffedaf415f4da2f062534de366a451e6 with MLC
11:44:18 | INFO | NumExpr defaulting to 6 threads.
11:44:18 | WARNING | AWQ not installed (requires JetPack 6 / L4T R36) - AWQ models will fail to initialize
['/data/models/mlc/dist/TinyLlama-1.1B-Chat-v1.0/ctx2048/TinyLlama-1.1B-Chat-v1.0-q4f16_ft/mlc-chat-config.json', '/data/models/mlc/dist/TinyLlama-1.1B-Chat-v1.0/ctx2048/TinyLlama-1.1B-Chat-v1.0-q4f16_ft/params/mlc-chat-config.json']
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/NanoLLM/nano_llm/chat/__main__.py", line 32, in <module>
    model = NanoLLM.from_pretrained(
  File "/opt/NanoLLM/nano_llm/nano_llm.py", line 91, in from_pretrained
    model = MLCModel(model_path, **kwargs)
  File "/opt/NanoLLM/nano_llm/models/mlc.py", line 60, in __init__
    quant = MLCModel.quantize(self.model_path, self.config, method=quantization, max_context_len=max_context_len, **kwargs)
  File "/opt/NanoLLM/nano_llm/models/mlc.py", line 260, in quantize
    os.symlink(model, model_path, target_is_directory=True)
FileNotFoundError: [Errno 2] No such file or directory: '/data/models/huggingface/models--TinyLlama--TinyLlama-1.1B-Chat-v1.0/snapshots/fe8a4ea1ffedaf415f4da2f062534de366a451e6' -> '/data/models/mlc/dist/models/TinyLlama-1.1B-Chat-v1.0'
itadmin@Jetson-Nano:~$ jetson-containers run --env HUGGINGFACE_TOKEN=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 $(autotag nano_llm) python3 -m nano_llm.chat --api mlc --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 --prompt "Can you tell me a joke about llamas?"
Namespace(packages=['nano_llm'], prefer=['local', 'registry', 'build'], disable=[''], user='dustynv', output='/tmp/autotag', quiet=False, verbose=False)
-- L4T_VERSION=36.4.3 JETPACK_VERSION=6.2 CUDA_VERSION=12.6
-- Finding compatible container image for ['nano_llm']
dustynv/nano_llm:r36.4.0
V4L2_DEVICES:
docker run --runtime nvidia -it --rm --network host --shm-size=8g --volume /tmp/argus_socket:/tmp/argus_socket --volume /etc/enctune.conf:/etc/enctune.conf --volume /etc/nv_tegra_release:/etc/nv_tegra_release --volume /tmp/nv_jetson_model:/tmp/nv_jetson_model --volume /var/run/dbus:/var/run/dbus --volume /var/run/avahi-daemon/socket:/var/run/avahi-daemon/socket --volume /var/run/docker.sock:/var/run/docker.sock --volume /home/itadmin/data:/data -v /etc/localtime:/etc/localtime:ro -v /etc/timezone:/etc/timezone:ro --device /dev/snd -e PULSE_SERVER=unix:/run/user/1000/pulse/native -v /run/user/1000/pulse:/run/user/1000/pulse --device /dev/bus/usb --device /dev/i2c-0 --device /dev/i2c-1 --device /dev/i2c-2 --device /dev/i2c-4 --device /dev/i2c-5 --device /dev/i2c-7 --device /dev/i2c-9 --name jetson_container_20250218_114558 --env HUGGINGFACE_TOKEN=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 dustynv/nano_llm:r36.4.0 python3 -m nano_llm.chat --api mlc --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 --prompt 'Can you tell me a joke about llamas?'
/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:124: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
warnings.warn(
The token has not been saved to the git credentials helper. Pass add_to_git_credential=True in this function directly or --add-to-git-credential if using via huggingface-cli if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /data/models/huggingface/token
Login successful
/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1142: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
warnings.warn(
Fetching 9 files: 100%|████████████████████████| 9/9 [00:00<00:00, 20186.49it/s]
Fetching 10 files: 100%|██████████████████████| 10/10 [00:00<00:00, 5719.77it/s]
11:46:11 | INFO | loading /data/models/huggingface/models--TinyLlama--TinyLlama-1.1B-Chat-v1.0/snapshots/fe8a4ea1ffedaf415f4da2f062534de366a451e6 with MLC
11:46:15 | INFO | NumExpr defaulting to 6 threads.
11:46:15 | WARNING | AWQ not installed (requires JetPack 6 / L4T R36) - AWQ models will fail to initialize
['/data/models/mlc/dist/TinyLlama-1.1B-Chat-v1.0/ctx2048/TinyLlama-1.1B-Chat-v1.0-q4f16_ft/mlc-chat-config.json', '/data/models/mlc/dist/TinyLlama-1.1B-Chat-v1.0/ctx2048/TinyLlama-1.1B-Chat-v1.0-q4f16_ft/params/mlc-chat-config.json']
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/NanoLLM/nano_llm/chat/__main__.py", line 32, in <module>
    model = NanoLLM.from_pretrained(
  File "/opt/NanoLLM/nano_llm/nano_llm.py", line 91, in from_pretrained
    model = MLCModel(model_path, **kwargs)
  File "/opt/NanoLLM/nano_llm/models/mlc.py", line 60, in __init__
    quant = MLCModel.quantize(self.model_path, self.config, method=quantization, max_context_len=max_context_len, **kwargs)
  File "/opt/NanoLLM/nano_llm/models/mlc.py", line 260, in quantize
    os.symlink(model, model_path, target_is_directory=True)
FileNotFoundError: [Errno 2] No such file or directory: '/data/models/huggingface/models--TinyLlama--TinyLlama-1.1B-Chat-v1.0/snapshots/fe8a4ea1ffedaf415f4da2f062534de366a451e6' -> '/data/models/mlc/dist/models/TinyLlama-1.1B-Chat-v1.0'
It looks like both models were not downloaded correctly:
FileNotFoundError: [Errno 2] No such file or directory: '/data/models/huggingface/models--meta-llama--Meta-Llama-3-8B-Instruct/snapshots/5f0b02c75b57c5855da9ae460ce51323ea669d8a' -> '/data/models/mlc/dist/models/Meta-Llama-3-8B-Instruct'
FileNotFoundError: [Errno 2] No such file or directory: '/data/models/huggingface/models--TinyLlama--TinyLlama-1.1B-Chat-v1.0/snapshots/fe8a4ea1ffedaf415f4da2f062534de366a451e6' -> '/data/models/mlc/dist/models/TinyLlama-1.1B-Chat-v1.0'
Could you check if you can download the model first?
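For example, something along these lines from a shell inside the container should show whether the snapshot directory the error points at actually exists. This is just a suggestion: it assumes huggingface-cli is available in the nano_llm image, and the path is taken from your traceback.

# download (or reuse the cached copy of) the model, then check the snapshot folder
huggingface-cli download TinyLlama/TinyLlama-1.1B-Chat-v1.0
ls /data/models/huggingface/models--TinyLlama--TinyLlama-1.1B-Chat-v1.0/snapshots/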
I had this issue and the fix was to ensure you have a pre-existing MLC models folder. Apparently, if the folder doesn't exist, the quantization step fails outright.
In my case the required folder was '/data/models/mlc/dist/models'.
After creating this folder, initialization proceeded as expected.
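In other words, something along these lines (just a sketch; adjust the host path to wherever your jetson-containers data directory actually lives, e.g. /ssd/data or /home/<user>/data):

# create the missing MLC models directory on the host before starting the container,
# so the symlink made in MLCModel.quantize() has an existing parent directory
mkdir -p /ssd/data/models/mlc/dist/models
# ...or, equivalently, from a shell inside the running container:
mkdir -p /data/models/mlc/dist/models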