Running NanoLLM Docker on Jetson Orin Nano: FileNotFoundError

Dear Community,

I am new to the Jetson Orin Nano and this is my first time testing out this kit. I have followed every step of the NanoLLM - NVIDIA Jetson AI Lab guide, but I am now stuck starting the NanoLLM CLI with this command:

jetson-containers run \
  --env HUGGINGFACE_TOKEN=hf_abc123def \
  $(autotag nano_llm) \
    python3 -m nano_llm.chat --api mlc \
      --model meta-llama/Meta-Llama-3-8B-Instruct \
      --prompt "Can you tell me a joke about llamas?"

I did replace the token in the command above with my actual token.

This is the output and error I received:

Namespace(packages=['nano_llm'], prefer=['local', 'registry', 'build'], disable=[''], user='dustynv', output='/tmp/autotag', quiet=False, verbose=False)
-- L4T_VERSION=36.4.3 JETPACK_VERSION=6.2 CUDA_VERSION=12.6
-- Finding compatible container image for ['nano_llm']
dustynv/nano_llm:r36.4.0
V4L2_DEVICES:

docker run --runtime nvidia -it --rm --network host --shm-size=8g --volume /tmp/argus_socket:/tmp/argus_socket --volume /etc/enctune.conf:/etc/enctune.conf --volume /etc/nv_tegra_release:/etc/nv_tegra_release --volume /tmp/nv_jetson_model:/tmp/nv_jetson_model --volume /var/run/dbus:/var/run/dbus --volume /var/run/avahi-daemon/socket:/var/run/avahi-daemon/socket --volume /var/run/docker.sock:/var/run/docker.sock --volume /home/itadmin/data:/data -v /etc/localtime:/etc/localtime:ro -v /etc/timezone:/etc/timezone:ro --device /dev/snd -e PULSE_SERVER=unix:/run/user/1000/pulse/native -v /run/user/1000/pulse:/run/user/1000/pulse --device /dev/bus/usb --device /dev/i2c-0 --device /dev/i2c-1 --device /dev/i2c-2 --device /dev/i2c-4 --device /dev/i2c-5 --device /dev/i2c-7 --device /dev/i2c-9 --name jetson_container_20250210_092047 --env HUGGINGFACE_TOKEN=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 dustynv/nano_llm:r36.4.0 python3 -m nano_llm.chat --api mlc --model meta-llama/Meta-Llama-3-8B-Instruct --prompt 'Can you tell me a joke about llamas?'
    /usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:124: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
    warnings.warn(
    The token has not been saved to the git credentials helper. Pass add_to_git_credential=True in this function directly or --add-to-git-credential if using via huggingface-cli if you want to set the git credential as well.
    Token is valid (permission: write).
    Your token has been saved to /data/models/huggingface/token
    Login successful
    /usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1142: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
    warnings.warn(
    Fetching 13 files: 100%|███████████████████████| 13/13 [00:00<00:00, 663.16it/s]
    Fetching 17 files: 100%|██████████████████████| 17/17 [00:00<00:00, 5800.31it/s]
    09:21:06 | INFO | loading /data/models/huggingface/models--meta-llama--Meta-Llama-3-8B-Instruct/snapshots/5f0b02c75b57c5855da9ae460ce51323ea669d8a with MLC
    09:21:10 | INFO | NumExpr defaulting to 6 threads.
    09:21:10 | WARNING | AWQ not installed (requires JetPack 6 / L4T R36) - AWQ models will fail to initialize
    Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
    ['/data/models/mlc/dist/Meta-Llama-3-8B-Instruct/ctx8192/Meta-Llama-3-8B-Instruct-q4f16_ft/mlc-chat-config.json', '/data/models/mlc/dist/Meta-Llama-3-8B-Instruct/ctx8192/Meta-Llama-3-8B-Instruct-q4f16_ft/params/mlc-chat-config.json']
    Traceback (most recent call last):
      File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
        exec(code, run_globals)
      File "/opt/NanoLLM/nano_llm/chat/__main__.py", line 32, in <module>
        model = NanoLLM.from_pretrained(
      File "/opt/NanoLLM/nano_llm/nano_llm.py", line 91, in from_pretrained
        model = MLCModel(model_path, **kwargs)
      File "/opt/NanoLLM/nano_llm/models/mlc.py", line 60, in __init__
        quant = MLCModel.quantize(self.model_path, self.config, method=quantization, max_context_len=max_context_len, **kwargs)
      File "/opt/NanoLLM/nano_llm/models/mlc.py", line 260, in quantize
        os.symlink(model, model_path, target_is_directory=True)
    FileNotFoundError: [Errno 2] No such file or directory: '/data/models/huggingface/models--meta-llama--Meta-Llama-3-8B-Instruct/snapshots/5f0b02c75b57c5855da9ae460ce51323ea669d8a' -> '/data/models/mlc/dist/models/Meta-Llama-3-8B-Instruct'

I would appreciate some help getting this working. What should I do?

I also want to add that I noticed the data folder is not located in the jetson-containers directory as described in the guide; somehow it ended up in /ssd/data with all the models downloaded. I am not sure if this is causing the issue.
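In case it is relevant, I can compare the host directory that gets mounted as /data in the container (per the docker run line above, /home/itadmin/data) with /ssd/data, to see where the model snapshots actually landed. Something along these lines:

# Host directory that the container sees as /data, taken from the printed docker run line
ls /home/itadmin/data/models/huggingface/

# Directory where the models seem to have been downloaded instead
ls /ssd/data/models/huggingface/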

Thanks in advance.

Hi,

I assume you are using an Orin Nano 8GB device, is that correct?
To optimize memory usage, please first run the commands in the link below:

As the Orin Nano only has 8GB of memory, it might not have enough resources to load an 8B model.
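For reference, the memory-optimization steps are roughly along these lines (a sketch only; please follow the linked guide for the exact commands, and note the swap-file path below is just an example):

# Boot to console instead of the desktop to free memory (revert later with graphical.target)
sudo systemctl set-default multi-user.target
sudo reboot

# Disable zram and add an on-disk swap file so larger models have more headroom
sudo systemctl disable nvzramconfig
sudo fallocate -l 16G /ssd/16GB.swap
sudo chmod 600 /ssd/16GB.swap
sudo mkswap /ssd/16GB.swap
sudo swapon /ssd/16GB.swap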
Would you mind trying a smaller model (weights <= 4B)?

https://dusty-nv.github.io/NanoLLM/models.html#tested-models

Thanks.

I've disabled the graphical interface and non-essential services according to the article. I have also switched to a smaller model, "TinyLlama/TinyLlama-1.1B-Chat-v1.0", but the error persists:

jetson-containers run --env HUGGINGFACE_TOKEN=* $(autotag nano_llm) python3 -m nano_llm.chat --api mlc --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 --prompt "Can you tell me a joke about llamas?"
Namespace(packages=['nano_llm'], prefer=['local', 'registry', 'build'], disable=[''], user='dustynv', output='/tmp/autotag', quiet=False, verbose=False)
-- L4T_VERSION=36.4.3 JETPACK_VERSION=6.2 CUDA_VERSION=12.6
-- Finding compatible container image for ['nano_llm']
dustynv/nano_llm:r36.4.0
V4L2_DEVICES:

docker run --runtime nvidia -it --rm --network host --shm-size=8g --volume /tmp/argus_socket:/tmp/argus_socket --volume /etc/enctune.conf:/etc/enctune.conf --volume /etc/nv_tegra_release:/etc/nv_tegra_release --volume /tmp/nv_jetson_model:/tmp/nv_jetson_model --volume /var/run/dbus:/var/run/dbus --volume /var/run/avahi-daemon/socket:/var/run/avahi-daemon/socket --volume /var/run/docker.sock:/var/run/docker.sock --volume /home/itadmin/data:/data -v /etc/localtime:/etc/localtime:ro -v /etc/timezone:/etc/timezone:ro --device /dev/snd -e PULSE_SERVER=unix:/run/user/1000/pulse/native -v /run/user/1000/pulse:/run/user/1000/pulse --device /dev/bus/usb --device /dev/i2c-0 --device /dev/i2c-1 --device /dev/i2c-2 --device /dev/i2c-4 --device /dev/i2c-5 --device /dev/i2c-7 --device /dev/i2c-9 --name jetson_container_20250218_114302 --env HUGGINGFACE_TOKEN=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 dustynv/nano_llm:r36.4.0 python3 -m nano_llm.chat --api mlc --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 --prompt 'Can you tell me a joke about llamas?'
    /usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:124: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
    warnings.warn(
    The token has not been saved to the git credentials helper. Pass add_to_git_credential=True in this function directly or --add-to-git-credential if using via huggingface-cli if you want to set the git credential as well.
    Token is valid (permission: write).
    Your token has been saved to /data/models/huggingface/token
    Login successful
    /usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1142: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
    warnings.warn(
    generation_config.json: 100%|██████████████████| 124/124 [00:00<00:00, 45.0kB/s]
    config.json: 100%|██████████████████████████████| 608/608 [00:00<00:00, 520kB/s]
    README.md: 100%|███████████████████████████| 3.20k/3.20k [00:00<00:00, 5.33MB/s]
    .gitattributes: 100%|██████████████████████| 1.52k/1.52k [00:00<00:00, 3.90MB/s]
    eval_results.json: 100%|███████████████████████| 566/566 [00:00<00:00, 1.55MB/s]
    tokenizer_config.json: 100%|███████████████| 1.29k/1.29k [00:00<00:00, 3.49MB/s]
    special_tokens_map.json: 100%|█████████████████| 551/551 [00:00<00:00, 1.55MB/s]
    tokenizer.model: 100%|███████████████████████| 500k/500k [00:00<00:00, 16.9MB/s]
    tokenizer.json: 100%|██████████████████████| 1.84M/1.84M [00:00<00:00, 2.31MB/s]
    Fetching 9 files: 100%|███████████████████████████| 9/9 [00:01<00:00, 5.27it/s]
    model.safetensors: 100%|███████████████████| 2.20G/2.20G [00:57<00:00, 38.1MB/s]
    Fetching 10 files: 100%|████████████████████████| 10/10 [00:58<00:00, 5.83s/it]
    11:44:14 | INFO | loading /data/models/huggingface/models--TinyLlama--TinyLlama-1.1B-Chat-v1.0/snapshots/fe8a4ea1ffedaf415f4da2f062534de366a451e6 with MLC
    11:44:18 | INFO | NumExpr defaulting to 6 threads.
    11:44:18 | WARNING | AWQ not installed (requires JetPack 6 / L4T R36) - AWQ models will fail to initialize
    ['/data/models/mlc/dist/TinyLlama-1.1B-Chat-v1.0/ctx2048/TinyLlama-1.1B-Chat-v1.0-q4f16_ft/mlc-chat-config.json', '/data/models/mlc/dist/TinyLlama-1.1B-Chat-v1.0/ctx2048/TinyLlama-1.1B-Chat-v1.0-q4f16_ft/params/mlc-chat-config.json']
    Traceback (most recent call last):
      File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
        exec(code, run_globals)
      File "/opt/NanoLLM/nano_llm/chat/__main__.py", line 32, in <module>
        model = NanoLLM.from_pretrained(
      File "/opt/NanoLLM/nano_llm/nano_llm.py", line 91, in from_pretrained
        model = MLCModel(model_path, **kwargs)
      File "/opt/NanoLLM/nano_llm/models/mlc.py", line 60, in __init__
        quant = MLCModel.quantize(self.model_path, self.config, method=quantization, max_context_len=max_context_len, **kwargs)
      File "/opt/NanoLLM/nano_llm/models/mlc.py", line 260, in quantize
        os.symlink(model, model_path, target_is_directory=True)
    FileNotFoundError: [Errno 2] No such file or directory: '/data/models/huggingface/models--TinyLlama--TinyLlama-1.1B-Chat-v1.0/snapshots/fe8a4ea1ffedaf415f4da2f062534de366a451e6' -> '/data/models/mlc/dist/models/TinyLlama-1.1B-Chat-v1.0'
    itadmin@Jetson-Nano:~$ jetson-containers run --env HUGGINGFACE_TOKEN=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 $(autotag nano_llm) python3 -m nano_llm.chat --api mlc --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 --prompt "Can you tell me a joke about llamas?"
    Namespace(packages=['nano_llm'], prefer=['local', 'registry', 'build'], disable=[''], user='dustynv', output='/tmp/autotag', quiet=False, verbose=False)
    -- L4T_VERSION=36.4.3 JETPACK_VERSION=6.2 CUDA_VERSION=12.6
    -- Finding compatible container image for ['nano_llm']
    dustynv/nano_llm:r36.4.0
    V4L2_DEVICES:
    docker run --runtime nvidia -it --rm --network host --shm-size=8g --volume /tmp/argus_socket:/tmp/argus_socket --volume /etc/enctune.conf:/etc/enctune.conf --volume /etc/nv_tegra_release:/etc/nv_tegra_release --volume /tmp/nv_jetson_model:/tmp/nv_jetson_model --volume /var/run/dbus:/var/run/dbus --volume /var/run/avahi-daemon/socket:/var/run/avahi-daemon/socket --volume /var/run/docker.sock:/var/run/docker.sock --volume /home/itadmin/data:/data -v /etc/localtime:/etc/localtime:ro -v /etc/timezone:/etc/timezone:ro --device /dev/snd -e PULSE_SERVER=unix:/run/user/1000/pulse/native -v /run/user/1000/pulse:/run/user/1000/pulse --device /dev/bus/usb --device /dev/i2c-0 --device /dev/i2c-1 --device /dev/i2c-2 --device /dev/i2c-4 --device /dev/i2c-5 --device /dev/i2c-7 --device /dev/i2c-9 --name jetson_container_20250218_114558 --env HUGGINGFACE_TOKEN=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 dustynv/nano_llm:r36.4.0 python3 -m nano_llm.chat --api mlc --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 --prompt 'Can you tell me a joke about llamas?'
    /usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:124: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
    warnings.warn(
    The token has not been saved to the git credentials helper. Pass add_to_git_credential=True in this function directly or --add-to-git-credential if using via huggingface-cli if you want to set the git credential as well.
    Token is valid (permission: write).
    Your token has been saved to /data/models/huggingface/token
    Login successful
    /usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1142: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
    warnings.warn(
    Fetching 9 files: 100%|████████████████████████| 9/9 [00:00<00:00, 20186.49it/s]
    Fetching 10 files: 100%|██████████████████████| 10/10 [00:00<00:00, 5719.77it/s]
    11:46:11 | INFO | loading /data/models/huggingface/models--TinyLlama--TinyLlama-1.1B-Chat-v1.0/snapshots/fe8a4ea1ffedaf415f4da2f062534de366a451e6 with MLC
    11:46:15 | INFO | NumExpr defaulting to 6 threads.
    11:46:15 | WARNING | AWQ not installed (requires JetPack 6 / L4T R36) - AWQ models will fail to initialize
    ['/data/models/mlc/dist/TinyLlama-1.1B-Chat-v1.0/ctx2048/TinyLlama-1.1B-Chat-v1.0-q4f16_ft/mlc-chat-config.json', '/data/models/mlc/dist/TinyLlama-1.1B-Chat-v1.0/ctx2048/TinyLlama-1.1B-Chat-v1.0-q4f16_ft/params/mlc-chat-config.json']
    Traceback (most recent call last):
      File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
        exec(code, run_globals)
      File "/opt/NanoLLM/nano_llm/chat/__main__.py", line 32, in <module>
        model = NanoLLM.from_pretrained(
      File "/opt/NanoLLM/nano_llm/nano_llm.py", line 91, in from_pretrained
        model = MLCModel(model_path, **kwargs)
      File "/opt/NanoLLM/nano_llm/models/mlc.py", line 60, in __init__
        quant = MLCModel.quantize(self.model_path, self.config, method=quantization, max_context_len=max_context_len, **kwargs)
      File "/opt/NanoLLM/nano_llm/models/mlc.py", line 260, in quantize
        os.symlink(model, model_path, target_is_directory=True)
    FileNotFoundError: [Errno 2] No such file or directory: '/data/models/huggingface/models--TinyLlama--TinyLlama-1.1B-Chat-v1.0/snapshots/fe8a4ea1ffedaf415f4da2f062534de366a451e6' -> '/data/models/mlc/dist/models/TinyLlama-1.1B-Chat-v1.0'

I am really stuck… Any help is appreciated.

Hi,

Sorry for the late update.

It looks like neither model was downloaded correctly.

FileNotFoundError: [Errno 2] No such file or directory: '/data/models/huggingface/models--meta-llama--Meta-Llama-3-8B-Instruct/snapshots/5f0b02c75b57c5855da9ae460ce51323ea669d8a' -> '/data/models/mlc/dist/models/Meta-Llama-3-8B-Instruct'
FileNotFoundError: [Errno 2] No such file or directory: '/data/models/huggingface/models--TinyLlama--TinyLlama-1.1B-Chat-v1.0/snapshots/fe8a4ea1ffedaf415f4da2f062534de366a451e6' -> '/data/models/mlc/dist/models/TinyLlama-1.1B-Chat-v1.0'

Could you check if you can download the model first?
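For example, something along these lines from inside the container should confirm whether the download itself succeeds (assuming the huggingface_hub CLI in the image supports the download subcommand; replace the token and model name as needed):

# Open an interactive shell in the nano_llm container
jetson-containers run --env HUGGINGFACE_TOKEN=<your_token> $(autotag nano_llm) /bin/bash

# Inside the container: pull the model into the shared /data cache and list the snapshot folder
huggingface-cli download TinyLlama/TinyLlama-1.1B-Chat-v1.0
ls /data/models/huggingface/models--TinyLlama--TinyLlama-1.1B-Chat-v1.0/snapshots/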

Thanks.

I had this issue and the fix was to ensure you have a pre-existing MLC models folder. Apparently, if the folder doesn't exist, the quantization step fails outright.

In my case the required folder was '/data/models/mlc/dist/models'.

After creating this folder, the initialization performed as expected.
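For anyone else hitting this, creating the folder from the host side looks roughly like the following (assuming the host path mounted as /data is /home/itadmin/data, as in the docker run lines above; adjust it to wherever your jetson-containers data directory actually lives):

# Create the missing MLC models directory under the mounted data volume
mkdir -p /home/itadmin/data/models/mlc/dist/models

# Re-run the chat command; the os.symlink() call in mlc.py's quantize() should now succeed
jetson-containers run --env HUGGINGFACE_TOKEN=<your_token> $(autotag nano_llm) \
  python3 -m nano_llm.chat --api mlc \
    --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
    --prompt "Can you tell me a joke about llamas?"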