Hello,
I am trying to run NanoLLM on a Jetson Orin Nano 8GB (JetPack 6.1) with the following command:
jetson-containers run $(autotag nano_llm) \
python3 -m nano_llm.chat --api=mlc \
--model Efficient-Large-Model/VILA-2.7b \
--max-context-len 256 \
--max-new-tokens 32
However, I encounter the following error:
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/opt/NanoLLM/nano_llm/chat/__main__.py", line 32, in <module>
model = NanoLLM.from_pretrained(
File "/opt/NanoLLM/nano_llm/nano_llm.py", line 91, in from_pretrained
model = MLCModel(model_path, **kwargs)
File "/opt/NanoLLM/nano_llm/models/mlc.py", line 60, in __init__
quant = MLCModel.quantize(self.model_path, self.config, method=quantization, max_context_len=max_context_len, **kwargs)
File "/opt/NanoLLM/nano_llm/models/mlc.py", line 276, in quantize
subprocess.run(cmd, executable='/bin/bash', shell=True, check=True)
File "/usr/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'python3 -m mlc_llm.build --model /data/models/mlc/dist/models/VILA-2.7b --quantization q4f16_ft --target cuda --use-cuda-graph --use-flash-attn-mqa --sep-embed --max-seq-len 256 --artifact-path /data/models/mlc/dist/VILA-2.7b/ctx256 --use-safetensors ' died with <Signals.SIGKILL: 9>.
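For what it's worth, my understanding is that a process dying with SIGKILL (signal 9) on Linux is often the kernel OOM killer terminating it, rather than a crash in the code itself. If it helps with diagnosis, this is a generic check I can run on the host right after the failure (nothing NanoLLM-specific):

# Look for OOM-killer entries in the kernel log after the crash
sudo dmesg | grep -iE 'out of memory|killed process'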
According to the tutorial on Jetson AI Lab, the VILA-2.7b model should work on the Jetson Orin Nano.
Watching the Jetson Power GUI while the command runs, I can see memory usage climb to the full 8GB just before the process dies. Given that the build was killed with signal 9, could the quantization step be running out of memory and getting killed by the OOM killer?
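In case it is memory pressure, I am planning to try the usual Jetson memory-saving steps (disabling zram, adding NVMe swap, and stopping the desktop GUI) before rerunning. The /ssd path below is just where my NVMe drive is mounted, so treat it as an example:

sudo systemctl disable nvzramconfig     # disable zram (takes effect after reboot)
sudo fallocate -l 16G /ssd/16GB.swap    # create a 16 GB swap file on NVMe
sudo mkswap /ssd/16GB.swap
sudo swapon /ssd/16GB.swap
sudo init 3                             # switch to console mode to free GUI memory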
I would greatly appreciate any insights or solutions to resolve this issue. Thank you in advance for your assistance!