Hi Team,
We are using a Jetson Orin Nano 8GB with the latest JetPack version 6.0, and we are trying to run the NanoVLM model. To do that, we are following the steps below:
- Cloned the repository from the link: GitHub - dusty-nv/jetson-containers: Machine Learning Containers for NVIDIA Jetson and JetPack-L4T
- After cloning it, ran the following command to install the packages: bash jetson-containers/install.sh
- Once that completed, pulled the NanoVLM image using the command: jetson-containers run $(autotag nano_llm)
- On completion of the above, tried to download the model and run inference on the Jetson Orin Nano as described in this link: NanoVLM - NVIDIA Jetson AI Lab (jetson-ai-lab.com). We tried only the first command, with the model VILA-2.7b, but it is getting killed automatically. Below I am attaching the command sequence we ran and the error.
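To summarize, the sequence of commands we ran was roughly the following (repository URL assumed from the GitHub link above):

git clone https://github.com/dusty-nv/jetson-containers
bash jetson-containers/install.sh
jetson-containers run $(autotag nano_llm)
jetson-containers run $(autotag nano_llm) python3 -m nano_llm.chat --model Efficient-Large-Model/VILA-2.7b --max-context-len 256 --max-new-tokens 32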
Error:
root@ubuntu:/home/orin/Downloads/jetson-containers# jetson-containers run $(autotag nano_llm) python3 -m nano_llm.chat --model Efficient-Large-Model/VILA-2.7b --max-context-len 256 --max-new-tokens 32
Namespace(packages=['nano_llm'], prefer=['local', 'registry', 'build'], disable=[''], user='dustynv', output='/tmp/autotag', quiet=False, verbose=False)
-- L4T_VERSION=36.3.0 JETPACK_VERSION=6.0 CUDA_VERSION=12.2
-- Finding compatible container image for ['nano_llm']
dustynv/nano_llm:24.5-r36.2.0
localuser:root being added to access control list
+ docker run --runtime nvidia -it --rm --network host --volume /tmp/argus_socket:/tmp/argus_socket --volume /etc/enctune.conf:/etc/enctune.conf --volume /etc/nv_tegra_release:/etc/nv_tegra_release --volume /tmp/nv_jetson_model:/tmp/nv_jetson_model --volume /var/run/dbus:/var/run/dbus --volume /var/run/avahi-daemon/socket:/var/run/avahi-daemon/socket --volume /var/run/docker.sock:/var/run/docker.sock --volume /home/orin/Downloads/jetson-containers/data:/data --device /dev/snd --device /dev/bus/usb -e DISPLAY=:1 -v /tmp/.X11-unix/:/tmp/.X11-unix -v /tmp/.docker.xauth:/tmp/.docker.xauth -e XAUTHORITY=/tmp/.docker.xauth --device /dev/i2c-0 --device /dev/i2c-1 --device /dev/i2c-2 --device /dev/i2c-4 --device /dev/i2c-5 --device /dev/i2c-7 --device /dev/i2c-9 dustynv/nano_llm:24.5-r36.2.0 python3 -m nano_llm.chat --model Efficient-Large-Model/VILA-2.7b --max-context-len 256 --max-new-tokens 32
/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:124: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
  warnings.warn(
Fetching 10 files: 100%|████████████████████| 10/10 [00:00<00:00, 42281.29it/s]
Fetching 12 files: 100%|████████████████████| 12/12 [00:00<00:00, 35951.18it/s]
10:40:21 | INFO | loading /data/models/huggingface/models--Efficient-Large-Model--VILA-2.7b/snapshots/2ed82105eefd5926cccb46af9e71b0ca77f12704 with MLC
10:40:22 | INFO | running MLC quantization:
python3 -m mlc_llm.build --model /data/models/mlc/dist/models/VILA-2.7b --quantization q4f16_ft --target cuda --use-cuda-graph --use-flash-attn-mqa --sep-embed --max-seq-len 256 --artifact-path /data/models/mlc/dist/VILA-2.7b-ctx256 --use-safetensors
Using path "/data/models/mlc/dist/models/VILA-2.7b" for model "VILA-2.7b"
Target configured: cuda -keys=cuda,gpu -arch=sm_87 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32
Automatically using target for weight quantization: cuda -keys=cuda,gpu -arch=sm_87 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32
Get old param:   0%|          | 0/197 [00:00<?, ?tensors/s] Start computing and quantizing weights... This may take a while. | 0/327 [00:00<?, ?tensors/s]
Get old param:   1%|▏         | 2/197 [00:03<04:12, 1.29s/tensors] Traceback (most recent call last): | 1/327 [00:03<16:47, 3.09s/tensors]
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/NanoLLM/nano_llm/chat/__main__.py", line 30, in <module>
    model = NanoLLM.from_pretrained(
  File "/opt/NanoLLM/nano_llm/nano_llm.py", line 73, in from_pretrained
    model = MLCModel(model_path, **kwargs)
  File "/opt/NanoLLM/nano_llm/models/mlc.py", line 59, in __init__
    quant = MLCModel.quantize(self.model_path, self.config, method=quantization, max_context_len=max_context_len, **kwargs)
  File "/opt/NanoLLM/nano_llm/models/mlc.py", line 278, in quantize
    subprocess.run(cmd, executable='/bin/bash', shell=True, check=True)
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'python3 -m mlc_llm.build --model /data/models/mlc/dist/models/VILA-2.7b --quantization q4f16_ft --target cuda --use-cuda-graph --use-flash-attn-mqa --sep-embed --max-seq-len 256 --artifact-path /data/models/mlc/dist/VILA-2.7b-ctx256 --use-safetensors ' died with <Signals.SIGKILL: 9>.
Even though we have enough space on the device, as shown below:
df -h
Filesystem Size Used Avail Use% Mounted on
/dev/nvme0n1p1 915G 56G 813G 7% /
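Since the build died with SIGKILL rather than a disk-space error, we suspect it may be memory (RAM/swap) running out during the MLC quantization rather than storage; for reference, these are the standard commands we plan to use to monitor that while the build runs:

free -h          # current RAM and swap usage
swapon --show    # active swap devices and their sizes
sudo tegrastats  # live memory/GPU utilization on Jetson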
Can you help us with this issue?
Regards
Gomathy Sankaran