Hi,
I am following the tutorial at https://www.jetson-ai-lab.com/tutorial_live-llava.html. When I run this command:
jetson-containers run $(autotag nano_llm) \
python3 -m nano_llm.agents.video_query --api=mlc \
--model Efficient-Large-Model/VILA-2.7b \
--max-context-len 768 \
--max-new-tokens 32 \
--video-input /dev/video0 \
--video-output webrtc://@:8554/output
I get the following error; the MLC quantization step is killed with SIGKILL:
dustynv/nano_llm:r36.2.0
localuser:root being added to access control list
xauth: file /tmp/.docker.xauth does not exist
+ sudo docker run --runtime nvidia -it --rm --network host --volume /tmp/argus_socket:/tmp/argus_socket --volume /etc/enctune.conf:/etc/enctune.conf --volume /etc/nv_tegra_release:/etc/nv_tegra_release --volume /tmp/nv_jetson_model:/tmp/nv_jetson_model --volume /var/run/dbus:/var/run/dbus --volume /var/run/avahi-daemon/socket:/var/run/avahi-daemon/socket --volume /var/run/docker.sock:/var/run/docker.sock --volume /home/jetsonano/Documents/jetson-containers/data:/data --device /dev/snd --device /dev/bus/usb -e DISPLAY=:1 -v /tmp/.X11-unix/:/tmp/.X11-unix -v /tmp/.docker.xauth:/tmp/.docker.xauth -e XAUTHORITY=/tmp/.docker.xauth --device /dev/video0 --device /dev/video1 dustynv/nano_llm:r36.2.0 python3 -m nano_llm.agents.video_query --api=mlc --model Efficient-Large-Model/VILA-2.7b --max-new-tokens 32 --video-input /dev/video0 --video-output webrtc://@:8554/output
/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:124: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
warnings.warn(
/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:124: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
warnings.warn(
Fetching 10 files: 100%|██████████| 10/10 [00:00<00:00, 64527.75it/s]
Fetching 12 files: 100%|██████████| 12/12 [00:00<00:00, 49587.83it/s]
06:36:25 | INFO | loading /data/models/huggingface/models--Efficient-Large-Model--VILA-2.7b/snapshots/2ed82105eefd5926cccb46af9e71b0ca77f12704 with MLC
06:36:27 | INFO | running MLC quantization:
python3 -m mlc_llm.build --model /data/models/mlc/dist/models/VILA-2.7b --quantization q4f16_ft --target cuda --use-cuda-graph --use-flash-attn-mqa --sep-embed --max-seq-len 4096 --artifact-path /data/models/mlc/dist/VILA-2.7b-ctx4096 --use-safetensors
Using path "/data/models/mlc/dist/models/VILA-2.7b" for model "VILA-2.7b"
Target configured: cuda -keys=cuda,gpu -arch=sm_87 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32
Automatically using target for weight quantization: cuda -keys=cuda,gpu -arch=sm_87 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32
Get old param:   0%|          | 0/197 [00:00<?, ?tensors/s]
Start computing and quantizing weights... This may take a while.   0/327 [00:00<?, ?tensors/s]
Get old param:   1%|▏         | 2/197 [00:03<04:08, 1.27s/tensors]   1/327 [00:03<16:31, 3.04s/tensors]
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/NanoLLM/nano_llm/agents/video_query.py", line 358, in <module>
    agent = VideoQuery(**vars(args)).run()
  File "/opt/NanoLLM/nano_llm/agents/video_query.py", line 44, in __init__
    self.llm = ProcessProxy('ChatQuery', model=model, drop_inputs=True, vision_scaling=vision_scaling, **kwargs) #ProcessProxy((lambda **kwargs: ChatQuery(model, drop_inputs=True, **kwargs)), **kwargs)
  File "/opt/NanoLLM/nano_llm/plugins/process_proxy.py", line 38, in __init__
    raise RuntimeError(f"subprocess has an invalid initialization status ({init_msg['status']})")
RuntimeError: subprocess has an invalid initialization status (<class 'subprocess.CalledProcessError'>)
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/NanoLLM/nano_llm/plugins/process_proxy.py", line 132, in run_process
    raise error
  File "/opt/NanoLLM/nano_llm/plugins/process_proxy.py", line 126, in run_process
    self.plugin = ChatQuery(**kwargs)
  File "/opt/NanoLLM/nano_llm/plugins/chat_query.py", line 70, in __init__
    self.model = NanoLLM.from_pretrained(model, **kwargs)
  File "/opt/NanoLLM/nano_llm/nano_llm.py", line 71, in from_pretrained
    model = MLCModel(model_path, **kwargs)
  File "/opt/NanoLLM/nano_llm/models/mlc.py", line 59, in __init__
    quant = MLCModel.quantize(model_path, self.config, method=quantization, max_context_len=max_context_len, **kwargs)
  File "/opt/NanoLLM/nano_llm/models/mlc.py", line 278, in quantize
    subprocess.run(cmd, executable='/bin/bash', shell=True, check=True)
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'python3 -m mlc_llm.build --model /data/models/mlc/dist/models/VILA-2.7b --quantization q4f16_ft --target cuda --use-cuda-graph --use-flash-attn-mqa --sep-embed --max-seq-len 4096 --artifact-path /data/models/mlc/dist/VILA-2.7b-ctx4096 --use-safetensors ' died with <Signals.SIGKILL: 9>.
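Since the mlc_llm.build command died with SIGKILL, my guess is that it might be getting killed for running out of memory during quantization, but I have not confirmed that. If it would help, I can capture memory usage while the quantization runs with something like the following (I believe tegrastats supports background logging with --start/--stop; the log path is just an example):

sudo tegrastats --start --interval 1000 --logfile /tmp/tegrastats.log   # log RAM/swap usage once per second
# ... re-run the jetson-containers command above in another terminal ...
sudo tegrastats --stop                                                   # stop background logging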
Below is my nvidia-jetpack info:
sudo apt-cache show nvidia-jetpack
Package: nvidia-jetpack
Version: 6.0-b52
Architecture: arm64
Maintainer: NVIDIA Corporation
Installed-Size: 194
Depends: nvidia-jetpack-runtime (= 6.0-b52), nvidia-jetpack-dev (= 6.0-b52)
Homepage: http://developer.nvidia.com/jetson
Priority: standard
Section: metapackages
Filename: pool/main/n/nvidia-jetpack/nvidia-jetpack_6.0-b52_arm64.deb
Size: 29294
SHA256: 01f3cfaed6f45ebabacbe5f2d4c3b74a296200ae928d68b97956470d54c4be98
SHA1: 950626b2b51381650e8ecb7e3b21f5e2e89cddb6
MD5sum: 1e58b6faa4b7a9695a1f5b0cb6035d85
Description: NVIDIA Jetpack Meta Package
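If any other system details would be useful, I can also post the output of these (generic commands, nothing specific to the tutorial):

free -h                     # total RAM and swap
df -h /                     # free disk space (the quantization writes under /data/models/mlc)
cat /etc/nv_tegra_release   # L4T version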
Can someone please help me?