Hi all,
I am trying to run the Live LLaVA demo application.
Here are the details of my deployment environment:
I am using an Orin NX 16 GB device.
I have two questions.
./run.sh $(./autotag local_llm) \
python3 -m local_llm.agents.video_query --api=mlc \
--model Efficient-Large-Model/VILA-2.7b \
--max-context-len 768 \
--max-new-tokens 32 \
--video-input /dev/video0 \
--video-output webrtc://@:8554/output
This is the command I executed, but it fails with an error:
However, when I remove --max-context-len 768 from the command, the application starts.
Eventually, though, it throws this error:
I have attached the complete log for reference:
./run.sh $(./autotag local_llm) python3 -m local_llm.agents.video_query --api=mlc --model Efficient-Large-Model/VILA-2.7b --max-new-tokens 32 --video-input /dev/video0 --video-output webrtc://@:8554/output
Namespace(disable=[''], output='/tmp/autotag', packages=['local_llm'], prefer=['local', 'registry', 'build'], quiet=False, user='dustynv', verbose=False)
-- L4T_VERSION=35.3.1 JETPACK_VERSION=5.1.1 CUDA_VERSION=11.4.315
-- Finding compatible container image for ['local_llm']
dustynv/local_llm:r35.3.1
localuser:root being added to access control list
- sudo docker run --runtime nvidia -it --rm --network host --volume /tmp/argus_socket:/tmp/argus_socket --volume /etc/enctune.conf:/etc/enctune.conf --volume /etc/nv_tegra_release:/etc/nv_tegra_release --volume /tmp/nv_jetson_model:/tmp/nv_jetson_model --volume /var/run/dbus:/var/run/dbus --volume /var/run/avahi-daemon/socket:/var/run/avahi-daemon/socket --volume /var/run/docker.sock:/var/run/docker.sock --volume /home/nvidia/Downloads/jetson-containers/data:/data --device /dev/snd --device /dev/bus/usb -e DISPLAY=:0 -v /tmp/.X11-unix/:/tmp/.X11-unix -v /tmp/.docker.xauth:/tmp/.docker.xauth -e XAUTHORITY=/tmp/.docker.xauth dustynv/local_llm:r35.3.1 python3 -m local_llm.agents.video_query --api=mlc --model Efficient-Large-Model/VILA-2.7b --max-new-tokens 32 --video-input /dev/video0 --video-output webrtc://@:8554/output
Fetching 10 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 35696.20it/s]
10:49:38 | INFO | loading /data/models/huggingface/models--Efficient-Large-Model--VILA-2.7b/snapshots/2ed82105eefd5926cccb46af9e71b0ca77f12704 with MLC
globbing /data/models/mlc/dist/models/VILA-2.7b/*.safetensors
glob
10:49:38 | INFO | running MLC quantization:
python3 -m mlc_llm.build --model /data/models/mlc/dist/models/VILA-2.7b --quantization q4f16_ft --target cuda --use-cuda-graph --use-flash-attn-mqa --sep-embed --max-seq-len 4096 --artifact-path /data/models/mlc/dist
Using path "/data/models/mlc/dist/models/VILA-2.7b" for model "VILA-2.7b"
Target configured: cuda -keys=cuda,gpu -arch=sm_87 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32
Automatically using target for weight quantization: cuda -keys=cuda,gpu -arch=sm_87 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32
Get old param: 0%| | 0/197 [00:00<?, ?tensors/sStart computing and quantizing weights… This may take a while. | 0/327 [00:00<?, ?tensors/s]
Get old param: 1%|▋ | 1/197 [00:02<06:35, 2.02s/tensors]Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.8/dist-packages/mlc_llm/build.py", line 47, in <module>
main()
File "/usr/local/lib/python3.8/dist-packages/mlc_llm/build.py", line 43, in main
core.build_model_from_args(parsed_args)
File "/usr/local/lib/python3.8/dist-packages/mlc_llm/core.py", line 884, in build_model_from_args
params = utils.convert_weights(mod_transform, param_manager, params, args)
File "/usr/local/lib/python3.8/dist-packages/mlc_llm/utils.py", line 286, in convert_weights
vm["transform_params"]()
File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.__call__
File "tvm/_ffi/_cython/./packed_func.pxi", line 263, in tvm._ffi._cy3.core.FuncCall
File "tvm/_ffi/_cython/./packed_func.pxi", line 252, in tvm._ffi._cy3.core.FuncCall3
File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL
File "/usr/local/lib/python3.8/dist-packages/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
raise py_err
File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
File "/usr/local/lib/python3.8/dist-packages/mlc_llm/utils.py", line 46, in inner
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/mlc_llm/relax_model/param_manager.py", line 599, in get_item
load_torch_params_from_bin(torch_binname)
File "/usr/local/lib/python3.8/dist-packages/mlc_llm/relax_model/param_manager.py", line 559, in load_torch_params_from_bin
torch_params = torch.load(
File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 809, in load
return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 1172, in _load
result = unpickler.load()
File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 1165, in find_class
return super().find_class(mod_name, name)
ModuleNotFoundError: No module named 'llava'
Process Process-1:
Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/opt/local_llm/local_llm/agents/video_query.py", line 115, in <module>
agent = VideoQuery(**vars(args)).run()
File "/opt/local_llm/local_llm/agents/video_query.py", line 22, in __init__
self.llm = ProcessProxy((lambda **kwargs: ChatQuery(model, drop_inputs=True, **kwargs)), **kwargs)
File "/opt/local_llm/local_llm/plugins/process_proxy.py", line 31, in __init__
raise RuntimeError(f"subprocess has an invalid initialization status ({init_msg['status']})")
RuntimeError: subprocess has an invalid initialization status (<class 'subprocess.CalledProcessError'>)
Traceback (most recent call last):
File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/opt/local_llm/local_llm/plugins/process_proxy.py", line 62, in run_process
raise error
File "/opt/local_llm/local_llm/plugins/process_proxy.py", line 59, in run_process
plugin = factory(**kwargs)
File "/opt/local_llm/local_llm/agents/video_query.py", line 22, in <lambda>
self.llm = ProcessProxy((lambda **kwargs: ChatQuery(model, drop_inputs=True, **kwargs)), **kwargs)
File "/opt/local_llm/local_llm/plugins/chat_query.py", line 63, in __init__
self.model = LocalLM.from_pretrained(model, **kwargs)
File "/opt/local_llm/local_llm/local_llm.py", line 72, in from_pretrained
model = MLCModel(model_path, **kwargs)
File "/opt/local_llm/local_llm/models/mlc.py", line 51, in __init__
quant = MLCModel.quantize(model_path, self.config, quant, **kwargs)
File "/opt/local_llm/local_llm/models/mlc.py", line 194, in quantize
subprocess.run(cmd, executable='/bin/bash', shell=True, check=True)
File "/usr/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'python3 -m mlc_llm.build --model /data/models/mlc/dist/models/VILA-2.7b --quantization q4f16_ft --target cuda --use-cuda-graph --use-flash-attn-mqa --sep-embed --max-seq-len 4096 --artifact-path /data/models/mlc/dist ' returned non-zero exit status 1.
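From what I can tell in the traceback, the MLC quantization step fails inside torch.load(), where the unpickler's find_class() tries to import a module named llava that is not installed in the container. My guess (not confirmed) is that the VILA-2.7b .bin weights contain pickled objects that reference classes from the llava package. Below is a minimal Python sketch that reproduces the same ModuleNotFoundError with plain pickle; the llava.py file and the VisionTower class in it are hypothetical stand-ins I made up, not the real package.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path


def unpickle_without_module() -> str:
    """Pickle an object whose class lives in a throwaway module named 'llava',
    then try to load it after that module is no longer importable."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for the real `llava` package: one class in llava.py
        Path(tmp, "llava.py").write_text("class VisionTower:\n    pass\n")
        sys.path.insert(0, tmp)
        try:
            llava = importlib.import_module("llava")
            # The pickle stream stores only the reference 'llava.VisionTower', not the class code
            payload = pickle.dumps(llava.VisionTower())
        finally:
            # Make 'llava' unimportable again, like inside the container
            sys.path.remove(tmp)
            sys.modules.pop("llava", None)

    try:
        # Unpickler.find_class('llava', 'VisionTower') must import 'llava' here
        pickle.loads(payload)
    except ModuleNotFoundError as err:
        return str(err)
    return "loaded fine"


if __name__ == "__main__":
    print(unpickle_without_module())
```

If that guess is right, the load would only succeed when the llava package is importable inside the container, or when the weights are available in a format that does not rely on pickled classes.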
Can someone help me here?