Running Live LLaVA on Jetson Orin NX (16 GB) with NVIDIA JetPack 5.1.1

akshayss1614 · March 12, 2024, 10:56am

Hi all,

I am trying to run the Live LLAVA demo application.

Following are the details about my deployment environment.

I am using an Orin NX 16 GB device.

I have 2 queries.

./run.sh $(./autotag local_llm) \
  python3 -m local_llm.agents.video_query --api=mlc \
    --model Efficient-Large-Model/VILA-2.7b \
    --max-context-len 768 \
    --max-new-tokens 32 \
    --video-input /dev/video0 \
    --video-output webrtc://@:8554/output

This is the command I executed, but I am facing an error:

However, when I remove --max-context-len 768 from the command, the application manages to start.
But eventually it throws this error:

Attaching the complete log here for reference:

./run.sh $(./autotag local_llm) python3 -m local_llm.agents.video_query --api=mlc --model Efficient-Large-Model/VILA-2.7b --max-new-tokens 32 --video-input /dev/video0 --video-output webrtc://@:8554/output
Namespace(disable=[''], output='/tmp/autotag', packages=['local_llm'], prefer=['local', 'registry', 'build'], quiet=False, user='dustynv', verbose=False)
-- L4T_VERSION=35.3.1 JETPACK_VERSION=5.1.1 CUDA_VERSION=11.4.315
-- Finding compatible container image for ['local_llm']
dustynv/local_llm:r35.3.1
localuser:root being added to access control list

sudo docker run --runtime nvidia -it --rm --network host --volume /tmp/argus_socket:/tmp/argus_socket --volume /etc/enctune.conf:/etc/enctune.conf --volume /etc/nv_tegra_release:/etc/nv_tegra_release --volume /tmp/nv_jetson_model:/tmp/nv_jetson_model --volume /var/run/dbus:/var/run/dbus --volume /var/run/avahi-daemon/socket:/var/run/avahi-daemon/socket --volume /var/run/docker.sock:/var/run/docker.sock --volume /home/nvidia/Downloads/jetson-containers/data:/data --device /dev/snd --device /dev/bus/usb -e DISPLAY=:0 -v /tmp/.X11-unix/:/tmp/.X11-unix -v /tmp/.docker.xauth:/tmp/.docker.xauth -e XAUTHORITY=/tmp/.docker.xauth dustynv/local_llm:r35.3.1 python3 -m local_llm.agents.video_query --api=mlc --model Efficient-Large-Model/VILA-2.7b --max-new-tokens 32 --video-input /dev/video0 --video-output webrtc://@:8554/output
Fetching 10 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 35696.20it/s]
10:49:38 | INFO | loading /data/models/huggingface/models--Efficient-Large-Model--VILA-2.7b/snapshots/2ed82105eefd5926cccb46af9e71b0ca77f12704 with MLC
globbing /data/models/mlc/dist/models/VILA-2.7b/*.safetensors
glob
10:49:38 | INFO | running MLC quantization:

python3 -m mlc_llm.build --model /data/models/mlc/dist/models/VILA-2.7b --quantization q4f16_ft --target cuda --use-cuda-graph --use-flash-attn-mqa --sep-embed --max-seq-len 4096 --artifact-path /data/models/mlc/dist

Using path "/data/models/mlc/dist/models/VILA-2.7b" for model "VILA-2.7b"
Target configured: cuda -keys=cuda,gpu -arch=sm_87 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32
Automatically using target for weight quantization: cuda -keys=cuda,gpu -arch=sm_87 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32
Get old param: 0%| | 0/197 [00:00<?, ?tensors/sStart computing and quantizing weights… This may take a while. | 0/327 [00:00<?, ?tensors/s]
Get old param: 1%|▋ | 1/197 [00:02<06:35, 2.02s/tensors]
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.8/dist-packages/mlc_llm/build.py", line 47, in <module>
    main()
  File "/usr/local/lib/python3.8/dist-packages/mlc_llm/build.py", line 43, in main
    core.build_model_from_args(parsed_args)
  File "/usr/local/lib/python3.8/dist-packages/mlc_llm/core.py", line 884, in build_model_from_args
    params = utils.convert_weights(mod_transform, param_manager, params, args)
  File "/usr/local/lib/python3.8/dist-packages/mlc_llm/utils.py", line 286, in convert_weights
    vm["transform_params"]()
  File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 263, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 252, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL
  File "/usr/local/lib/python3.8/dist-packages/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
    raise py_err
  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
  File "/usr/local/lib/python3.8/dist-packages/mlc_llm/utils.py", line 46, in inner
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/mlc_llm/relax_model/param_manager.py", line 599, in get_item
    load_torch_params_from_bin(torch_binname)
  File "/usr/local/lib/python3.8/dist-packages/mlc_llm/relax_model/param_manager.py", line 559, in load_torch_params_from_bin
    torch_params = torch.load(
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 809, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 1172, in _load
    result = unpickler.load()
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 1165, in find_class
    return super().find_class(mod_name, name)
ModuleNotFoundError: No module named 'llava'
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/opt/local_llm/local_llm/agents/video_query.py", line 115, in <module>
    agent = VideoQuery(**vars(args)).run()
  File "/opt/local_llm/local_llm/agents/video_query.py", line 22, in __init__
    self.llm = ProcessProxy((lambda **kwargs: ChatQuery(model, drop_inputs=True, **kwargs)), **kwargs)
  File "/opt/local_llm/local_llm/plugins/process_proxy.py", line 31, in __init__
    raise RuntimeError(f"subprocess has an invalid initialization status ({init_msg['status']})")
RuntimeError: subprocess has an invalid initialization status (<class 'subprocess.CalledProcessError'>)
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/local_llm/local_llm/plugins/process_proxy.py", line 62, in run_process
    raise error
  File "/opt/local_llm/local_llm/plugins/process_proxy.py", line 59, in run_process
    plugin = factory(**kwargs)
  File "/opt/local_llm/local_llm/agents/video_query.py", line 22, in <lambda>
    self.llm = ProcessProxy((lambda **kwargs: ChatQuery(model, drop_inputs=True, **kwargs)), **kwargs)
  File "/opt/local_llm/local_llm/plugins/chat_query.py", line 63, in __init__
    self.model = LocalLM.from_pretrained(model, **kwargs)
  File "/opt/local_llm/local_llm/local_llm.py", line 72, in from_pretrained
    model = MLCModel(model_path, **kwargs)
  File "/opt/local_llm/local_llm/models/mlc.py", line 51, in __init__
    quant = MLCModel.quantize(model_path, self.config, quant, **kwargs)
  File "/opt/local_llm/local_llm/models/mlc.py", line 194, in quantize
    subprocess.run(cmd, executable='/bin/bash', shell=True, check=True)
  File "/usr/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'python3 -m mlc_llm.build --model /data/models/mlc/dist/models/VILA-2.7b --quantization q4f16_ft --target cuda --use-cuda-graph --use-flash-attn-mqa --sep-embed --max-seq-len 4096 --artifact-path /data/models/mlc/dist ' returned non-zero exit status 1.

Can someone help me here?
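
A note for readers following the traceback above: the first failure is raised inside torch.load, where the unpickler's find_class cannot import a module named llava. Pickled data stores classes by module path, so the defining package has to be importable when the file is loaded. The snippet below is only a minimal sketch of that mechanism using the standard pickle module. The module name fake_llava and the Config class are made up for illustration and are not part of the real llava package or the VILA checkpoint.

```python
import pickle
import sys
import types

# "fake_llava" is a made-up stand-in for the real `llava` package.
fake = types.ModuleType("fake_llava")


class Config:
    def __init__(self, hidden_size):
        self.hidden_size = hidden_size


# Make pickle record the class as fake_llava.Config.
Config.__module__ = "fake_llava"
Config.__qualname__ = "Config"
fake.Config = Config
sys.modules["fake_llava"] = fake

blob = pickle.dumps(Config(2560))


def try_load(data):
    """Return 'ok' if the blob unpickles, else the import error message."""
    try:
        pickle.loads(data)
        return "ok"
    except ModuleNotFoundError as exc:
        return str(exc)


# Simulate an environment where the package is not installed.
del sys.modules["fake_llava"]
missing = try_load(blob)

# Simulate the package being importable again.
sys.modules["fake_llava"] = fake
present = try_load(blob)

print(missing)
print(present)
```

The same bytes load or fail depending only on whether the module can be imported, which matches the shape of the ModuleNotFoundError in the log.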

dusty_nv · March 12, 2024, 2:13pm

Hi @akshayss1614, you need to upgrade your Jetson to JetPack 6 to run the latest container and code for this demo (that requirement is listed on the page you linked to). I started needing Python 3.10 for the latest ML support to run the latest LLM models, and it was also a lot to keep it verified on both JetPack 5 and 6. Most of the newer stuff on Jetson AI Lab runs on JetPack 6 going forward.
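
For anyone checking which side of that requirement a device is on, here is a rough sketch that reads the L4T release from the first line of /etc/nv_tegra_release (the file the container command in the log mounts). It assumes the usual "# R35 (release), REVISION: 3.1, ..." layout, and the L4T-to-JetPack mapping in the code only covers R35 (JetPack 5.x) and R36 (JetPack 6.x). The function names and the sample line are illustrative, not part of any NVIDIA tool.

```python
import re


def parse_l4t_release(text):
    """Parse (major, revision) from the first line of /etc/nv_tegra_release."""
    match = re.search(r"R(\d+)\s+\(release\),\s+REVISION:\s+([\d.]+)", text)
    if match is None:
        raise ValueError("unrecognized nv_tegra_release format")
    return int(match.group(1)), match.group(2)


def jetpack_major(l4t_major):
    """Map an L4T major release to a JetPack major version, or None if unknown."""
    # Assumed mapping: L4T R35.x ships with JetPack 5.x, R36.x with JetPack 6.x.
    return {35: 5, 36: 6}.get(l4t_major)


# Illustrative sample line; on a device, read the file instead:
#   text = open("/etc/nv_tegra_release").readline()
sample = "# R35 (release), REVISION: 3.1, GCID: 0, BOARD: t186ref, EABI: aarch64"

major, revision = parse_l4t_release(sample)
print(f"L4T {major}.{revision} -> JetPack {jetpack_major(major)}.x")
```

With the L4T_VERSION=35.3.1 shown in the log above, this reports JetPack 5.x, which is why autotag picked the older dustynv/local_llm:r35.3.1 image.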

akshayss1614 · March 13, 2024, 8:39am

Hi @dusty_nv,
Thanks for your response.
What is the possibility of making it work on JetPack 5.x? I have all my other applications already up and running on JetPack 5.1.1, and I would like to keep them undisturbed for now.

Please suggest.

dusty_nv · March 13, 2024, 2:13pm

@akshayss1614 it’s not entirely impossible to use Llama/Llava-based models on JetPack 5. I will have to give it a closer look after GTC and try it then.

akshayss1614 · March 21, 2024, 5:44pm

Hi @dusty_nv,

I tried running standalone scripts and installing the llava package and other dependencies, but I was not able to get it running. Can you suggest any workaround?
