Running Live LLaVA on Jetson Orin NX (16 GB) with NVIDIA JetPack 5.1.1

Hi all,

I am trying to run the Live LLaVA demo application.

Here are the details of my deployment environment:

I am using an Orin NX 16 GB device.

I have two questions.

./ $(./autotag local_llm) \
  python3 -m local_llm.agents.video_query --api=mlc \
    --model Efficient-Large-Model/VILA-2.7b \
    --max-context-len 768 \
    --max-new-tokens 32 \
    --video-input /dev/video0 \
    --video-output webrtc://@:8554/output

This is the command I executed, but I am facing an error:

However, when I remove --max-context-len 768 from the command, the application manages to start, but it eventually throws this error:

Attaching the complete log here for reference:

./ $(./autotag local_llm) python3 -m local_llm.agents.video_query --api=mlc --model Efficient-Large-Model/VILA-2.7b --max-new-tokens 32 --video-input /dev/video0 --video-output webrtc://@:8554/output
Namespace(disable=[''], output='/tmp/autotag', packages=['local_llm'], prefer=['local', 'registry', 'build'], quiet=False, user='dustynv', verbose=False)
-- Finding compatible container image for ['local_llm']
localuser:root being added to access control list

+ sudo docker run --runtime nvidia -it --rm --network host --volume /tmp/argus_socket:/tmp/argus_socket --volume /etc/enctune.conf:/etc/enctune.conf --volume /etc/nv_tegra_release:/etc/nv_tegra_release --volume /tmp/nv_jetson_model:/tmp/nv_jetson_model --volume /var/run/dbus:/var/run/dbus --volume /var/run/avahi-daemon/socket:/var/run/avahi-daemon/socket --volume /var/run/docker.sock:/var/run/docker.sock --volume /home/nvidia/Downloads/jetson-containers/data:/data --device /dev/snd --device /dev/bus/usb -e DISPLAY=:0 -v /tmp/.X11-unix/:/tmp/.X11-unix -v /tmp/.docker.xauth:/tmp/.docker.xauth -e XAUTHORITY=/tmp/.docker.xauth dustynv/local_llm:r35.3.1 python3 -m local_llm.agents.video_query --api=mlc --model Efficient-Large-Model/VILA-2.7b --max-new-tokens 32 --video-input /dev/video0 --video-output webrtc://@:8554/output
    Fetching 10 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 35696.20it/s]
    10:49:38 | INFO | loading /data/models/huggingface/models--Efficient-Large-Model--VILA-2.7b/snapshots/2ed82105eefd5926cccb46af9e71b0ca77f12704 with MLC
    globbing /data/models/mlc/dist/models/VILA-2.7b/*.safetensors
    10:49:38 | INFO | running MLC quantization:

python3 -m --model /data/models/mlc/dist/models/VILA-2.7b --quantization q4f16_ft --target cuda --use-cuda-graph --use-flash-attn-mqa --sep-embed --max-seq-len 4096 --artifact-path /data/models/mlc/dist

Using path "/data/models/mlc/dist/models/VILA-2.7b" for model "VILA-2.7b"
Target configured: cuda -keys=cuda,gpu -arch=sm_87 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32
Automatically using target for weight quantization: cuda -keys=cuda,gpu -arch=sm_87 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32
Get old param: 0%| | 0/197 [00:00<?, ?tensors/sStart computing and quantizing weights... This may take a while. | 0/327 [00:00<?, ?tensors/s]
Get old param: 1%|▋ | 1/197 [00:02<06:35, 2.02s/tensors]
Traceback (most recent call last):
  File "/usr/lib/python3.8/", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.8/dist-packages/mlc_llm/", line 47, in
  File "/usr/local/lib/python3.8/dist-packages/mlc_llm/", line 43, in main
  File "/usr/local/lib/python3.8/dist-packages/mlc_llm/", line 884, in build_model_from_args
    params = utils.convert_weights(mod_transform, param_manager, params, args)
  File "/usr/local/lib/python3.8/dist-packages/mlc_llm/", line 286, in convert_weights
  File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in
  File "tvm/_ffi/_cython/./packed_func.pxi", line 263, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 252, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL
  File "/usr/local/lib/python3.8/dist-packages/tvm/_ffi/", line 481, in raise_last_ffi_error
    raise py_err
  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
  File "/usr/local/lib/python3.8/dist-packages/mlc_llm/", line 46, in inner
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/mlc_llm/relax_model/", line 599, in get_item
  File "/usr/local/lib/python3.8/dist-packages/mlc_llm/relax_model/", line 559, in load_torch_params_from_bin
    torch_params = torch.load(
  File "/usr/local/lib/python3.8/dist-packages/torch/", line 809, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.8/dist-packages/torch/", line 1172, in _load
    result = unpickler.load()
  File "/usr/local/lib/python3.8/dist-packages/torch/", line 1165, in find_class
    return super().find_class(mod_name, name)
ModuleNotFoundError: No module named 'llava'
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.8/", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/", line 87, in _run_code
    exec(code, run_globals)
  File "/opt/local_llm/local_llm/agents/", line 115, in
    agent = VideoQuery(**vars(args)).run()
  File "/opt/local_llm/local_llm/agents/", line 22, in __init__
    self.llm = ProcessProxy((lambda **kwargs: ChatQuery(model, drop_inputs=True, **kwargs)), **kwargs)
  File "/opt/local_llm/local_llm/plugins/", line 31, in __init__
    raise RuntimeError(f"subprocess has an invalid initialization status ({init_msg['status']})")
RuntimeError: subprocess has an invalid initialization status (<class 'subprocess.CalledProcessError'>)
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/", line 315, in _bootstrap
  File "/usr/lib/python3.8/multiprocessing/", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/local_llm/local_llm/plugins/", line 62, in run_process
    raise error
  File "/opt/local_llm/local_llm/plugins/", line 59, in run_process
    plugin = factory(**kwargs)
  File "/opt/local_llm/local_llm/agents/", line 22, in
    self.llm = ProcessProxy((lambda **kwargs: ChatQuery(model, drop_inputs=True, **kwargs)), **kwargs)
  File "/opt/local_llm/local_llm/plugins/", line 63, in __init__
    self.model = LocalLM.from_pretrained(model, **kwargs)
  File "/opt/local_llm/local_llm/", line 72, in from_pretrained
    model = MLCModel(model_path, **kwargs)
  File "/opt/local_llm/local_llm/models/", line 51, in __init__
    quant = MLCModel.quantize(model_path, self.config, quant, **kwargs)
  File "/opt/local_llm/local_llm/models/", line 194, in quantize, executable='/bin/bash', shell=True, check=True)
  File "/usr/lib/python3.8/", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'python3 -m --model /data/models/mlc/dist/models/VILA-2.7b --quantization q4f16_ft --target cuda --use-cuda-graph --use-flash-attn-mqa --sep-embed --max-seq-len 4096 --artifact-path /data/models/mlc/dist ' returned non-zero exit status 1.

Can someone help me here?

Hi @akshayss1614, you need to upgrade your Jetson to JetPack 6 to run the latest container and code for this demo (that requirement is listed on the page you linked to). The latest ML libraries needed to run the newest LLMs started requiring Python 3.10, and keeping everything verified on both JetPack 5 and 6 was also a lot of work. Going forward, most of the newer content on Jetson AI Lab runs on JetPack 6.
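The requirement comes down to two version numbers: the L4T release (the container tag `r35.3.1` in the log is L4T R35.3.1, i.e. JetPack 5.1.1) and the Python version (the traceback paths show Python 3.8). Below is a minimal Python sketch for checking both on a device or inside a container. It assumes `/etc/nv_tegra_release` starts with the usual `# R35 (release), REVISION: 3.1, ...` line; the helper names are hypothetical, and the L4T-to-JetPack table only covers R32, R34/R35 and R36.

```python
import re
import sys
from typing import Optional, Tuple

# The jetson-containers `docker run` line in the log mounts this file into the container.
NV_TEGRA_RELEASE = "/etc/nv_tegra_release"

# Assumed header format: "# R35 (release), REVISION: 3.1, GCID: ..., BOARD: ..."
_RELEASE_RE = re.compile(r"#\s*R(\d+)\s*\(release\),\s*REVISION:\s*(\d+)\.(\d+)")

# L4T major release -> JetPack major version (only the releases this sketch knows about)
_L4T_TO_JETPACK = {32: 4, 34: 5, 35: 5, 36: 6}


def parse_l4t_release(text: str) -> Optional[Tuple[int, int, int]]:
    """Return the L4T version as (major, minor, patch), e.g. (35, 3, 1), or None."""
    match = _RELEASE_RE.search(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), int(match.group(3))


def jetpack_major(l4t: Tuple[int, int, int]) -> Optional[int]:
    """Map an L4T version to its JetPack major version, or None if unknown."""
    return _L4T_TO_JETPACK.get(l4t[0])


def meets_requirements(
    l4t: Tuple[int, int, int],
    python: Tuple[int, int] = (sys.version_info.major, sys.version_info.minor),
) -> bool:
    """Check the requirements stated in the thread: JetPack 6 and Python 3.10 or newer."""
    jetpack = jetpack_major(l4t)
    return jetpack is not None and jetpack >= 6 and python >= (3, 10)


def main() -> None:
    try:
        with open(NV_TEGRA_RELEASE) as f:
            l4t = parse_l4t_release(f.read())
    except OSError:
        print(f"could not read {NV_TEGRA_RELEASE}; is this a Jetson, and is the file mounted?")
        return
    if l4t is None:
        print(f"unrecognized format in {NV_TEGRA_RELEASE}")
        return
    jetpack = jetpack_major(l4t)
    print(f"L4T R{l4t[0]}.{l4t[1]}.{l4t[2]} (JetPack {jetpack if jetpack else '?'}.x), "
          f"Python {sys.version_info.major}.{sys.version_info.minor}")
    if meets_requirements(l4t):
        print("meets the JetPack 6 / Python 3.10 requirement")
    else:
        print("does not meet the JetPack 6 / Python 3.10 requirement")


if __name__ == "__main__":
    main()
```

Run inside a container, the Python line reports the container's interpreter rather than the host's, which is the one that matters for the demo code.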

Hi @dusty_nv,
Thanks for your response.
What are the chances of making it work on JetPack 5.x? I already have all my other applications up and running on JetPack 5.1.1, and I would like to leave them undisturbed for now.

Please advise.

@akshayss1614 it's not entirely impossible to use Llama/LLaVA-based models on JetPack 5, but I will have to take a closer look after GTC and try it then.

Hi @dusty_nv,

I tried running standalone scripts after installing the llava package and other dependencies myself, but I still could not get it to run. Can you suggest a workaround?
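The first traceback shows where `No module named 'llava'` comes from: MLC's weight conversion calls `torch.load()` on the model's `.bin` weights, and unpickling them imports classes by module path, so a module named `llava` has to be importable at that point. As a diagnostic sketch only (not a fix), the script below lists the module references inside a checkpoint without executing it, which shows which packages must be importable before conversion. It assumes the zip-based `torch.save()` format (the `_load(opened_zipfile, ...)` frame in the traceback suggests this) and pickle protocol 2, PyTorch's default, so it only decodes `GLOBAL` opcodes; the function names are hypothetical.

```python
import pickletools
import sys
import zipfile
from typing import Iterable, Set


def modules_in_pickle(data: bytes) -> Set[str]:
    """Module names referenced by GLOBAL opcodes, found without unpickling anything.

    Assumes pickle protocol <= 3 (torch.save's default is protocol 2); protocol 4+
    uses STACK_GLOBAL, which this sketch does not decode.
    """
    modules: Set[str] = set()
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # pickletools reports GLOBAL's argument as "module qualname"
            modules.add(arg.split(" ", 1)[0])
    return modules


def modules_in_checkpoint(path: str) -> Set[str]:
    """Scan a zip-format torch.save() file (for example a .bin weights file)."""
    if not zipfile.is_zipfile(path):
        raise ValueError(f"{path}: not a zip-format checkpoint (legacy format not handled)")
    modules: Set[str] = set()
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            # The pickled object graph lives in "<archive-name>/data.pkl"
            if name.endswith("data.pkl"):
                modules |= modules_in_pickle(zf.read(name))
    return modules


def top_level_packages(modules: Iterable[str]) -> Set[str]:
    """Reduce dotted module paths to the package that must be importable, e.g. 'llava'."""
    return {m.split(".", 1)[0] for m in modules}


if __name__ == "__main__":
    for path in sys.argv[1:]:
        found = modules_in_checkpoint(path)
        print(f"{path}: needs {sorted(top_level_packages(found))}")
        for module in sorted(found):
            print(f"    {module}")
```

Passing one or more weight files as arguments (the script name and paths are up to you) prints the required top-level packages followed by the full module paths. Inspecting opcodes rather than calling `torch.load()` means the check works even when those packages are missing.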