Live Llava on Orin

New demo of Jetson Orin running LLaVA vision-language models on live video streams! This multimodal pipeline has been optimized with 4-bit quantization and tuned CUDA kernels to achieve interactive latency onboard edge devices. Try it yourself with the tutorial on Jetson AI Lab!

Next up will be extracting constrained JSON output from LLaVA and using it to trigger user-promptable alerts/actions for always-on applications.
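
In the meantime, here is a rough sketch of the idea using the existing video_query agent (the flags mirror the commands shown later in this thread; asking for JSON via the prompt text is just an illustration, not the planned constrained-decoding support):

./run.sh $(./autotag local_llm) \
  python3 -m local_llm.agents.video_query --api=mlc \
    --model Efficient-Large-Model/VILA-2.7b \
    --max-new-tokens 32 \
    --video-input /dev/video0 \
    --video-output webrtc://@:8554/output \
    --prompt 'Reply only with JSON like {"person_present": true, "summary": "..."}'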

YouTube: https://www.youtube.com/watch?v=X-OXxPiUTuU
Jetson AI Lab: Live LLaVA 🆕 - NVIDIA Jetson Generative AI Lab
Jetson Containers: jetson-containers/packages/llm/local_llm at master · dusty-nv/jetson-containers · GitHub

Rather cool stuff. Now…attach voice synthesis and you have great software for the visually impaired.

Cool! Do you have any plans to integrate this into MMJ?

Thanks @blanc9, yes! We are currently working to integrate this optimized VLM pipeline into Metropolis Microservices - stay tuned.

When I test it, it fails and displays this:
python3 -m local_llm.agents.video_query --api=mlc --verbose \
  --model /data/models/text-generation-webui/llava-v1.5-7b \
  --max-new-tokens 32 \
  --video-input /dev/video1 \
  --video-output webrtc://@:8554/output \
  --prompt "Describe the image concisely."
/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:124: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
warnings.warn(
10:57:10 | DEBUG | Namespace(model='/data/models/text-generation-webui/llava-v1.5-7b', quant=None, api='mlc', vision_model=None, prompt=['Describe the image concisely.'], save_mermaid=None, chat_template=None, system_prompt=None, max_new_tokens=32, min_new_tokens=-1, do_sample=False, temperature=0.7, top_p=0.95, repetition_penalty=1.0, video_input='/dev/video1', video_input_width=None, video_input_height=None, video_input_codec=None, video_input_framerate=None, video_input_save=None, video_output='webrtc://@:8554/output', video_output_codec=None, video_output_bitrate=None, video_output_save=None, log_level='debug', debug=True)
10:57:10 | DEBUG | subprocess 108 started
10:57:10 | INFO | loading /data/models/text-generation-webui/llava-v1.5-7b with MLC
Process Process-1:
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/opt/local_llm/local_llm/agents/video_query.py", line 128, in <module>
agent = VideoQuery(**vars(args)).run()
File "/opt/local_llm/local_llm/agents/video_query.py", line 23, in __init__
self.llm = ProcessProxy((lambda **kwargs: ChatQuery(model, drop_inputs=True, **kwargs)), **kwargs)
File "/opt/local_llm/local_llm/plugins/process_proxy.py", line 31, in __init__
raise RuntimeError(f"subprocess has an invalid initialization status ({init_msg['status']})")
RuntimeError: subprocess has an invalid initialization status (<class 'tvm._ffi.base.TVMError'>)
Traceback (most recent call last):
File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/opt/local_llm/local_llm/plugins/process_proxy.py", line 62, in run_process
raise error
File "/opt/local_llm/local_llm/plugins/process_proxy.py", line 59, in run_process
plugin = factory(**kwargs)
File "/opt/local_llm/local_llm/agents/video_query.py", line 23, in <lambda>
self.llm = ProcessProxy((lambda **kwargs: ChatQuery(model, drop_inputs=True, **kwargs)), **kwargs)
File "/opt/local_llm/local_llm/plugins/chat_query.py", line 63, in __init__
self.model = LocalLM.from_pretrained(model, **kwargs)
File "/opt/local_llm/local_llm/local_llm.py", line 72, in from_pretrained
model = MLCModel(model_path, **kwargs)
File "/opt/local_llm/local_llm/models/mlc.py", line 72, in __init__
logging.info(f"device={self.device}, name={self.device.device_name}, compute={self.device.compute_version}, max_clocks={self.device.max_clock_rate}, multiprocessors={self.device.multi_processor_count}, max_thread_dims={self.device.max_thread_dimensions}, api_version={self.device.api_version}, driver_version={self.device.driver_version}")
File "/usr/local/lib/python3.10/dist-packages/tvm/_ffi/runtime_ctypes.py", line 403, in device_name
return self._GetDeviceAttr(self.device_type, self.device_id, 5)
File "/usr/local/lib/python3.10/dist-packages/tvm/_ffi/runtime_ctypes.py", line 303, in _GetDeviceAttr
return tvm.runtime._ffi_api.GetDeviceAttr(device_type, device_id, attr_id)
File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.__call__
File "tvm/_ffi/_cython/./packed_func.pxi", line 263, in tvm._ffi._cy3.core.FuncCall
File "tvm/_ffi/_cython/./packed_func.pxi", line 252, in tvm._ffi._cy3.core.FuncCall3
File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL
File "/usr/local/lib/python3.10/dist-packages/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
raise py_err
tvm._ffi.base.TVMError: Traceback (most recent call last):
[bt] (5) /usr/local/lib/python3.10/dist-packages/tvm/libtvm.so(TVMFuncCall+0x68) [0xfffec70dd798]
[bt] (4) /usr/local/lib/python3.10/dist-packages/tvm/libtvm.so(+0x303d34c) [0xfffec70dd34c]
[bt] (3) /usr/local/lib/python3.10/dist-packages/tvm/libtvm.so(tvm::runtime::CUDADeviceAPI::GetAttr(DLDevice, tvm::runtime::DeviceAttrKind, tvm::runtime::TVMRetValue*)+0xd28) [0xfffec72075c8]
[bt] (2) /usr/local/lib/python3.10/dist-packages/tvm/libtvm.so(+0x316452c) [0xfffec720452c]
[bt] (1) /usr/local/lib/python3.10/dist-packages/tvm/libtvm.so(tvm::runtime::detail::LogFatal::Entry::Finalize()+0x68) [0xfffec52c6508]
[bt] (0) /usr/local/lib/python3.10/dist-packages/tvm/libtvm.so(tvm::runtime::Backtrace[abi:cxx11]()+0x30) [0xfffec7125380]
File "/opt/mlc-llm/3rdparty/tvm/src/runtime/cuda/cuda_device_api.cc", line 73
CUDAError: cuDeviceGetName(&name[0], name.size(), dev.device_id) failed with error: CUDA_ERROR_NOT_INITIALIZED

Hi @doup0018, I would recommend running this on JetPack 6 instead, or see this post for a workaround:

jetson@jetson:~/jetson-containers$ sudo ./run.sh "$(./autotag local_llm)"

python3 -m local_llm.agents.video_query --api=mlc
--model NousResearch/Obsidian-3B-V0.5
--max-content-length 768
--max-new-tokens 32
--video-input rtsp://admin:?transmode=unicast&profile=vam
--video-output rtsp://localhost:1234/output --prompt
[3] 39679
bash: --video-output: command not found

I hope you can help me with this.

@yash.tamil.5 either you are missing the \ line-continuation characters in your multi-line bash command, or the & in your stream URL is being interpreted by the shell (that's what the [3] 39679 job output indicates) - try surrounding your input stream URL in single quotes.
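
For example, something like this (a sketch using your same arguments; I substituted --max-context-len for --max-content-length to match the commands elsewhere in this thread, and the prompt text is just a placeholder):

python3 -m local_llm.agents.video_query --api=mlc \
    --model NousResearch/Obsidian-3B-V0.5 \
    --max-context-len 768 \
    --max-new-tokens 32 \
    --video-input 'rtsp://admin:?transmode=unicast&profile=vam' \
    --video-output rtsp://localhost:1234/output \
    --prompt 'Describe the image concisely.'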

This has been updated with a web UI, new model support, and a vector database:

I just tried out the Live Llava demo using the command below, and while the terminal is writing out what the camera is seeing, I am not getting the webpage to load the video stream… URL is https://127.0.0.1:8050.

I have tried both running it on the device and connecting remotely via the hostname. I'm using Chromium on the Orin, and Edge on my Windows device.

Do I need to click any button to get the stream going in the browser?

Using JP6 DP, Orin AGX.

./run.sh $(./autotag local_llm) \
  python3 -m local_llm.agents.video_query --api=mlc \
    --model Efficient-Large-Model/VILA-2.7b \
    --max-context-len 768 \
    --max-new-tokens 32 \
    --video-input /dev/video0 \
    --video-output webrtc://@:8554/output

Terminal is writing out this:

A man in a blue and white space suit is sitting on the floor in front of a laptop.
03:11:34 | INFO | refresh rate: 1.34 FPS (747.9 ms)

A man in a blue and white space suit is sitting on the floor in front of a laptop.
03:11:34 | INFO | refresh rate: 1.34 FPS (745.7 ms)

A man in a blue and white space suit is sitting on the floor in front of a laptop.
03:11:35 | INFO | refresh rate: 1.26 FPS (794.9 ms)
A man in a blue and white space suit is sitting on the floor in front of a laptop.
03:11:36 | INFO | refresh rate: 1.35 FPS (743.2 ms)

Hi @jasonthenderson, can you try setting your chrome://flags#enable-webrtc-hide-local-ips-with-mdns to Disabled? If that doesn’t work, please look in the Jetson’s console right after your browser connects, and inspect your browser’s debug console (Ctrl+Shift+I)

I was not able to fix the issue, so I moved on to running other VLM models (SAM, LLaVA-13b-5GB) on my Orin Developer Kit. I am facing torch and torchvision dependency issues: I can find and install a torch build that is compatible with CUDA, but when I try to install torchvision, pip automatically removes the CUDA-enabled torch and installs the CPU-only torch. Can you help me with this? I am trying to run this in VS Code on my AGX Orin, and I don't want to run a docker container.

Running:
Jetpack 6.0
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0

Thank you.

@yash.tamil.5 without using the containers, you'll need to build torchvision from source, as shown under the Installation section of this post:

The issues you mention of other packages uninstalling the desired torch version, etc., are some of the reasons why I use the containers for the package environment. You can use the containers for development by just mounting in a shared directory that contains all your code. Then your code is editable from outside the container, while inside the container all the GPU-accelerated packages you want are working & tested.
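
For example, a minimal sketch of that workflow (the -v mount is standard docker run syntax that run.sh passes through; ~/my_project and my_script.py are hypothetical):

./run.sh -v ~/my_project:/my_project $(./autotag local_llm)
# inside the container, the GPU-accelerated torch/torchvision are already installed:
cd /my_project
python3 my_script.py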

The pip wheel works only for installing torch; installing torchvision will downgrade or change torch to the CPU version. But I will try mounting my directory into the docker container and running the code.

Thank you very much for the help.

@yash.tamil.5 don't pip install torchvision; see the instructions for building it from source in that thread I linked to above, under the Installation section.
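
Roughly, the steps from that thread look like this (the branch must match the version of torch you have installed; v0.16.1 below is only an example):

sudo apt-get install libjpeg-dev zlib1g-dev libpython3-dev libopenblas-dev libavcodec-dev libavformat-dev libswscale-dev
git clone --branch v0.16.1 https://github.com/pytorch/vision torchvision
cd torchvision
export BUILD_VERSION=0.16.1  # must match the branch above
python3 setup.py install --user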

Changing the Chrome setting to ‘disabled’ fixed the issue - thank you @dusty_nv !

@dusty_nv Hi Dustin, thank you so much for sharing this content!
I am relatively new to this field and want to ask:

  1. Is it possible to run the Live Llava demo on a Windows PC (with an NVIDIA GPU), or is it exclusive to Orin?
  2. Is there any resource I can follow to try to do it on a PC?

Thanks again!

I am able to run the model now, but when I tried to open https://192.168.1.233:8050/ with Chromium (chrome://flags#enable-webrtc-hide-local-ips-with-mdns set to Disabled), the container would stop working and I would get the following error:

10:11:40 | INFO | refresh rate: 0.57 FPS (1744.0 ms)
[webrtc] websocket /output -- new connection opened by 192.168.1.233 (peer_id=0)
[webrtc] new WebRTC peer connecting (192.168.1.233, peer_id=0)
**
ERROR:/opt/jetson-utils/codec/gstEncoder.cpp:876:static void gstEncoder::onWebsocketMessage(WebRTCPeer*, const char*, size_t, void*): 'sinkpad' should not be nullptr
Bail out! ERROR:/opt/jetson-utils/codec/gstEncoder.cpp:876:static void gstEncoder::onWebsocketMessage(WebRTCPeer*, const char*, size_t, void*): 'sinkpad' should not be nullptr
Fatal Python error: Aborted

Thread 0x0000fffddffff120 (most recent call first):
File "/usr/lib/python3.10/abc.py", line 123 in __subclasscheck__
File "/usr/lib/python3.10/abc.py", line 123 in __subclasscheck__
File "/usr/lib/python3.10/abc.py", line 123 in __subclasscheck__
File "/usr/lib/python3.10/abc.py", line 123 in __subclasscheck__
File "/usr/lib/python3.10/abc.py", line 123 in __subclasscheck__
File "/usr/lib/python3.10/abc.py", line 119 in __instancecheck__
File "/usr/lib/python3.10/_collections_abc.py", line 997 in update
File "/usr/local/lib/python3.10/dist-packages/websockets/datastructures.py", line 144 in update
File "/usr/local/lib/python3.10/dist-packages/websockets/datastructures.py", line 75 in __init__
File "/usr/local/lib/python3.10/dist-packages/websockets/http11.py", line 332 in parse_headers
File "/usr/local/lib/python3.10/dist-packages/websockets/http11.py", line 149 in parse
File "/usr/local/lib/python3.10/dist-packages/websockets/server.py", line 561 in parse
File "/usr/local/lib/python3.10/dist-packages/websockets/protocol.py", line 260 in receive_data
File "/usr/local/lib/python3.10/dist-packages/websockets/sync/connection.py", line 579 in recv_events
File "/usr/local/lib/python3.10/dist-packages/websockets/sync/server.py", line 196 in recv_events
File "/usr/lib/python3.10/threading.py", line 953 in run
File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x0000ffff30aff120 (most recent call first):
File "/usr/lib/python3.10/threading.py", line 324 in wait
File "/usr/lib/python3.10/threading.py", line 607 in wait
File "/usr/local/lib/python3.10/dist-packages/websockets/sync/server.py", line 120 in handshake
File "/usr/local/lib/python3.10/dist-packages/websockets/sync/server.py", line 557 in conn_handler
File "/usr/lib/python3.10/threading.py", line 953 in run
File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x0000ffff35b1f120 (most recent call first):
File "/usr/lib/python3.10/selectors.py", line 416 in select
File "/usr/lib/python3.10/socketserver.py", line 232 in serve_forever
File "/usr/local/lib/python3.10/dist-packages/werkzeug/serving.py", line 817 in serve_forever
File "/usr/local/lib/python3.10/dist-packages/werkzeug/serving.py", line 1123 in run_simple
File "/usr/local/lib/python3.10/dist-packages/flask/app.py", line 625 in run
File "/opt/NanoLLM/nano_llm/web/server.py", line 120 in <lambda>
File "/usr/lib/python3.10/threading.py", line 953 in run
File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x0000ffff3832f120 (most recent call first):
File "/usr/lib/python3.10/selectors.py", line 469 in select
File "/usr/local/lib/python3.10/dist-packages/websockets/sync/server.py", line 260 in serve_forever
File "/opt/NanoLLM/nano_llm/web/server.py", line 119 in <lambda>
File "/usr/lib/python3.10/threading.py", line 953 in run
File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x0000ffff4e3ef120 (most recent call first):
File "/opt/NanoLLM/nano_llm/plugins/video/video_source.py", line 97 in capture
File "/opt/NanoLLM/nano_llm/plugins/video/video_source.py", line 159 in run
File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x0000ffff4bbdf120 (most recent call first):
File "/opt/NanoLLM/nano_llm/agents/video_query.py", line 313 in poll_keyboard
File "/usr/lib/python3.10/threading.py", line 953 in run
File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x0000ffff493cf120 (most recent call first):
File "/usr/lib/python3.10/threading.py", line 324 in wait
File "/usr/lib/python3.10/threading.py", line 607 in wait
File "/opt/NanoLLM/nano_llm/plugin.py", line 335 in process_inputs
File "/opt/NanoLLM/nano_llm/plugin.py", line 321 in run
File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x0000fffe44fc1120 (most recent call first):
File "/usr/lib/python3.10/threading.py", line 324 in wait
File "/usr/lib/python3.10/threading.py", line 607 in wait
File "/opt/NanoLLM/nano_llm/plugin.py", line 335 in process_inputs
File "/opt/NanoLLM/nano_llm/plugin.py", line 321 in run
File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x0000fffe490cf120 (most recent call first):
File "/usr/lib/python3.10/threading.py", line 320 in wait
File "/usr/lib/python3.10/threading.py", line 607 in wait
File "/opt/NanoLLM/nano_llm/chat/stream.py", line 79 in __next__
File "/opt/NanoLLM/nano_llm/plugins/chat_query.py", line 223 in process
File "/opt/NanoLLM/nano_llm/plugins/chat_query.py", line 152 in process
File "/opt/NanoLLM/nano_llm/plugin.py", line 361 in dispatch
File "/opt/NanoLLM/nano_llm/plugin.py", line 348 in process_inputs
File "/opt/NanoLLM/nano_llm/plugin.py", line 321 in run
File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x0000fffe725ef120 (most recent call first):
File "/opt/NanoLLM/nano_llm/models/mlc.py", line 509 in _generate
File "/opt/NanoLLM/nano_llm/models/mlc.py", line 538 in _run
File "/usr/lib/python3.10/threading.py", line 953 in run
File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x0000fffeecf9f120 (most recent call first):
File "/usr/lib/python3.10/threading.py", line 324 in wait
File "/usr/lib/python3.10/threading.py", line 607 in wait
File "/usr/local/lib/python3.10/dist-packages/tqdm/_monitor.py", line 60 in run
File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x0000ffffa65dd6c0 (most recent call first):
File "/usr/lib/python3.10/threading.py", line 1116 in _wait_for_tstate_lock
File "/usr/lib/python3.10/threading.py", line 1096 in join
File "/opt/NanoLLM/nano_llm/agent.py", line 58 in run
File "/opt/NanoLLM/nano_llm/agents/video_query.py", line 357 in <module>
File "/usr/lib/python3.10/runpy.py", line 86 in _run_code
File "/usr/lib/python3.10/runpy.py", line 196 in _run_module_as_main

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, zstandard.backend_c, charset_normalizer.md, yaml._yaml, sentencepiece._sentencepiece, psutil._psutil_linux, psutil._psutil_posix, PIL._imaging, PIL._imagingft, google.protobuf.pyext._message, jetson_utils_python, cuda._lib.utils, cuda._cuda.ccuda, cuda.ccuda, cuda.cuda, cuda._cuda.cnvrtc, cuda.cnvrtc, cuda.nvrtc, cuda._lib.ccudart.utils, cuda._lib.ccudart.ccudart, cuda.ccudart, cuda.cudart, _cffi_backend, pyaudio._portaudio, markupsafe._speedups, websockets.speedups, regex._regex, scipy._lib._ccallback_c, numba.core.typeconv._typeconv, numba._helperlib, numba._dynfunc, numba._dispatcher, numba.core.runtime._nrt_python, numba.np.ufunc._internal, numba.experimental.jitclass._box, h5py._errors, h5py.defs, h5py._objects, h5py.h5, h5py.utils, h5py.h5t, h5py.h5s, h5py.h5ac, h5py.h5p, h5py.h5r, h5py._proxy, h5py._conv, h5py.h5z, h5py.h5a, h5py.h5d, h5py.h5ds, h5py.h5g, h5py.h5i, h5py.h5f, h5py.h5fd, h5py.h5pl, h5py.h5o, h5py.h5l, h5py._selector, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._flinalg, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, pyarrow.lib, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pyarrow._compute, pandas._libs.ops, numexpr.interpreter, pandas._libs.hashing, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.internals, pandas._libs.indexing, pandas._libs.index, pandas._libs.writers, pandas._libs.join, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, tvm._ffi._cy3.core (total: 145)

Hey there @vickhung,

I am getting the exact same error while running LLaVA inside the docker container. Were you able to solve it?

Thanks!

Hey Hardik13,

I haven't been following this issue for a while since I didn't see any updates or support provided, but here's what I know from the last time I looked into this matter:

There is an open case on GitHub regarding this issue, and no solutions were provided.
I won't be resuming this issue for a bit.
I also tried out GStreamer to get my video streaming onto RTSP, but I ran into another demuxer error, so we moved on to another solution in the meantime.

Sorry for not being too helpful on this subject, but this is what I’ve seen so far.

Best of luck!