Live Llava on Orin

New demo of Jetson Orin running LLaVA vision-language models on live video streams! This multimodal pipeline has been optimized with 4-bit quantization and tuned CUDA kernels to achieve interactive latency onboard edge devices. Try it yourself with the tutorial on Jetson AI Lab!

Next up will be extracting constrained JSON output from LLaVA and using it to trigger user-promptable alerts/actions for always-on applications.
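
In the meantime, here is a rough sketch of the idea using the existing video_query agent (the flags mirror the commands shown later in this thread; asking for JSON via the prompt text is just an illustration, not the planned constrained-decoding support):

./run.sh $(./autotag local_llm) \
  python3 -m local_llm.agents.video_query --api=mlc \
    --model Efficient-Large-Model/VILA-2.7b \
    --max-new-tokens 32 \
    --video-input /dev/video0 \
    --video-output webrtc://@:8554/output \
    --prompt 'Reply only with JSON like {"person_present": true, "summary": "..."}'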

YouTube: https://www.youtube.com/watch?v=X-OXxPiUTuU
Jetson AI Lab: Live LLaVA 🆕 - NVIDIA Jetson Generative AI Lab
Jetson Containers: jetson-containers/packages/llm/local_llm at master · dusty-nv/jetson-containers · GitHub

Rather cool stuff. Now…attach voice synthesis and you have great software for the visually impaired.

Cool! Do you have any plans to integrate this into MMJ?

Thanks @blanc9, yes! We are currently working to integrate this optimized VLM pipeline into Metropolis Microservices - stay tuned.

When I test it, it fails and displays this:
python3 -m local_llm.agents.video_query --api=mlc --verbose \
  --model /data/models/text-generation-webui/llava-v1.5-7b \
  --max-new-tokens 32 \
  --video-input /dev/video1 \
  --video-output webrtc://@:8554/output \
  --prompt "Describe the image concisely."
/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:124: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
warnings.warn(
10:57:10 | DEBUG | Namespace(model='/data/models/text-generation-webui/llava-v1.5-7b', quant=None, api='mlc', vision_model=None, prompt=['Describe the image concisely.'], save_mermaid=None, chat_template=None, system_prompt=None, max_new_tokens=32, min_new_tokens=-1, do_sample=False, temperature=0.7, top_p=0.95, repetition_penalty=1.0, video_input='/dev/video1', video_input_width=None, video_input_height=None, video_input_codec=None, video_input_framerate=None, video_input_save=None, video_output='webrtc://@:8554/output', video_output_codec=None, video_output_bitrate=None, video_output_save=None, log_level='debug', debug=True)
10:57:10 | DEBUG | subprocess 108 started
10:57:10 | INFO | loading /data/models/text-generation-webui/llava-v1.5-7b with MLC
Process Process-1:
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/opt/local_llm/local_llm/agents/video_query.py", line 128, in <module>
agent = VideoQuery(**vars(args)).run()
File "/opt/local_llm/local_llm/agents/video_query.py", line 23, in __init__
self.llm = ProcessProxy((lambda **kwargs: ChatQuery(model, drop_inputs=True, **kwargs)), **kwargs)
File "/opt/local_llm/local_llm/plugins/process_proxy.py", line 31, in __init__
raise RuntimeError(f"subprocess has an invalid initialization status ({init_msg['status']})")
RuntimeError: subprocess has an invalid initialization status (<class 'tvm._ffi.base.TVMError'>)
Traceback (most recent call last):
File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/opt/local_llm/local_llm/plugins/process_proxy.py", line 62, in run_process
raise error
File "/opt/local_llm/local_llm/plugins/process_proxy.py", line 59, in run_process
plugin = factory(**kwargs)
File "/opt/local_llm/local_llm/agents/video_query.py", line 23, in <lambda>
self.llm = ProcessProxy((lambda **kwargs: ChatQuery(model, drop_inputs=True, **kwargs)), **kwargs)
File "/opt/local_llm/local_llm/plugins/chat_query.py", line 63, in __init__
self.model = LocalLM.from_pretrained(model, **kwargs)
File "/opt/local_llm/local_llm/local_llm.py", line 72, in from_pretrained
model = MLCModel(model_path, **kwargs)
File "/opt/local_llm/local_llm/models/mlc.py", line 72, in __init__
logging.info(f"device={self.device}, name={self.device.device_name}, compute={self.device.compute_version}, max_clocks={self.device.max_clock_rate}, multiprocessors={self.device.multi_processor_count}, max_thread_dims={self.device.max_thread_dimensions}, api_version={self.device.api_version}, driver_version={self.device.driver_version}")
File "/usr/local/lib/python3.10/dist-packages/tvm/_ffi/runtime_ctypes.py", line 403, in device_name
return self._GetDeviceAttr(self.device_type, self.device_id, 5)
File "/usr/local/lib/python3.10/dist-packages/tvm/_ffi/runtime_ctypes.py", line 303, in _GetDeviceAttr
return tvm.runtime._ffi_api.GetDeviceAttr(device_type, device_id, attr_id)
File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.__call__
File "tvm/_ffi/_cython/./packed_func.pxi", line 263, in tvm._ffi._cy3.core.FuncCall
File "tvm/_ffi/_cython/./packed_func.pxi", line 252, in tvm._ffi._cy3.core.FuncCall3
File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL
File "/usr/local/lib/python3.10/dist-packages/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
raise py_err
tvm._ffi.base.TVMError: Traceback (most recent call last):
[bt] (5) /usr/local/lib/python3.10/dist-packages/tvm/libtvm.so(TVMFuncCall+0x68) [0xfffec70dd798]
[bt] (4) /usr/local/lib/python3.10/dist-packages/tvm/libtvm.so(+0x303d34c) [0xfffec70dd34c]
[bt] (3) /usr/local/lib/python3.10/dist-packages/tvm/libtvm.so(tvm::runtime::CUDADeviceAPI::GetAttr(DLDevice, tvm::runtime::DeviceAttrKind, tvm::runtime::TVMRetValue*)+0xd28) [0xfffec72075c8]
[bt] (2) /usr/local/lib/python3.10/dist-packages/tvm/libtvm.so(+0x316452c) [0xfffec720452c]
[bt] (1) /usr/local/lib/python3.10/dist-packages/tvm/libtvm.so(tvm::runtime::detail::LogFatal::Entry::Finalize()+0x68) [0xfffec52c6508]
[bt] (0) /usr/local/lib/python3.10/dist-packages/tvm/libtvm.so(tvm::runtime::Backtrace[abi:cxx11]()+0x30) [0xfffec7125380]
File "/opt/mlc-llm/3rdparty/tvm/src/runtime/cuda/cuda_device_api.cc", line 73
CUDAError: cuDeviceGetName(&name[0], name.size(), dev.device_id) failed with error: CUDA_ERROR_NOT_INITIALIZED

Hi @doup0018, I would recommend running this on JetPack 6 instead, or see this post for a workaround:

jetson@jetson:~/jetson-containers$ sudo ./run.sh "$(./autotag local_llm)"

python3 -m local_llm.agents.video_query --api=mlc
--model NousResearch/Obsidian-3B-V0.5
--max-content-length 768
--max-new-tokens 32
--video-input rtsp://admin:?transmode=unicast&profile=vam
--video-output rtsp://localhost:1234/output --prompt
[3] 39679
bash: --video-output: command not found

I hope you can help me with this.

@yash.tamil.5 either you are missing the \ line-continuation characters in your multi-line bash command, or the & in your stream URL is being interpreted by the shell (that's what the [3] 39679 job output indicates) - try surrounding your input stream URL in single quotes.
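
For example, something like this (a sketch using your same arguments; I substituted --max-context-len for --max-content-length to match the commands elsewhere in this thread, and the prompt text is just a placeholder):

python3 -m local_llm.agents.video_query --api=mlc \
    --model NousResearch/Obsidian-3B-V0.5 \
    --max-context-len 768 \
    --max-new-tokens 32 \
    --video-input 'rtsp://admin:?transmode=unicast&profile=vam' \
    --video-output rtsp://localhost:1234/output \
    --prompt 'Describe the image concisely.'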

This has been updated with a web UI, new model support, and a vector database:

I just tried out the Live Llava demo using the command below, and while the terminal is writing out what the camera is seeing, I am not getting the webpage to load the video stream… URL is https://127.0.0.1:8050.

I have tried both running it on the device and connecting remotely via the hostname. I'm using Chromium on the Orin, and Edge on my Windows device.

Do I need to click any button to get the stream going in the browser?

Using JP6 DP, Orin AGX.

./run.sh $(./autotag local_llm) \
  python3 -m local_llm.agents.video_query --api=mlc \
    --model Efficient-Large-Model/VILA-2.7b \
    --max-context-len 768 \
    --max-new-tokens 32 \
    --video-input /dev/video0 \
    --video-output webrtc://@:8554/output

Terminal is writing out this:

A man in a blue and white space suit is sitting on the floor in front of a laptop.
03:11:34 | INFO | refresh rate: 1.34 FPS (747.9 ms)

A man in a blue and white space suit is sitting on the floor in front of a laptop.
03:11:34 | INFO | refresh rate: 1.34 FPS (745.7 ms)

A man in a blue and white space suit is sitting on the floor in front of a laptop.
03:11:35 | INFO | refresh rate: 1.26 FPS (794.9 ms)
A man in a blue and white space suit is sitting on the floor in front of a laptop.
03:11:36 | INFO | refresh rate: 1.35 FPS (743.2 ms)

Hi @jasonthenderson, can you try setting your chrome://flags#enable-webrtc-hide-local-ips-with-mdns to Disabled? If that doesn’t work, please look in the Jetson’s console right after your browser connects, and inspect your browser’s debug console (Ctrl+Shift+I)

I was not able to fix the issue, so I moved on to running other VLM models (SAM, LLaVA-13b-5GB) on my Orin Developer Kit. I am facing torch and torchvision dependency issues: I can find and install a torch build that is compatible with CUDA, but when I try to install torchvision, pip automatically removes the CUDA-enabled torch and installs the CPU-only torch. Can you help me with this? I am trying to run this in VS Code on my AGX Orin, and I don't want to run a docker container.

Running:
Jetpack 6.0
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0

Thank you.

@yash.tamil.5 without using the containers, you'll need to build torchvision from source, as shown under the Installation section of this post:

The issues you mention of other packages uninstalling the desired torch version, etc., are some of the reasons why I use the containers for the package environment. You can use the containers for development by just mounting in a shared directory that contains all your code. Then your code is editable from outside the container, while inside the container all the GPU-accelerated packages you want are working & tested.
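
For example, a minimal sketch of that workflow (the -v mount is standard docker run syntax that run.sh passes through; ~/my_project and my_script.py are hypothetical):

./run.sh -v ~/my_project:/my_project $(./autotag local_llm)
# inside the container, the GPU-accelerated torch/torchvision are already installed:
cd /my_project
python3 my_script.py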

The pip wheel works only for installing torch; installing torchvision will downgrade or change torch to the CPU version. But I will try mounting my directory into the docker container and running the code.

Thank you very much for the help.

@yash.tamil.5 don't pip install torchvision; see the instructions for building it from source in that thread I linked to above, under the Installation section.
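
Roughly, the steps from that thread look like this (the branch must match the version of torch you have installed; v0.16.1 below is only an example):

sudo apt-get install libjpeg-dev zlib1g-dev libpython3-dev libopenblas-dev libavcodec-dev libavformat-dev libswscale-dev
git clone --branch v0.16.1 https://github.com/pytorch/vision torchvision
cd torchvision
export BUILD_VERSION=0.16.1  # must match the branch above
python3 setup.py install --user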

Changing the Chrome setting to ‘disabled’ fixed the issue - thank you @dusty_nv !

@dusty_nv Hi Dustin, thank you so much for sharing this content!
I am relatively new to this field and want to ask:

  1. Is it possible to run the Live Llava demo on a Windows PC (with an NVIDIA GPU), or is it exclusive to Orin?
  2. Is there any resource I can follow to try to do it on a PC?

Thanks again!

I am able to run the model now, but when I tried to open https://192.168.1.233:8050/ with Chromium (chrome://flags#enable-webrtc-hide-local-ips-with-mdns set to Disabled), the container would stop working and I would get the following error:

10:11:40 | INFO | refresh rate: 0.57 FPS (1744.0 ms)
[webrtc] websocket /output -- new connection opened by 192.168.1.233 (peer_id=0)
[webrtc] new WebRTC peer connecting (192.168.1.233, peer_id=0)
**
ERROR:/opt/jetson-utils/codec/gstEncoder.cpp:876:static void gstEncoder::onWebsocketMessage(WebRTCPeer*, const char*, size_t, void*): 'sinkpad' should not be nullptr
Bail out! ERROR:/opt/jetson-utils/codec/gstEncoder.cpp:876:static void gstEncoder::onWebsocketMessage(WebRTCPeer*, const char*, size_t, void*): 'sinkpad' should not be nullptr
Fatal Python error: Aborted

Thread 0x0000fffddffff120 (most recent call first):
File "/usr/lib/python3.10/abc.py", line 123 in __subclasscheck__
File "/usr/lib/python3.10/abc.py", line 123 in __subclasscheck__
File "/usr/lib/python3.10/abc.py", line 123 in __subclasscheck__
File "/usr/lib/python3.10/abc.py", line 123 in __subclasscheck__
File "/usr/lib/python3.10/abc.py", line 123 in __subclasscheck__
File "/usr/lib/python3.10/abc.py", line 119 in __instancecheck__
File "/usr/lib/python3.10/_collections_abc.py", line 997 in update
File "/usr/local/lib/python3.10/dist-packages/websockets/datastructures.py", line 144 in update
File "/usr/local/lib/python3.10/dist-packages/websockets/datastructures.py", line 75 in __init__
File "/usr/local/lib/python3.10/dist-packages/websockets/http11.py", line 332 in parse_headers
File "/usr/local/lib/python3.10/dist-packages/websockets/http11.py", line 149 in parse
File "/usr/local/lib/python3.10/dist-packages/websockets/server.py", line 561 in parse
File "/usr/local/lib/python3.10/dist-packages/websockets/protocol.py", line 260 in receive_data
File "/usr/local/lib/python3.10/dist-packages/websockets/sync/connection.py", line 579 in recv_events
File "/usr/local/lib/python3.10/dist-packages/websockets/sync/server.py", line 196 in recv_events
File "/usr/lib/python3.10/threading.py", line 953 in run
File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x0000ffff30aff120 (most recent call first):
File "/usr/lib/python3.10/threading.py", line 324 in wait
File "/usr/lib/python3.10/threading.py", line 607 in wait
File "/usr/local/lib/python3.10/dist-packages/websockets/sync/server.py", line 120 in handshake
File "/usr/local/lib/python3.10/dist-packages/websockets/sync/server.py", line 557 in conn_handler
File "/usr/lib/python3.10/threading.py", line 953 in run
File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x0000ffff35b1f120 (most recent call first):
File "/usr/lib/python3.10/selectors.py", line 416 in select
File "/usr/lib/python3.10/socketserver.py", line 232 in serve_forever
File "/usr/local/lib/python3.10/dist-packages/werkzeug/serving.py", line 817 in serve_forever
File "/usr/local/lib/python3.10/dist-packages/werkzeug/serving.py", line 1123 in run_simple
File "/usr/local/lib/python3.10/dist-packages/flask/app.py", line 625 in run
File "/opt/NanoLLM/nano_llm/web/server.py", line 120 in <lambda>
File "/usr/lib/python3.10/threading.py", line 953 in run
File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x0000ffff3832f120 (most recent call first):
File "/usr/lib/python3.10/selectors.py", line 469 in select
File "/usr/local/lib/python3.10/dist-packages/websockets/sync/server.py", line 260 in serve_forever
File "/opt/NanoLLM/nano_llm/web/server.py", line 119 in <lambda>
File "/usr/lib/python3.10/threading.py", line 953 in run
File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x0000ffff4e3ef120 (most recent call first):
File "/opt/NanoLLM/nano_llm/plugins/video/video_source.py", line 97 in capture
File "/opt/NanoLLM/nano_llm/plugins/video/video_source.py", line 159 in run
File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x0000ffff4bbdf120 (most recent call first):
File "/opt/NanoLLM/nano_llm/agents/video_query.py", line 313 in poll_keyboard
File "/usr/lib/python3.10/threading.py", line 953 in run
File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x0000ffff493cf120 (most recent call first):
File "/usr/lib/python3.10/threading.py", line 324 in wait
File "/usr/lib/python3.10/threading.py", line 607 in wait
File "/opt/NanoLLM/nano_llm/plugin.py", line 335 in process_inputs
File "/opt/NanoLLM/nano_llm/plugin.py", line 321 in run
File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x0000fffe44fc1120 (most recent call first):
File "/usr/lib/python3.10/threading.py", line 324 in wait
File "/usr/lib/python3.10/threading.py", line 607 in wait
File "/opt/NanoLLM/nano_llm/plugin.py", line 335 in process_inputs
File "/opt/NanoLLM/nano_llm/plugin.py", line 321 in run
File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x0000fffe490cf120 (most recent call first):
File "/usr/lib/python3.10/threading.py", line 320 in wait
File "/usr/lib/python3.10/threading.py", line 607 in wait
File "/opt/NanoLLM/nano_llm/chat/stream.py", line 79 in __next__
File "/opt/NanoLLM/nano_llm/plugins/chat_query.py", line 223 in process
File "/opt/NanoLLM/nano_llm/plugins/chat_query.py", line 152 in process
File "/opt/NanoLLM/nano_llm/plugin.py", line 361 in dispatch
File "/opt/NanoLLM/nano_llm/plugin.py", line 348 in process_inputs
File "/opt/NanoLLM/nano_llm/plugin.py", line 321 in run
File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x0000fffe725ef120 (most recent call first):
File "/opt/NanoLLM/nano_llm/models/mlc.py", line 509 in _generate
File "/opt/NanoLLM/nano_llm/models/mlc.py", line 538 in _run
File "/usr/lib/python3.10/threading.py", line 953 in run
File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x0000fffeecf9f120 (most recent call first):
File "/usr/lib/python3.10/threading.py", line 324 in wait
File "/usr/lib/python3.10/threading.py", line 607 in wait
File "/usr/local/lib/python3.10/dist-packages/tqdm/_monitor.py", line 60 in run
File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x0000ffffa65dd6c0 (most recent call first):
File "/usr/lib/python3.10/threading.py", line 1116 in _wait_for_tstate_lock
File "/usr/lib/python3.10/threading.py", line 1096 in join
File "/opt/NanoLLM/nano_llm/agent.py", line 58 in run
File "/opt/NanoLLM/nano_llm/agents/video_query.py", line 357 in <module>
File "/usr/lib/python3.10/runpy.py", line 86 in _run_code
File "/usr/lib/python3.10/runpy.py", line 196 in _run_module_as_main

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, zstandard.backend_c, charset_normalizer.md, yaml._yaml, sentencepiece._sentencepiece, psutil._psutil_linux, psutil._psutil_posix, PIL._imaging, PIL._imagingft, google.protobuf.pyext._message, jetson_utils_python, cuda._lib.utils, cuda._cuda.ccuda, cuda.ccuda, cuda.cuda, cuda._cuda.cnvrtc, cuda.cnvrtc, cuda.nvrtc, cuda._lib.ccudart.utils, cuda._lib.ccudart.ccudart, cuda.ccudart, cuda.cudart, _cffi_backend, pyaudio._portaudio, markupsafe._speedups, websockets.speedups, regex._regex, scipy._lib._ccallback_c, numba.core.typeconv._typeconv, numba._helperlib, numba._dynfunc, numba._dispatcher, numba.core.runtime._nrt_python, numba.np.ufunc._internal, numba.experimental.jitclass._box, h5py._errors, h5py.defs, h5py._objects, h5py.h5, h5py.utils, h5py.h5t, h5py.h5s, h5py.h5ac, h5py.h5p, h5py.h5r, h5py._proxy, h5py._conv, h5py.h5z, h5py.h5a, h5py.h5d, h5py.h5ds, h5py.h5g, h5py.h5i, h5py.h5f, h5py.h5fd, h5py.h5pl, h5py.h5o, h5py.h5l, h5py._selector, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._flinalg, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, pyarrow.lib, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pyarrow._compute, pandas._libs.ops, numexpr.interpreter, pandas._libs.hashing, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.internals, pandas._libs.indexing, pandas._libs.index, pandas._libs.writers, pandas._libs.join, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, tvm._ffi._cy3.core (total: 145)

Hey there @vickhung,

I am getting the exact same error while running LLaVA inside the docker container. Were you able to solve it?

Thanks!

Hey Hardik13,

I haven't been following this issue for a while since I didn't see any updates or support provided, but here's what I know from the last time I looked into this matter:

There is an open case on GitHub regarding this issue, and no solutions were provided.
I won't be resuming this issue for a bit.
I also tried out GStreamer to get my video streaming onto RTSP, but I ran into another demuxer error, so we moved on to another solution in the meantime.

Sorry for not being too helpful on this subject, but this is what I’ve seen so far.

Best of luck!