Live LLaVA on Orin

New demo of Jetson Orin running LLaVA vision-language models on live video streams! This multimodal pipeline has been optimized with 4-bit quantization and tuned CUDA kernels to achieve interactive latency onboard edge devices. Try it yourself with the tutorial on Jetson AI Lab!
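
If you want a feel for what the pipeline is doing before pulling the container, here is a minimal sketch of the same idea using off-the-shelf Hugging Face components with bitsandbytes 4-bit quantization. To be clear, the actual demo runs on the optimized local_llm runtime from jetson-containers, not this code, and the model ID, prompt format, and camera index below are assumptions:

```python
# Minimal sketch: 4-bit quantized LLaVA over live camera frames.
# NOT the optimized demo pipeline - just the same idea with generic
# Hugging Face components. Assumes a CUDA device, transformers >= 4.36,
# bitsandbytes, and a V4L2 camera at index 0.
import cv2
import torch
from PIL import Image
from transformers import AutoProcessor, BitsAndBytesConfig, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # assumed HF port of LLaVA-1.5
quant = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, quantization_config=quant, device_map="auto"
)

cap = cv2.VideoCapture(0)  # live video source
prompt = "USER: <image>\nDescribe what you see. ASSISTANT:"

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # OpenCV gives BGR; the processor expects an RGB PIL image.
    image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=48)
    # Drop the echoed prompt tokens and print only the new reply.
    reply = processor.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    print(reply.strip())
```

The real pipeline gets its interactive latency from tuned CUDA kernels and a streaming-friendly runtime; this sketch only shows the quantized model/prompt structure, not the performance.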

Next up will be extracting constrained JSON output from LLaVA and using it to trigger user-promptable alerts/actions for always-on applications.
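
As a rough illustration of that direction (not the actual implementation - the schema, keys, and trigger logic here are made up for the example), the idea is to prompt the model for a fixed JSON schema, parse its reply defensively, and fire a user-defined action when it matches:

```python
# Hypothetical sketch of the "constrained JSON -> alert" idea:
# prompt the VLM for a fixed schema, parse defensively, and invoke
# a callback. Schema and trigger condition are illustrative only.
import json

SCHEMA_PROMPT = (
    "Reply ONLY with JSON of the form "
    '{"alert": true|false, "reason": "<short string>"}. '
    "Set alert to true if a person enters the frame."
)

def on_alert(reason: str) -> None:
    # Placeholder action - e.g. send a webhook or play a sound.
    print(f"ALERT: {reason}")

def handle_vlm_output(text: str) -> None:
    """Parse the model's reply and trigger the action when it matches."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return  # model drifted off-schema; skip this frame
    if isinstance(data, dict) and data.get("alert") is True:
        on_alert(str(data.get("reason", "unspecified")))

# e.g. handle_vlm_output('{"alert": true, "reason": "person at the door"}')
```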

YouTube: https://www.youtube.com/watch?v=X-OXxPiUTuU
Jetson AI Lab: Live LLaVA (NVIDIA Jetson Generative AI Lab)
Jetson Containers: dusty-nv/jetson-containers, packages/llm/local_llm (GitHub)

Rather cool stuff. Now…attach voice synthesis and you have great software for the visually impaired.
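
For anyone curious, wiring that up could be as simple as feeding each caption into an offline TTS engine. A tiny sketch assuming pyttsx3 is installed (any TTS backend would slot in the same way):

```python
# Sketch: speak each VLM caption aloud with an offline TTS engine.
# Assumes `pip install pyttsx3`; not part of the demo itself.
import pyttsx3

engine = pyttsx3.init()

def speak(caption: str) -> None:
    """Queue a caption and block until it has been spoken."""
    engine.say(caption)
    engine.runAndWait()

# e.g. speak("A person is crossing the street ahead of you.")
```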

Cool! Do you have any plans to integrate this into MMJ?

Thanks @blanc9, yes! We are currently working to integrate this optimized VLM pipeline into Metropolis Microservices - stay tuned.