Running an SLM and Computer Vision model simultaneously

gaiusjulius · November 6, 2024, 9:08am

Hi, has anyone tried running both an SLM such as TinyLlama-1.1B and a YOLOv8 object detection model on a Jetson Orin Nano 8GB? Do you think it would be able to run both simultaneously?

dusty_nv · November 6, 2024, 4:50pm

Hi @gaiusjulius, presuming your models fit into memory, you can run them simultaneously by throttling the SLM/LLM token generation rate to sustain the desired performance.

Run YOLO on a separate CUDA stream. If you encounter stuttering, you may need to add sleep() calls to the inner LLM inference loop so that it doesn’t consume 100% of the GPU generating tokens as fast as possible. Text-based chats with the user tend to be an irregular workload, while vision remains fairly constant.
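As a rough illustration of the sleep() idea, here is a minimal sketch in plain Python. It assumes your LLM runtime exposes tokens as an iterator; `throttled_tokens` and `max_tokens_per_sec` are hypothetical names for this example and are not part of any particular library. The `sleep` and `clock` parameters exist only so the pacing can be checked without real delays.

```python
import time
from typing import Callable, Iterable, Iterator


def throttled_tokens(
    tokens: Iterable[str],
    max_tokens_per_sec: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[str]:
    """Yield tokens from `tokens` no faster than `max_tokens_per_sec`.

    `tokens` stands in for whatever streaming generator your LLM runtime
    provides (an assumption; adapt it to your own inference loop).
    """
    if max_tokens_per_sec <= 0:
        raise ValueError("max_tokens_per_sec must be positive")
    min_interval = 1.0 / max_tokens_per_sec
    last_yield = None
    for token in tokens:
        if last_yield is not None:
            # Sleep for whatever is left of the interval. During this time
            # the LLM submits no GPU work, leaving room for the vision model.
            remaining = min_interval - (clock() - last_yield)
            if remaining > 0:
                sleep(remaining)
        last_yield = clock()
        yield token
```

You would wrap your existing token stream, for example `for tok in throttled_tokens(stream, max_tokens_per_sec=8): ...`, and lower the rate until the YOLO frame rate stops stuttering. The value 8 is only a placeholder to tune.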

In this video we ran a VLM, LLM, vector DB, ASR, and TTS simultaneously on AGX Orin:

There are rate limiters in there that control how fast each stream/model runs, and these get dialed in by hand. I also had to pay attention to the CUDA streams (if your LLM is running in a different process than YOLO, that shouldn’t be necessary, as they are already in different CUDA contexts at the driver level).
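To make the rate limiter idea concrete, here is a small sketch of a per-model limiter. It is not the code from the demo; the `RateLimiter` class and the rates below are assumptions chosen for illustration, and the right values depend on your models and power mode.

```python
import time
from typing import Callable


class RateLimiter:
    """Caps how often a loop iteration may start (iterations per second)."""

    def __init__(
        self,
        max_rate_hz: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if max_rate_hz <= 0:
            raise ValueError("max_rate_hz must be positive")
        self.period = 1.0 / max_rate_hz
        self._clock = clock
        self._sleep = sleep
        self._next_time = clock()

    def wait(self) -> float:
        """Block until the next iteration is due; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_time - now)
        if delay > 0:
            self._sleep(delay)
        # Schedule from whichever is later, so one slow iteration does not
        # trigger a burst of catch-up iterations afterwards.
        self._next_time = max(now, self._next_time) + self.period
        return delay


# Hypothetical starting points: one limiter per model, tuned independently.
limiters = {
    "vision": RateLimiter(15.0),  # e.g. cap YOLO at about 15 frames/s
    "llm": RateLimiter(8.0),      # e.g. cap the LLM at about 8 tokens/s
}
```

Each model's loop would call its own limiter at the top of every iteration, for example `limiters["vision"].wait()` before running detection on a frame. Adjusting the two rates against each other is one way to do the "dialing in" described above.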

gaiusjulius · November 7, 2024, 5:21am

Thank you for this information!
