Integrating Vision-Language Models (VLMs) within DeepStream

• Hardware Platform (GPU): RTX 3060
• DeepStream Version: 7.0
• TensorRT Version:
• NVIDIA GPU Driver Version (valid for GPU only): CUDA 12.1
• Issue Type: questions

Is there an official plan to integrate VLM models within the DeepStream framework?

The VIA microservice has been released for integrating VLM models. Please refer to Visual Insight Agent (VIA) Microservices Preview | NVIDIA Developer. The VIA forum is Latest Intelligent Video Analytics/Visual AI Agent topics - NVIDIA Developer Forums

Can this framework be integrated with DeepStream for execution?

DeepStream is already integrated inside VIA.

DeepStream doesn’t have chat capability, so it is not suitable to run a VLM with DeepStream alone.

So, does VIA inherently utilize some of DeepStream’s features, such as the ability to handle multiple streams simultaneously?

Would there be commonalities in the underlying logic, such as RTSP configuration or switching detection models?

So, for now, there won’t be a new version of DeepStream that allows replacing detection models (e.g., YOLO) with a VLM; instead, the handling of VLM models will be managed within the VIA framework?

Yes.
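For context, “switching detection models” in DeepStream today happens through the nvinfer element’s configuration file rather than by dropping in a VLM. A minimal sketch using the DeepStream Python bindings; the config file name below is hypothetical:

```python
# Minimal sketch: in DeepStream, the detector is the nvinfer element, and
# switching detection models means pointing it at a different config file.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

# The primary inference engine (PGIE); its model is chosen by the config file.
pgie = Gst.ElementFactory.make("nvinfer", "primary-inference")

# "config_infer_primary_yolo.txt" is a hypothetical path; the config file
# names the model (e.g. onnx-file=yolo.onnx) and its parameters.
pgie.set_property("config-file-path", "config_infer_primary_yolo.txt")
```

Swapping the detector (e.g., YOLO for another model) is just a matter of pointing config-file-path at a different config; VLM handling, by contrast, lives in VIA.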

Does VIA support live streams such as RTSP, …?

Yes.

Please register in Visual Insight Agent (VIA) Microservices Preview | NVIDIA Developer and get the documents for VIA.
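For illustration, once a VIA deployment is up, a live RTSP source would typically be registered through its REST API. A hedged sketch in Python; the base URL, endpoint path, and JSON field names here are assumptions, so check the VIA documentation you receive after registering:

```python
# Hypothetical sketch of registering an RTSP source with a deployed VIA
# microservice over its REST API. The port, endpoint path, and JSON field
# names are assumptions for illustration; consult the official VIA docs.
import requests

VIA_BASE_URL = "http://localhost:8100"  # assumed host/port of the VIA backend

resp = requests.post(
    f"{VIA_BASE_URL}/live-stream",  # assumed endpoint name
    json={
        "liveStreamUrl": "rtsp://camera.example.com:554/stream1",
        "description": "loading-dock camera",
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # e.g. an id for the registered stream
```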

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks
