What is this other than “just marketing”? NVVLLMVLM is not part of the default distribution, needs a separate project, and requires 40+ GB of GPU memory (hello, what world are you living in?).
The open source DeepStream reference samples are also part of the DeepStream SDK release.
The nvvllmvlm plugin helps users integrate the Cosmos-Reason2 model with vLLM into a DeepStream pipeline. Even if you use another framework to deploy a VLM locally, it will take a lot of memory. Can you tell us what your real requirement is?
OK, so the official announcement (see link) gave the impression that there would be some kind of generic solution allowing native GStreamer plugins to integrate LLMs directly into a GStreamer pipeline.
In reality, though, there are currently just one or two massive Hugging Face-hosted models available. The NVIDIA plugin is NOT part of the standard deployment but has to be installed separately as a PYTHON plugin (!), and the example application produces laughably bad hallucinations in anything resembling a real-world use case.
The requirement for 40 GB of GPU memory really shows how disconnected NVIDIA has become from practical reality. I installed this on an AWS L4 instance, fully aware that my 24 GB GPU would not be sufficient, and barely managed to get the 2B model variant running with a maximum of 16k tokens. But loading times of up to four minutes, effectively zero real-time behavior, and truly ridiculous results honestly left me speechless.
What exactly is the practical value here, beyond polished keynote demos?
I constantly face vague customer expectations like: “Can’t we just add some AI to it?” I have now tested the hype on real hardware and found it completely inadequate: honestly just nonsense, far beyond anything normal people would consider affordable, usable, or practical.
Feel free to stone me. This is just my personal opinion.
Thank you for sharing your experience with the nvvllmvlm plugin!