Originally published at: Visual Language Intelligence and Edge AI 2.0 | NVIDIA Technical Blog
VILA is a family of high-performance vision language models developed by NVIDIA Research and MIT. The largest model has ~40B parameters and the smallest has ~3B. The family is fully open source, including model checkpoints, training code, and training data. In this post, we describe how VILA performs against other…
We observe very strong video understanding capabilities in the VILA1.5 models. They are fully open source, so feel free to try them out!