Ok gotcha @masaki_yamagishi, I realize what is going on now: VILA-2.7B used the same openai/clip-vit-large-patch14-336
vision model that the NanoDB database was created with. However, VILA1.5-3B uses a custom-trained SigLIP vision encoder, and the embedding dimensions are different.
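For reference, you can see the dimension mismatch just by comparing the encoder configs with HuggingFace transformers. This is only a sketch: I'm using google/siglip-so400m-patch14-384 as a stand-in for VILA1.5's SigLIP tower, which is an assumption on my part since the actual encoder ships inside the VILA weights.

```python
# Compare embedding dimensions of the two vision encoders (configs only, no weights loaded).
# NOTE: the SigLIP checkpoint below is a stand-in / assumption, not necessarily the exact
# tower that VILA1.5-3B was trained with.
from transformers import AutoConfig

clip_cfg = AutoConfig.from_pretrained("openai/clip-vit-large-patch14-336")
siglip_cfg = AutoConfig.from_pretrained("google/siglip-so400m-patch14-384")

print("CLIP vision hidden size:  ", clip_cfg.vision_config.hidden_size)    # 1024
print("CLIP projection dim:      ", clip_cfg.projection_dim)               # 768
print("SigLIP vision hidden size:", siglip_cfg.vision_config.hidden_size)  # 1152 for SO400M
```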
I will have to do some rework of NanoDB to support arbitrary embedding models, and the database will need to be re-indexed with the particular model the VLM is using (if you want to reuse the embeddings rather than recalculate them).
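Conceptually, the re-index just means recomputing every image embedding with whichever encoder the VLM actually uses and rebuilding the index from those vectors. A rough sketch of that idea (with hypothetical helper names, not the actual NanoDB API):

```python
# Rough sketch of re-indexing with a chosen vision encoder -- hypothetical helpers,
# not NanoDB's real API. Swap model_id for the encoder your VLM uses.
import numpy as np
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_id = "openai/clip-vit-large-patch14-336"
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

def embed_image(path: str) -> np.ndarray:
    """Compute an L2-normalized image embedding for cosine-similarity search."""
    inputs = processor(images=Image.open(path), return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)        # shape (1, projection_dim)
    feats = feats / feats.norm(dim=-1, keepdim=True)
    return feats.numpy().squeeze()

# embeddings = np.stack([embed_image(p) for p in image_paths])
# ...then rebuild the vector index from `embeddings`
```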
In the near term, I will have to add a flag to the VideoQuery agent to disable reusing the embeddings, so that NanoDB goes back to calculating them with the original CLIP model. Until then, unfortunately I would suggest going back to VILA-2.7B if you require the live NanoDB integration, sorry about that.