Continuing the discussion from Live LLaVA webUI don't show NanoDB webUI:
Ok gotcha @masaki_yamagishi, I realize what is going on now: VILA-2.7B used the same openai/clip-vit-large-patch14-336
vision model that the NanoDB index was created with; however, VILA1.5-3B uses a custom-trained SigLIP vision encoder, and the embedding dimensions are different.
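You can see the mismatch from the model configs alone, without downloading any weights. Here is a minimal sketch with transformers, assuming VILA1.5-3B uses a SigLIP-SO400M-style encoder (the exact SigLIP checkpoint name below is an assumption, not necessarily the one baked into VILA1.5-3B):

```python
from transformers import AutoConfig

# Vision tower the existing NanoDB index was built with (same one VILA-2.7B uses)
clip_cfg = AutoConfig.from_pretrained("openai/clip-vit-large-patch14-336")
print("CLIP embedding dim:", clip_cfg.projection_dim)  # 768

# SigLIP encoder assumed for VILA1.5-3B -- checkpoint name is an assumption
siglip_cfg = AutoConfig.from_pretrained("google/siglip-so400m-patch14-384")
print("SigLIP embedding dim:", siglip_cfg.vision_config.hidden_size)  # 1152
```

Because those two dimensions differ, the vectors already stored in the NanoDB index can't be reused by the new VLM, which is why the database has to be rebuilt.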
I will have to do some rework of NanoDB to support arbitrary embedding models, and the database will need to be re-indexed with the particular model the VLM is using (should you want to reuse the VLM's embeddings and not have to recalculate them).
In the near term, I will add a flag to the VideoQuery agent that disables reusing the embeddings, so NanoDB goes back to calculating them with the original CLIP model. Until then, unfortunately I would recommend going back to VILA-2.7B if you require the live NanoDB integration, sorry about that.
Hi @dusty_nv, can you help me with this?
I want to re-index my COCO data with the SigLIP vision encoder because I need the embedding size to be compatible with VILA1.5-3B. Another reason is that Live LLaVA does not work well with dustynv/nano_llm:24.7-r36.2.0, but dustynv/nano_llm:24.5-r36.2.0 is quite good.
I usually get outputs like this with 24.7 (see the attached screenshot).
That is a separate problem, though; for now I just want to know if there is a way to re-index my dataset with another visual encoder.
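To make it concrete, a stand-alone pass over the images like the one below is roughly what I have in mind. This is only a sketch, not NanoDB's actual indexing code or on-disk format, and the SigLIP checkpoint, directory paths, and output files are placeholders I picked for illustration:

```python
import json
from pathlib import Path

import numpy as np
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

MODEL = "google/siglip-so400m-patch14-384"    # assumed SigLIP checkpoint
IMAGE_DIR = Path("/data/datasets/coco/2017")  # wherever the COCO images live
OUT_FILE = Path("/data/embeddings/coco_siglip.npy")

device = "cuda" if torch.cuda.is_available() else "cpu"
processor = AutoProcessor.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL).to(device).eval()

embeddings, paths = [], []
with torch.inference_mode():
    for img_path in sorted(IMAGE_DIR.glob("*.jpg")):
        image = Image.open(img_path).convert("RGB")
        inputs = processor(images=image, return_tensors="pt").to(device)
        feats = model.get_image_features(**inputs)        # (1, 1152) for so400m
        feats = feats / feats.norm(dim=-1, keepdim=True)  # normalize for cosine search
        embeddings.append(feats.squeeze(0).cpu().numpy())
        paths.append(str(img_path))

OUT_FILE.parent.mkdir(parents=True, exist_ok=True)
np.save(OUT_FILE, np.stack(embeddings))
OUT_FILE.with_suffix(".json").write_text(json.dumps(paths))  # map rows back to images
```

If NanoDB could either ingest embeddings like these or recompute them itself with the same SigLIP model the VLM uses, the dimensions would match what VILA1.5-3B produces.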