NanoDB: vectors.bin created by the legacy and main branches is different

I know that the legacy branch uses OpenAI CLIP's GitHub repo and the main branch uses clip_trt's GitHub repo. I used the main branch to generate vectors.bin for COCO, and the search performance is very bad. What might I have overlooked?

The repo I am using: GitHub - dusty-nv/NanoDB: Zero-copy multimodal vector DB with CUDA and CLIP/SigLIP

Hi @chou880801, are you also using clip_trt on the other end to make the embedding that you are searching against? The query and database embeddings should probably come from the same implementation. Also, I believe the original OpenAI CLIP made embeddings with dimension 768, while the other one with projection enabled is 1024, so that is worth digging into and checking. Is your clip_trt using SigLIP or CLIP? I have found CLIP to have better img2img/txt2img search performance.

Hi, I checked: my clip_trt embeddings are 768-dimensional, not 1024.

I've been using the same dataset to generate embeddings with both the Legacy and Main branches of NanoDB (Vectors1 from Legacy, Vectors2 from Main). However, I've noticed a significant difference in the results between Vectors2 and Vectors1. For example, when I search with the keywords "cat," "dog," and "wave," Vectors1 successfully retrieves similar images of cats, dogs, and waves, but Vectors2 seems to retrieve images randomly without any discernible pattern. (The CLIP model I use is openai-clip-vit-large-patch14-336.)

I suspect that the image feature extractor in the Main branch might be broken. The search side of the Main branch seems fine: when I connect the Main branch NanoDB to Vectors1, it can still find the corresponding images using the keywords "cat," "dog," and "wave."

I have confirmed that the input images produce the same vector values after preprocessing. However, the 768-dimensional feature vectors of the same input images differ by about 1e-3 in each dimension between the legacy branch CLIP and the main branch CLIP. The differences are very small.
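To put a number on "very small," here is a minimal sketch of the comparison, assuming unit-normalized 768-dimensional embeddings. The vectors are synthetic stand-ins (not real CLIP outputs), and `compare_embeddings` is a hypothetical helper, not part of NanoDB or clip_trt. It reports the largest per-dimension difference and the cosine similarity between the two vectors.

```python
import math
import random


def compare_embeddings(a, b):
    """Return (max absolute per-dimension difference, cosine similarity)."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    max_abs_diff = max(abs(x - y) for x, y in zip(a, b))
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return max_abs_diff, dot / (norm_a * norm_b)


# Synthetic stand-in for a legacy-branch embedding: random, scaled to unit length.
rng = random.Random(0)
legacy = [rng.gauss(0.0, 1.0) for _ in range(768)]
scale = math.sqrt(sum(x * x for x in legacy))
legacy = [x / scale for x in legacy]

# Synthetic stand-in for the main-branch embedding: same vector with a
# perturbation of at most 1e-3 in every dimension (assumed noise model).
main = [x + rng.uniform(-1e-3, 1e-3) for x in legacy]

max_diff, cos = compare_embeddings(legacy, main)
print(f"max abs diff: {max_diff:.2e}, cosine similarity: {cos:.6f}")
```

Under these assumptions the cosine similarity stays very close to 1, so per-dimension differences of this size would not, on their own, be expected to turn good search results into random ones. Running the same comparison on real embeddings read back from each vectors.bin may be more telling than comparing the encoder outputs directly.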

Here are four images:

  • The top left and bottom left images show the results of searching for the keywords “cat” and “dog” using the Legacy branch to create COCO vectors and the Main branch NanoDB.
  • The top right and bottom right images show the results of searching for the keywords “cat” and “dog” using the Main branch to create COCO vectors and the Main branch NanoDB.

I’m sorry for any inconvenience, and if you have any insights into where things might have gone wrong, please let me know. Thank you for your time.

0830 latest update:

When I inserted embedding.cpu() between lines 112 and 113 in nanodb.py on the main branch, and also at line 68, the overall search performance improved significantly. Why does this happen? embedding.cpu() only copies the tensor to the CPU and doesn't fundamentally change the embedding tensor.

Thanks for the update @chou880801, I think you discovered that additional CPU<->GPU synchronization may be required since I changed the vision encoders to use clip_trt (which uses asynchronous CUDA streams, and the previous OpenAI/HF implementations perhaps did not exhibit the same behavior).

Appreciate the details from the debugging - will note this down to dig further into.
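For anyone hitting the same issue, here is a rough sketch of what an explicit synchronization point could look like. This is not NanoDB's actual code: `copy_embedding_to_store` and `store` are hypothetical names, and whether a missing sync is really the root cause is still unconfirmed. The only library behavior relied on is that `torch.cuda.synchronize()` waits for queued work in all streams on the given CUDA device.

```python
import torch


def copy_embedding_to_store(embedding: torch.Tensor, store: torch.Tensor, index: int) -> None:
    """Write one embedding into row `index` of `store` (hypothetical helper)."""
    if embedding.is_cuda:
        # Wait for all queued work on the embedding's device, so that an
        # encoder still running on another CUDA stream has finished writing.
        torch.cuda.synchronize(embedding.device)
    # Flatten (1, dim) -> (dim,) and copy into the destination row in place.
    store[index].copy_(embedding.reshape(-1).to(store.device))


# Usage with CPU tensors, so the sketch also runs on a machine without a GPU.
store = torch.zeros(4, 768)
embedding = torch.randn(1, 768)
copy_embedding_to_store(embedding, store, 2)
print(torch.equal(store[2], embedding.reshape(-1)))
```

A full device synchronization on every insert is a blunt tool and may cost throughput; synchronizing only the stream that produced the embedding would be the narrower fix, if that stream is accessible.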
