Integrating NanoOWL with DeepStream

Hello,
I followed NVIDIA-AI-IOT/nanoowl: A project that optimizes OWL-ViT for real-time inference with NVIDIA TensorRT. (github.com) and got NanoOWL working. Is there any possibility of integrating NanoOWL with DeepStream? And if possible, could you explain how we can integrate it for production use?
Thank you.

Regards,
C. Meenambika


No. Currently there is no DeepStream sample for NanoOWL.

NanoOWL is open source and based on CLIP (GitHub - openai/CLIP: CLIP (Contrastive Language-Image Pretraining), predict the most relevant text snippet given an image). Can you tell us what your use case is?

We mainly focus on video detection. We have been integrating models through DeepStream to run them faster, and we were successful with DINO, so we planned to run NanoOWL inside DeepStream as well. I also found that we can run custom models inside DeepStream (Using a Custom Model with DeepStream — DeepStream 6.4 documentation). Can you tell me how I can run NanoOWL inside DeepStream using the engine file “owl_image_encoder_patch32.engine”? In the model config, what should the value of “output-blob-names” be?

Thank you.
Regards,
C. Meenambika

NanoOWL is a multimodal model; it cannot be integrated directly through the custom-model interface alone.
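To illustrate the gap: for an ordinary single-input detector engine, an nvinfer config would look roughly like the sketch below. This is a hypothetical fragment, not a working NanoOWL config; the tensor name and parser library names are placeholders, and the `output-blob-names` value would have to match the output tensor names actually baked into your engine (you can list them with `trtexec --loadEngine=owl_image_encoder_patch32.engine`). NanoOWL additionally needs text embeddings as a second input, which this single-image-input contract has no slot for.

```ini
[property]
# Hypothetical sketch only; names below are placeholders, not tested values.
model-engine-file=owl_image_encoder_patch32.engine
batch-size=1
network-mode=2                  ; FP16
network-type=0                  ; detector
gie-unique-id=1
; Must match the engine's real output tensor names (inspect with trtexec).
output-blob-names=<engine-output-tensor-name>
; OWL-ViT style outputs would still need a custom bbox parser;
; these two entries name a parser that does not exist yet.
parse-bbox-func-name=NvDsInferParseCustomOwl
custom-lib-path=libnvds_infercustomparser_owl.so
```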

Why do you think you need DeepStream to implement the multimodal cases? What is your use case?

We are using DeepStream as the framework for running our AI logic, for example the DINO model for person detection.
We want to run NanoOWL within DeepStream to get extra insights about what happens in the video or image.

Integrating multimodal models into DeepStream may require more customization than just integrating the engine file. We are investigating solutions.

If DeepStream doesn’t have multimodal support, do you have any suggestions for the best way to run a multimodal model like NanoOWL fast, combined with other models like DINO, which is comparatively slower?

There is no chat API in DeepStream; that is something we would need to develop. You may try to implement it yourself.

There is no simple way to implement it now.
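One common workaround outside of any DeepStream-specific support is frame sampling: run the fast per-frame detector on every frame, and feed only every Nth frame to the slower model in a worker thread so it never blocks the main loop. The sketch below uses stand-in functions in place of DINO and NanoOWL (the real models and their APIs are not shown); it only demonstrates the scheduling pattern.

```python
import queue
import threading

def fast_detector(frame):
    # Stand-in for a per-frame model such as DINO.
    return f"boxes@{frame}"

def slow_multimodal(frame):
    # Stand-in for a heavier model such as NanoOWL.
    return f"insight@{frame}"

def process(frames, every_n=3):
    """Run fast_detector on every frame; sample every Nth frame
    for slow_multimodal, which runs in a background thread."""
    work = queue.Queue()
    insights = []

    def worker():
        while True:
            item = work.get()
            if item is None:  # sentinel: stop the worker
                break
            insights.append(slow_multimodal(item))

    t = threading.Thread(target=worker)
    t.start()

    detections = []
    for i, frame in enumerate(frames):
        detections.append(fast_detector(frame))
        if i % every_n == 0:
            work.put(frame)  # sample this frame for the slow model

    work.put(None)
    t.join()
    return detections, insights

dets, ins = process(list(range(6)), every_n=3)
print(len(dets), len(ins))  # 6 detections, 2 sampled insights
```

In a real pipeline the sampling rate would be tuned so the slow model's average latency stays below `every_n` frame intervals, and the insights would be attached back to frame metadata asynchronously.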

Is there an option to set the text input in the custom-lib-path and implement it as a custom model? I am interested in using NanoOWL as a zero-shot model capable of detecting different types of objects on demand.

Were you able to set it up?

It is now available as part of Metropolis Microservices, with DeepStream + NanoOWL zero-shot detection: Bringing Generative AI to the Edge with NVIDIA Metropolis Microservices for Jetson | NVIDIA Technical Blog