Fine-tuning the Owl-ViT model

Hello all,

I am using the mmj_genai application, which uses Google’s Owl-ViT model.

I want to use the mmj_genai application to detect custom objects. To do that, I first want to train or fine-tune the Owl-ViT model and then convert it to a “.engine” file. Is there a way to fine-tune this model?

When I follow the steps given in the official GitHub repository, I get this error: ValueError: Did not find decoder for lvis:1.3.0. Please specify decoders for all datasets in DECODERS.

Also, in which format does this model need its input data? Does it only work with TFRecords, or can it also take other kinds of input, such as custom data consisting of separate images and bounding-box annotations, without converting them to TFRecord files?
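
To make the question concrete, here is a minimal sketch of the kind of custom data I mean: plain image files plus per-image boxes and labels. All names here (Annotation, Sample, normalize_boxes, the example path) are hypothetical and only for illustration; I am not assuming this is the layout the Owl-ViT training code expects, which is exactly what I am asking about.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Annotation:
    """One labeled box in pixel coordinates (hypothetical layout)."""
    label: str
    box_xyxy: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


@dataclass
class Sample:
    """One image plus its boxes; field names are illustrative only."""
    image_path: str
    width: int
    height: int
    annotations: List[Annotation]


def normalize_boxes(sample: Sample) -> List[Tuple[float, float, float, float]]:
    """Scale pixel boxes to the [0, 1] range relative to the image size."""
    return [
        (x0 / sample.width, y0 / sample.height,
         x1 / sample.width, y1 / sample.height)
        for (x0, y0, x1, y1) in (a.box_xyxy for a in sample.annotations)
    ]


sample = Sample(
    image_path="images/example_0001.jpg",  # placeholder path, not a real file
    width=640,
    height=480,
    annotations=[
        Annotation(label="custom_object", box_xyxy=(64.0, 48.0, 320.0, 240.0)),
    ],
)
print(normalize_boxes(sample))
```

If the model can only consume TFRecords, I assume records like these would have to be serialized first; if it can take them more directly, I would like to know which box convention (pixel or normalized, and which corner order) is expected.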