I couldn't find many papers that suit my requirements. Here are a few resources that I referred to in this work.
From my experience, the end-to-end pipeline for such a project ideally involves multiple models. For your reference, I am sharing the pipeline here as well. Addressing this pipeline end to end will help many developers implement products quickly.
Object Detector → Feature Points for Object Alignment → Embeddings from Siamese Network → Similarity Matching.
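
To make the flow concrete, here is a minimal sketch of how the four stages could be chained together. It is only an illustration, not the actual implementation: the `detect`, `align`, and `embed` callables are hypothetical placeholders for your own detector, alignment step, and Siamese network, and the cosine similarity metric and the `0.8` threshold are assumptions that you would tune for your data.

```python
import math
from typing import Any, Callable, Dict, Iterable, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embeddings (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(
    query: Vector,
    gallery: Dict[str, Vector],
    threshold: float = 0.8,  # assumed value, tune on your own validation set
) -> Tuple[Optional[str], float]:
    """Return the most similar gallery label, or None if below the threshold."""
    best_label: Optional[str] = None
    best_score = -1.0
    for label, reference in gallery.items():
        score = cosine_similarity(query, reference)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score


def run_pipeline(
    image: Any,
    detect: Callable[[Any], Iterable[Any]],  # hypothetical: image -> object crops
    align: Callable[[Any], Any],             # hypothetical: crop -> aligned crop
    embed: Callable[[Any], Vector],          # hypothetical: aligned crop -> embedding
    gallery: Dict[str, Vector],
    threshold: float = 0.8,
) -> List[Tuple[Optional[str], float]]:
    """Object detector -> alignment -> embedding -> similarity matching."""
    results = []
    for crop in detect(image):
        aligned = align(crop)
        embedding = embed(aligned)
        results.append(best_match(embedding, gallery, threshold))
    return results
```

Each stage is passed in as a function, so you can swap any model (for example, a different detector) without touching the matching logic. The gallery is simply a dictionary of precomputed reference embeddings keyed by label.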