I successfully used DeepStream for object detection and classification.
But I’d like to run inference with a custom model whose output is an object embedding (a vector of 256, 512, or 1024 elements, for example).
What would be the best strategy with nvinfer, since it supports the following network types:
0: Detector
1: Classifier
2: Segmentation
3: Instance Segmentation
Secondary classifier GIE?
Should I set it to Classifier (it would be a secondary GIE), then parse the output myself and keep the embeddings in the object metadata (obj_user_meta_list)?
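To make that concrete, here is a minimal sketch of what I have in mind, assuming the embedding model runs as an SGIE with output-tensor-meta=1 (and network-type=100, i.e. "Other") so nvinfer attaches the raw output tensor to each object; a pad probe downstream then reads it back from obj_user_meta_list. The probe name and the single-output-layer assumption are mine:

```c
#include <gst/gst.h>
#include "gstnvdsmeta.h"
#include "gstnvdsinfer.h"

/* Probe attached to the SGIE's src pad. Requires output-tensor-meta=1 in
 * the SGIE config so nvinfer attaches NvDsInferTensorMeta to each object. */
static GstPadProbeReturn
sgie_src_pad_probe (GstPad *pad, GstPadProbeInfo *info, gpointer user_data)
{
  GstBuffer *buf = GST_PAD_PROBE_INFO_BUFFER (info);
  NvDsBatchMeta *batch_meta = gst_buffer_get_nvds_batch_meta (buf);
  if (!batch_meta)
    return GST_PAD_PROBE_OK;

  for (NvDsMetaList *l_frame = batch_meta->frame_meta_list; l_frame; l_frame = l_frame->next) {
    NvDsFrameMeta *frame_meta = (NvDsFrameMeta *) l_frame->data;
    for (NvDsMetaList *l_obj = frame_meta->obj_meta_list; l_obj; l_obj = l_obj->next) {
      NvDsObjectMeta *obj_meta = (NvDsObjectMeta *) l_obj->data;
      for (NvDsMetaList *l_user = obj_meta->obj_user_meta_list; l_user; l_user = l_user->next) {
        NvDsUserMeta *user_meta = (NvDsUserMeta *) l_user->data;
        if (user_meta->base_meta.meta_type != NVDSINFER_TENSOR_OUTPUT_META)
          continue;
        NvDsInferTensorMeta *tmeta = (NvDsInferTensorMeta *) user_meta->user_meta_data;
        /* Assumes a single output layer holding the FP32 embedding vector. */
        NvDsInferLayerInfo *layer = &tmeta->output_layers_info[0];
        float *embedding = (float *) tmeta->out_buf_ptrs_host[0];
        guint len = layer->inferDims.numElements; /* e.g. 256 / 512 / 1024 */
        /* ...copy embedding[0..len-1] into your own user meta here... */
        (void) embedding; (void) len;
      }
    }
  }
  return GST_PAD_PROBE_OK;
}
```

The deepstream-infer-tensor-meta-test sample in the SDK sources follows this same pattern, if I read it correctly.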
ReID with nvtracker?
Another option, available since DS 6.3 if I am correct, would be to run detection, then do ReID in nvtracker with my own model, retrieve the output, and do whatever I want with the embeddings. Is that right?
If I infer the embeddings with a secondary classifier GIE before nvtracker, is there a way to pass the objects’ embeddings to nvtracker for ReID tracking? I would prefer this option so that I can still choose between ReID in nvtracker and, for example, plain NvDCF when I need more processing power and can accept less accurate tracking.
I want to run 3 models: face detection on the whole frame; then landmark extraction on each face to align the faces (I don’t know yet where to store the aligned image in the DeepStream structures, I guess in obj_user_meta_list, but these images should become the input of the next model, so any advice on this step is welcome); and finally, on these aligned crops, embedding extraction with a model whose input is 3x112x112 and whose output is a vector of 512 floats.
I can do this easily with the TensorRT API, but with DeepStream it seems less obvious where to store the intermediate images for the next inference. The nvtracker part is even less obvious if I want to use these embeddings for ReID, but that part is less important for now.
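For the per-face storage, what I picture is a small custom struct attached to each object through obj_user_meta_list; here is a sketch of the attach side, where the FaceUserData struct, its layout, and the meta-type string are my own invention, not DeepStream types:

```c
#include <string.h>
#include <glib.h>
#include "nvdsmeta.h"

/* Hypothetical per-face payload: 5 landmark points now, and the 512-float
 * embedding to be filled in by the later SGIE stage. */
typedef struct {
  float landmarks[10];   /* x,y pairs of 5 facial landmarks */
  float embedding[512];  /* output of the 3x112x112 embedding model */
} FaceUserData;

#define FACE_USER_META (nvds_get_user_meta_type ("MYAPP.FACE.USER_META"))

static gpointer
face_meta_copy (gpointer data, gpointer user_data)
{
  NvDsUserMeta *user_meta = (NvDsUserMeta *) data;
  FaceUserData *copy = g_new0 (FaceUserData, 1);
  memcpy (copy, user_meta->user_meta_data, sizeof (FaceUserData));
  return copy;
}

static void
face_meta_release (gpointer data, gpointer user_data)
{
  NvDsUserMeta *user_meta = (NvDsUserMeta *) data;
  g_free (user_meta->user_meta_data);
  user_meta->user_meta_data = NULL;
}

/* Attach the landmarks of one detected face to its object meta. */
static void
attach_face_meta (NvDsBatchMeta *batch_meta, NvDsObjectMeta *obj_meta,
                  const float landmarks[10])
{
  NvDsUserMeta *user_meta = nvds_acquire_user_meta_from_pool (batch_meta);
  FaceUserData *face = g_new0 (FaceUserData, 1);
  memcpy (face->landmarks, landmarks, sizeof (face->landmarks));

  user_meta->user_meta_data = face;
  user_meta->base_meta.meta_type = FACE_USER_META;
  user_meta->base_meta.copy_func = face_meta_copy;
  user_meta->base_meta.release_func = face_meta_release;
  nvds_add_user_meta_to_obj (obj_meta, user_meta);
}
```

The aligned 112x112 crops themselves would still need to be produced somewhere (the user meta only carries the struct, not a new video surface), which is the part I am unsure about.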
Could you look at the deepstream-faciallandmark-app and see if it meets your needs?
And our tracker already supports ReID. You can refer to samples\configs\deepstream-app\config_tracker_NvDCF_accuracy.yml to learn how to set the parameters.
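The ReID block in that file looks roughly like the sketch below; the values here are illustrative and some key names differ between DeepStream releases (for example how the model file is specified), so treat the shipped yml as the reference:

```yaml
ReID:
  reidType: 2                 # enable the ReID model; see the shipped yml for the enum values
  batchSize: 100              # max objects per ReID inference batch
  workspaceSize: 1000         # TensorRT workspace size in MB
  reidFeatureSize: 256        # length of the embedding the model produces
  reidHistorySize: 100        # embeddings kept per tracked object
  inferDims: [3, 256, 128]    # model input dimensions
  networkMode: 1              # 0=fp32, 1=fp16, 2=int8
  inputOrder: 0               # 0=NCHW, 1=NHWC
  colorFormat: 0              # 0=RGB, 1=BGR
  offsets: [123.675, 116.28, 103.53]
  netScaleFactor: 0.01735207
  keepAspc: 1                 # keep aspect ratio when cropping objects
  onnxFile: "/path/to/your_reid_model.onnx"  # or tltEncodedModel / modelEngineFile, depending on release
```

So you can plug your own ReID network in there instead of running it as a separate SGIE.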
I think you should read this repo. They add landmarks to user_meta with a custom structure, and then they use nvdsvideotemplate to infer gaze from the landmarks. Based on your description, you could use nvdsvideotemplate in the same way to infer the embedding vector from the face.
Thanks all. I finally created my own GStreamer plugin, used nvdspreprocess as well, and created my own custom user meta to store the landmarks and then the embeddings. Everything is working fine now.