Object embeddings inference strategy


I have successfully used DeepStream for object detection and classification.
Now I’d like to run inference with a custom model that outputs object embeddings (a vector of 256, 512, or 1024 elements).

What would be the best strategy with nvinfer, given that it supports the following network types:

  • 0: Detector
  • 1: Classifier
  • 2: Segmentation
  • 3: Instance Segmentation
  1. Secondary classifier GIE?
    Should I set it to Classifier (it would be a secondary GIE), then parse the output myself and keep the embeddings in the object metadata (obj_user_meta_list)?

  2. Reid with nvtracker?
    Another option, available since DS 6.3 if I am correct, would be to run detection, then ReID in nvtracker with my own model, and retrieve the output so I can use the embeddings however I want. Is that correct?

  3. If I run embedding inference as a secondary classifier GIE before nvtracker, is there a way to pass the objects’ embeddings to nvtracker for ReID tracking? I would prefer this option, so that I can choose between ReID in nvtracker and, for example, NvDCF when I need to free up processing power and can accept less accurate tracking.

Thanks for the help.

You can refer to our demo, sources\apps\sample_apps\deepstream-infer-tensor-meta-test, to get the output tensor and process it yourself.
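For readers wondering what "process it yourself" could look like for an embedding model, here is a minimal sketch in plain Python. It assumes the output layer has already been copied to host memory as a raw buffer of little-endian float32 values (in a real pipeline this would come from the tensor meta attached by nvinfer). The helper names `floats_from_buffer`, `l2_normalize`, and `cosine_similarity` are hypothetical and not part of DeepStream.

```python
import math
import struct
from typing import List, Sequence


def floats_from_buffer(raw: bytes, count: int) -> List[float]:
    """Interpret the first `count` float32 values of `raw` (little-endian assumed)."""
    return list(struct.unpack(f"<{count}f", raw[: 4 * count]))


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale the vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0.0 else list(vec)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, computed as the dot product of the normalized vectors."""
    na, nb = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(na, nb))


# Tiny demo with a 4-element stand-in for a 512-element embedding.
raw_output = struct.pack("<4f", 3.0, 0.0, 4.0, 0.0)
embedding = l2_normalize(floats_from_buffer(raw_output, 4))
print(cosine_similarity(embedding, embedding))
```

Storing the normalized vector (rather than the raw one) in the object's user meta keeps later comparisons down to a plain dot product.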

Thanks, I’ll try it.
I didn’t know network-type=100 existed:


## 0=Detector, 1=Classifier, 2=Segmentation, 100=Other

It’s not in the documentation of nvinfer.
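In case it helps someone else, here is a rough sketch of the `[property]` keys I understand to be relevant for a secondary GIE that only exposes raw tensors. It is written as a small Python script that generates the config text, so the fragment can be checked. The function name `build_sgie_config` and the ID values are my own assumptions, and the model-specific keys (model file, input dimensions, and so on) are deliberately left out.

```python
import configparser
import io


def build_sgie_config(gie_id: int = 2, operate_on_gie_id: int = 1) -> str:
    """Return a partial nvinfer config for an embedding model (hypothetical helper)."""
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep the keys exactly as written
    cfg["property"] = {
        "gie-unique-id": str(gie_id),
        "process-mode": "2",                # 2 = secondary mode, runs on detected objects
        "operate-on-gie-id": str(operate_on_gie_id),
        "network-type": "100",              # 100 = Other, no built-in output parsing
        "output-tensor-meta": "1",          # attach the raw output tensors as metadata
    }
    out = io.StringIO()
    cfg.write(out, space_around_delimiters=False)
    return out.getvalue()


print(build_sgie_config())
```

With these settings the embeddings are not parsed by nvinfer at all, so a downstream probe or plugin has to read the tensor meta and decide where to store the result.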

Also, if I succeed in generating embeddings with nvinfer, do you know whether I can pass them to the nvtracker plugin for ReID?

Can you describe the input and output of this model in detail, as well as your specific use case?

Hi @yuweiw,

I want to run three models. The first does face detection on the whole frame. The second extracts landmarks for each face so that the faces can be aligned. I don’t know yet where to store the aligned image in the DeepStream structures (I guess in obj_user_meta_list), but these images should be the input of the next model, so any advice on this step is welcome. Finally, on these aligned crops I want to extract embeddings with a model whose input is 3x112x112 and whose output is a vector of 512 floats.
I can do this easily with the TensorRT API, but with DeepStream it is less obvious where to store temporary images for the next inference. The nvtracker part, using these embeddings for ReID, is even less obvious, but it is less important for now.
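To make the alignment step concrete, here is a minimal sketch of how a 2x3 similarity transform could be derived from the two eye landmarks. The destination eye positions are an assumption: they are the values commonly used for 112x112 ArcFace-style crops, and the right ones depend on how the embedding model was trained. The function names are hypothetical.

```python
# Assumed canonical eye centres (x, y) in a 112x112 aligned crop.
LEFT_EYE_DST = (38.2946, 51.6963)
RIGHT_EYE_DST = (73.5318, 51.5014)


def similarity_from_eyes(left_eye, right_eye,
                         dst_left=LEFT_EYE_DST, dst_right=RIGHT_EYE_DST):
    """Return the 2x3 matrix (rotation, uniform scale, translation) mapping
    the detected eye centres onto the destination eye centres."""
    s1, s2 = complex(*left_eye), complex(*right_eye)
    d1, d2 = complex(*dst_left), complex(*dst_right)
    if s1 == s2:
        raise ValueError("eye landmarks must be two distinct points")
    # Treat points as complex numbers: z' = a * z + b is a similarity transform.
    a = (d2 - d1) / (s2 - s1)
    b = d1 - a * s1
    return [[a.real, -a.imag, b.real],
            [a.imag, a.real, b.imag]]


def apply_affine(matrix, point):
    """Apply a 2x3 affine matrix to an (x, y) point."""
    x, y = point
    return (matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
            matrix[1][0] * x + matrix[1][1] * y + matrix[1][2])


# Example with made-up landmark coordinates from a detected face.
m = similarity_from_eyes((100.0, 120.0), (160.0, 130.0))
print(apply_affine(m, (100.0, 120.0)))
```

The resulting matrix maps source pixels to crop pixels, which is the form an affine warp routine typically expects. Using all five landmarks with a least-squares fit would be more robust than the two-point version shown here.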

Could you refer to the deepstream-faciallandmark-app to see if that meets your needs?
Our tracker also already supports ReID. You can refer to samples\configs\deepstream-app\config_tracker_NvDCF_accuracy.yml to learn how to set the parameters.

Thanks for the help. I was already investigating the deepstream-faciallandmark-app, and I’ll let you know if I run into any problems during implementation.

I think you should read this repo. They add landmarks to user_meta with a custom structure, and then they use nvdsvideotemplate to run inference from landmarks to gaze. Based on your description, you could use nvdsvideotemplate in the same way to run inference from face crops to embedding vectors.

Thanks all. I finally created my own gst plugin, along with nvdspreprocess, and a custom user meta to store the landmarks and then the embeddings. Everything is working fine now.