How to pass the landmarks of a RetinaFace model downstream?


• Hardware Platform (Jetson / GPU)
GPU / T4

I’m trying to run a RetinaFace model with DeepStream 6.0.1, but I’ve found it quite hard to pass the landmarks downstream with the NvInfer plugin. I’m using Docker, so the environment is not a problem.

RetinaFace is essentially an object detection model. If we only want bounding boxes, writing a custom-bbox-parser-func is enough, and NvInfer will do the NMS-like postprocessing for us as configured.

But RetinaFace also outputs landmarks for every detected object/face, and the output struct NvDsInferObjectDetectionInfo used by custom-bbox-parser-func has no field to store the extra landmark data.

I’ve found several other topics about this problem, but I didn’t see an actual solution in any of them.

How to pass the 5 landmarks of retinaface and perform face alignment between pgie and sgie?

Face detection with deepstream with landmarks

I’ve come up with two possible approaches:

  1. Write the raw model output directly to the metadata, and do the postprocessing and parsing in a probe function. This introduces a little overhead, since the raw model output occupies more memory.

  2. Modify the NvDsInferObjectDetectionInfo struct to support landmarks, modify the related metadata-writing code, and recompile the NvInfer plugin, since the source code of the plugin is provided. But I don’t know whether the source code is equivalent to the current binary plugin, and would there be a legal problem with commercial usage?
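
For the first approach, the probe has to do its own NMS, and the landmarks must stay attached to their boxes while it runs. Below is a minimal pure-Python sketch of that step only. It assumes the raw output has already been decoded into (box, score, landmarks) records; the record layout and the function names are hypothetical, not part of DeepStream. Getting the raw tensors into the probe would presumably rely on nvinfer settings such as output-tensor-meta=1, which is not shown here.

```python
from typing import List, Sequence, Tuple

# Hypothetical record layout (an assumption, not a DeepStream type):
# box = (x1, y1, x2, y2), score = confidence, landmarks = five (x, y) points.
Box = Tuple[float, float, float, float]
Face = Tuple[Box, float, Sequence[Tuple[float, float]]]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms_with_landmarks(
    faces: List[Face], score_thr: float = 0.5, iou_thr: float = 0.4
) -> List[Face]:
    """Greedy NMS that keeps each surviving box together with its landmarks."""
    # Drop low-confidence candidates, then visit the rest best-first.
    candidates = sorted(
        (f for f in faces if f[1] >= score_thr),
        key=lambda f: f[1],
        reverse=True,
    )
    kept: List[Face] = []
    for face in candidates:
        # Keep a face only if it does not overlap an already-kept face too much.
        if all(iou(face[0], k[0]) <= iou_thr for k in kept):
            kept.append(face)
    return kept
```

Because each record carries its own landmarks, nothing has to be re-associated after suppression; the thresholds here are placeholder values.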

Please give some advice, thanks!

I’ve come up with another idea.

Configure RetinaFace as an instance segmentation model type and write a custom instance segmentation parser function. In that function, do the NMS and save the landmarks in the mask field of NvDsInferInstanceMaskInfo. The landmark data should then be available in the NvOSD_MaskParams of NvDsObjectMeta.
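
The mask field is a flat float buffer, so the idea boils down to agreeing on a layout for the 10 landmark values and reading them back the same way downstream. The sketch below only illustrates that layout convention in plain Python using float32 bytes; the function names and the x0, y0, x1, y1, ... ordering are assumptions of mine, and the real parser would write into the mask buffer in C++ rather than through `struct`.

```python
import struct
from typing import List, Sequence, Tuple

NUM_LANDMARKS = 5  # RetinaFace predicts five facial points per face
FLOATS_PER_FACE = NUM_LANDMARKS * 2
# Little-endian float32 values in the assumed order x0, y0, x1, y1, ...
_LAYOUT = f"<{FLOATS_PER_FACE}f"


def pack_landmarks(points: Sequence[Tuple[float, float]]) -> bytes:
    """Flatten five (x, y) points into a float32 buffer (hypothetical layout)."""
    if len(points) != NUM_LANDMARKS:
        raise ValueError(f"expected {NUM_LANDMARKS} points, got {len(points)}")
    flat = [coord for point in points for coord in point]
    return struct.pack(_LAYOUT, *flat)


def unpack_landmarks(buf: bytes) -> List[Tuple[float, float]]:
    """Recover the five (x, y) points from a buffer written by pack_landmarks."""
    flat = struct.unpack(_LAYOUT, buf)
    return [(flat[i], flat[i + 1]) for i in range(0, FLOATS_PER_FACE, 2)]
```

Whatever reads the mask downstream has to use the same ordering and element count, and has to avoid treating the buffer as a real segmentation mask.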

This misuses the nvinfer plugin, and I’m not sure whether it would cause a problem in the downstream OSD display, but it will most likely work. I will report back later.

If you use gst-python, the extensibility of DeepStream metadata is extremely limited. We even resorted to encoding our custom data into NvDisplayText to pass it downstream… We strongly suggest that NVIDIA improve support for custom data in this area.
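
To illustrate the text-encoding workaround mentioned above, here is a rough sketch of serializing landmarks into a label string and parsing them back downstream. The `|lmk:` separator, the one-decimal precision, and both function names are made up for this example; how the string is attached to and read from the display text metadata is not shown.

```python
from typing import List, Sequence, Tuple

SEPARATOR = "|lmk:"  # hypothetical marker between the visible label and the payload


def encode_landmarks_text(label: str, points: Sequence[Tuple[float, float]]) -> str:
    """Append landmark coordinates to a text label as a comma-separated payload."""
    coords = ",".join(f"{x:.1f},{y:.1f}" for x, y in points)
    return f"{label}{SEPARATOR}{coords}"


def decode_landmarks_text(text: str) -> Tuple[str, List[Tuple[float, float]]]:
    """Split a label produced by encode_landmarks_text back into label and points."""
    label, _, payload = text.partition(SEPARATOR)
    if not payload:
        # No payload present: treat the whole string as a plain label.
        return label, []
    values = [float(v) for v in payload.split(",")]
    return label, list(zip(values[0::2], values[1::2]))
```

Coordinates are rounded to one decimal place here, so this is lossy, and the payload will show up on screen if the text is actually rendered by the OSD.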