FacialLandmarks python multiple faces issue

Please provide complete information as applicable to your setup.
• Hardware Platform (Jetson / GPU)
Jetson Orin 4012, NVIDIA Jetson Orin NX Bundle, 8x 2GHz, 16GB DDR5
• DeepStream Version
Container: deepstream:7.0-triton-multiarch
• JetPack Version (valid for Jetson only)
see Container: deepstream:7.0-triton-multiarch
• TensorRT Version
see Container: deepstream:7.0-triton-multiarch
• NVIDIA GPU Driver Version (valid for GPU only)
$ nvidia-smi
Returns: Driver Version: N/A
• Issue Type( questions, new requirements, bugs)
Issue

I am running the following gstreamer pipeline in python via the provided bindings:

gst-launch-1.0 v4l2src device=/dev/video0 !  nvvideoconvert src-crop=0:0:1920:1080 ! m.sink_0 nvstreammux name=m batch-size=1 live-source=1 width=1280 height=720 ! nvinfer config-file-path=configs/facedetect.yml ! nvinfer config-file-path=ai_pipeline/configs/landmarks.yml ! fakesink

I added the following landmark probe to the landmarks infer stage:

def _landmarks_inference_probe(self, pad: Gst.Pad, info: Gst.PadProbeInfo) -> Gst.PadProbeReturn:
    buffer = info.get_buffer()
    if not buffer:
      return Gst.PadProbeReturn.OK
    
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
      try:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
      except StopIteration:
        break

      l_object = frame_meta.obj_meta_list
      faces: List[Face] = []
      id: int = 0
      obj_counter = 0
      while l_object is not None: # Loop through found faces
        try:
          obj_meta = pyds.NvDsObjectMeta.cast(l_object.data)
        except StopIteration:
          break
        
        log.info(f'obj: {obj_counter}')
        obj_counter += 1
        left: float = obj_meta.detector_bbox_info.org_bbox_coords.left
        top: float = obj_meta.detector_bbox_info.org_bbox_coords.top
        width: float = obj_meta.detector_bbox_info.org_bbox_coords.width
        height: float = obj_meta.detector_bbox_info.org_bbox_coords.height

        landmarks: List[Point] = []
        l_user = obj_meta.obj_user_meta_list
        user_counter = 0
        while l_user is not None:
          try:
            user_meta = pyds.NvDsUserMeta.cast(l_user.data)
          except StopIteration:
            break

          if (user_meta.base_meta.meta_type != pyds.NvDsMetaType.NVDSINFER_TENSOR_OUTPUT_META):
            l_user = l_user.next  # advance before skipping, otherwise this loop never terminates
            continue
          log.info(f'usr: {user_counter}')
          user_counter += 1
          tensor_meta = pyds.NvDsInferTensorMeta.cast(user_meta.user_meta_data)
          frame_outputs = []
          output_shapes = [[80,80,80],[80,2],[80]]
          for i in range(tensor_meta.num_output_layers):
            layer = pyds.get_nvds_LayerInfo(tensor_meta, i)
            # Convert NvDsInferLayerInfo buffer to numpy array
            ptr = ctypes.cast(pyds.get_ptr(layer.buffer), ctypes.POINTER(ctypes.c_float))
            v = np.ctypeslib.as_array(ptr, shape=(output_shapes[i]))
            frame_outputs.append(v)

          landmarks: List[Point] = Landmarks.from_pipeline_inference_output(frame_outputs)
      
          try:
            l_user = l_user.next
          except StopIteration:
            break
        
        face = Face(
          id,
          left,
          top,
          width,
          height,
          landmarks,
          frame_outputs[2], #landmark_confidences
        )
        faces.append(face)
        id += 1
        log.info(f'{face}')
        try:
          l_object = l_object.next
        except StopIteration:
          break
        
      log.info(f'{faces}')
        
      try:
        l_frame = l_frame.next
      except StopIteration:
        break

    return Gst.PadProbeReturn.OK

I thought everything was working fine at first, but when testing with multiple detected faces within the frame, I noticed that the landmarks of the last face found (face 2 with 2 detected faces, face 3 with 3 detected faces, etc.) are scattered all over the place. Am I doing something wrong while retrieving the landmarks for each detected face?
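One thing I have since wondered about: `np.ctypeslib.as_array` returns a view into `layer.buffer`, not a copy. If nvinfer reuses that buffer while the probe is still iterating over objects, every previously captured array changes underneath, which would look exactly like scattered landmarks. A minimal, self-contained sketch of that aliasing behaviour (plain ctypes/numpy, no pyds; the buffer reuse itself is an assumption):

```python
import ctypes
import numpy as np

# Simulate a reusable output buffer like nvinfer's layer.buffer.
raw = (ctypes.c_float * 4)(1.0, 2.0, 3.0, 4.0)
ptr = ctypes.cast(raw, ctypes.POINTER(ctypes.c_float))

view = np.ctypeslib.as_array(ptr, shape=(4,))  # shares memory with raw
snapshot = view.copy()                         # owns its own data

# Pretend nvinfer overwrites the buffer for the next object:
raw[0] = 99.0

print(view[0])      # 99.0 -> the "captured" array changed underneath
print(snapshot[0])  # 1.0  -> the copy is stable
```

If that is the cause, appending `v.copy()` instead of `v` to `frame_outputs` would decouple each face's landmarks from the shared buffer.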

We know nothing about your models and implementation; you may need to debug this yourself.

At least the pipeline looks fine according to your description.

I am using the models and their configs as provided by NVIDIA:

Currently I am extracting all the inferred data in the probe added to the src pad of the nvinfer element created by:

self.landmarks_infer = Gst.ElementFactory.make('nvinfer', 'nvinfer_landmarks')
self.landmarks_infer.set_property('config-file-path', 'ai_pipeline/configs/landmarks.yml')

Probe gets added by:

inference_src_pad: Optional[Gst.Pad] = self.landmarks_infer.get_static_pad('src')
if not inference_src_pad:
  raise RuntimeError(f'[{os.getpid()} {GazenetPipeline.__name__}] Failed to get src pad from landmark_infer')
inference_src_pad.add_probe(Gst.PadProbeType.BUFFER, self._landmarks_inference_probe)

Code is inspired by the discussion: GazeNet - Python Implementation

After closer inspection of the sample (deepstream_tao_apps/apps/tao_others/deepstream-faciallandmark-app/deepstream_faciallandmark_app.cpp at release/tao5.3_ds7.0ga · NVIDIA-AI-IOT/deepstream_tao_apps · GitHub) I see that the sample uses 2 probes: one on the src pad of the queue following the pgie, and one on the src pad of the queue following the sgie.
My question now is: Do I have to mirror that setup exactly in my Python code, and if so: why does it mostly work the way I am doing it right now, with just a single probe on the landmark nvinfer's src pad, but only with a single face found?

It feels like I am looping through the frame meta in a slightly wrong way, but I am not able to see what's wrong. I appreciate your help!

Since deepstream_tao_apps/apps/tao_others/deepstream-faciallandmark-app/deepstream_faciallandmark_app.cpp at release/tao5.3_ds7.0ga · NVIDIA-AI-IOT/deepstream_tao_apps · GitHub works, please implement your app as the sample.

Setting the batch-size parameter to 1 seemed to fix the issue. Can’t really explain why^^
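A plausible explanation (an assumption, not verified against the plugin source): with batch-size > 1 the sgie runs several face crops in one inference call, and the output layer buffer may then hold the results for the whole batch back-to-back, so reading it with a single-object shape is only correct for one of the faces. A minimal numpy sketch of indexing into such a batched buffer, using the per-face layout from the probe above:

```python
import numpy as np

BATCH = 4            # faces processed in one sgie batch (assumption)
PER_FACE = (80, 2)   # softargmax landmark shape from the probe above

# Flat buffer as it might come out of layer.buffer for the whole batch;
# filled with a ramp so each face's slice is distinguishable.
flat = np.arange(BATCH * 80 * 2, dtype=np.float32)

# Per-face view: reshape to (batch, 80, 2) and select the object's index.
batched = flat.reshape((BATCH,) + PER_FACE)
face0 = batched[0]
face3 = batched[3]

print(face0[0], face3[0])  # first landmark of face 0 vs face 3
```

With batch-size 1 there is only one slice, which would explain why the single-probe approach works in that case.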

landmarks_config.yml:

#Source: https://github.com/NVIDIA-AI-IOT/deepstream_tao_apps/blob/release/tao5.3_ds7.0ga/configs/nvinfer/facial_tao/faciallandmark_sgie_config.yml
property:
  gpu-id: 0
  model-engine-file: ../models/landmarks/model.etlt_b32_gpu0_int8.engine
  tlt-model-key: nvidia_tlt
  tlt-encoded-model: ../models/landmarks/model.etlt
  int8-calib-file: ../models/landmarks/int8_calibration.txt
  #dynamic batch size
  batch-size: 1
  ## 0=FP32, 1=INT8, 2=FP16 mode
  network-mode: 1
  num-detected-classes: 1
  output-blob-names: 'softargmax;softargmax:1;conv_keypoints_m80'
  #0=Detection 1=Classifier 2=Segmentation 100=other
  network-type: 100
  # Enable tensor metadata output
  output-tensor-meta: 1
  #1-Primary  2-Secondary
  process-mode: 2
  gie-unique-id: 2
  operate-on-gie-id: 1
  net-scale-factor: 1.0
  offsets: '0.0'
  input-object-min-width: 5
  input-object-min-height: 5
  #0=RGB 1=BGR 2=GRAY
  model-color-format: 2

class-attrs-all:
  pre-cluster-threshold: 0.0

Python code to extract the information:

def _add_landmarks_inference_probe(self) -> None:
  inference_src_pad: Optional[Gst.Pad] = self.landmarks_infer.get_static_pad('src')
  if not inference_src_pad:
    raise RuntimeError(f' Failed to get src pad from landmark_infer')
  inference_src_pad.add_probe(Gst.PadProbeType.BUFFER, self._landmarks_inference_probe)

def _landmarks_inference_probe(self, pad: Gst.Pad, info: Gst.PadProbeInfo) -> Gst.PadProbeReturn:
  buffer = info.get_buffer()
  if not buffer:
    return Gst.PadProbeReturn.OK
  
  batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(buffer))
  l_frame = batch_meta.frame_meta_list
  while l_frame is not None:
    try:
      frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
    except StopIteration:
      break

    l_object = frame_meta.obj_meta_list
    faces: List[Face] = []
    while l_object is not None: # Loop through found faces
      try:
        obj_meta = pyds.NvDsObjectMeta.cast(l_object.data)
      except StopIteration:
        break
      # Extract bounding box information for found faces
      left: float = obj_meta.detector_bbox_info.org_bbox_coords.left
      top: float = obj_meta.detector_bbox_info.org_bbox_coords.top
      width: float = obj_meta.detector_bbox_info.org_bbox_coords.width
      height: float = obj_meta.detector_bbox_info.org_bbox_coords.height

      landmarks: List[Point] = []
      l_user = obj_meta.obj_user_meta_list
      while l_user is not None: # Loop through user meta (landmark meta in this case)?
        try:
          user_meta = pyds.NvDsUserMeta.cast(l_user.data)
        except StopIteration:
          break

        if (user_meta.base_meta.meta_type != pyds.NvDsMetaType.NVDSINFER_TENSOR_OUTPUT_META):
          l_user = l_user.next  # advance before skipping, otherwise this loop never terminates
          continue

        tensor_meta = pyds.NvDsInferTensorMeta.cast(user_meta.user_meta_data)
        frame_outputs = []
        output_shapes = [[80,80,80],[80,2],[80]]
        for i in range(tensor_meta.num_output_layers):
          layer = pyds.get_nvds_LayerInfo(tensor_meta, i)
          # Convert NvDsInferLayerInfo buffer to numpy array
          ptr = ctypes.cast(pyds.get_ptr(layer.buffer), ctypes.POINTER(ctypes.c_float))
          v = np.ctypeslib.as_array(ptr, shape=(output_shapes[i]))
          frame_outputs.append(v)

        landmarks: List[Point] = [Point(x, y) for x, y in frame_outputs[1]]
        landmark_confidences: List[float] = frame_outputs[2]
    
        face = Face(
          left,
          top,
          width,
          height,
          landmarks,
          landmark_confidences,
        )
        faces.append(face)

        try:
          l_user = l_user.next
        except StopIteration:
          break
      
      try:
        l_object = l_object.next
      except StopIteration:
        break
      
    # List of faces + their landmark data found in frame (printed once per frame)
    print(faces)
    try:
      l_frame = l_frame.next
    except StopIteration:
      break

  return Gst.PadProbeReturn.OK
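As a side note on interpreting `frame_outputs[1]`: the `softargmax` coordinates come out in the sgie's input space, not in frame pixels, so they usually need to be scaled into the detector's bounding box before drawing. A hedged sketch (the 80x80 network input size is an assumption based on the heatmap shapes in the probe above):

```python
import numpy as np

def landmarks_to_frame(points: np.ndarray,
                       left: float, top: float,
                       width: float, height: float,
                       net_size: float = 80.0) -> np.ndarray:
    """Map (N, 2) softargmax points from network input space into frame
    pixel coordinates inside the detected face box. net_size=80 is an
    assumption based on the 80x80 heatmap shape used in the probe."""
    scale = np.array([width / net_size, height / net_size])
    offset = np.array([left, top])
    return points * scale + offset

# Example: a point at the centre of the network input lands at the
# centre of the bounding box.
pts = np.array([[40.0, 40.0]])
print(landmarks_to_frame(pts, left=100.0, top=50.0, width=160.0, height=160.0))
# [[180. 130.]]
```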