Metadata of type NVDS_CROP_IMAGE_META in pure python?

Hi,

My current model is a custom model that detects persons, faces, and furniture, and classifies body actions. The nvinferserver tensor output gives me all the outputs fine.
Now I need to add a second model to classify emotions from face crops.

At first I intended to get the face bboxes from the tensors and the image from the GstBuffer, crop the faces, and add them to the metadata so my second model works on faces only.
But after an exhaustive search, I haven't found a way to operate on the nvinferserver tensor output and send the faces in the metadata using NVDS_CROP_IMAGE_META with Python only.

I’m working on DS6.3.
Any help will be greatly appreciated.
Thanks in advance

I think the cropping is unnecessary. You can add a secondary gie and set its process_mode to PROCESS_MODE_CLIP_OBJECTS, then set operate_on_class_ids to the value of the face class.

Just like:

pgie (detects faces) --> sgie(classify)

You can refer to deepstream-test2; it is one pgie (detection) with three sgies (classification).

https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_plugin_gst-nvinferserver.html

Please provide complete information as applicable to your setup.
• Hardware Platform (Jetson / GPU)
• DeepStream Version
• JetPack Version (valid for Jetson only)
• TensorRT Version
• NVIDIA GPU Driver Version (valid for GPU only)
• Issue Type( questions, new requirements, bugs)
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)
• The pipeline being used

Morning @junshengy,
Let me send you my model configs for the PGIE and SGIE, where you'll see that my first model's output doesn't adhere to the DeepStream SDK API Reference.
Can you help me figure out what the PGIE model output should look like for the SGIE to be able to infer on faces only, even though it will receive 14 other outputs from the PGIE?
With that piece of information I can manage to get the pipeline up and running.
PGIE.config.pbtxt.txt (1.9 KB)
SGIE.config.pbtxt.txt (943 Bytes)

Thank you very much!

Oh! I forgot to say that after the SGIE I also need the 14 remaining outputs from the PGIE to be included in the tensors. At this point I'm assuming they will be available as metadata…

These configuration files are useless, please refer to /opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-test2 and the link above first.

The following is a similar configuration:

input_control {
  process_mode: PROCESS_MODE_CLIP_OBJECTS
  operate_on_gie_id: 1
  operate_on_class_ids: [0]
  secondary_reinfer_interval: 90
  async_mode: true
  object_control {
    bbox_filter {
      min_width: 64
      min_height: 64
    }
  }
}

Please also fill in the above information about your software and hardware.

@junshengy My first model doesn’t have any class ids at all… that’s why I’m asking

My hardware is NVIDIA RTX3060.
Software DS6.3. Triton 23.02. Python 3.8.
• Hardware Platform: GPU
• DeepStream Version: 6.3
• TensorRT Version: 8.5
• NVIDIA GPU Driver Version: 525
• Issue Type: questions
• The pipeline being used: nvmultiurisrcbin → nvinferserver → fakesink
A probe at the nvinferserver src pad parses the output tensors.

I can make my first model output classIds, scores and NMS, but I can't figure out what the tensors should look like for the SGIE to operate on faces smoothly, given that my first model has outputs of type detector, classifier and keypoints.

1. If you only use the pgie, can the bbox display normally?

The pipeline just like

uridecodebin --> nvstreammux --> nvinferserver(pgie) --> nvdsosd --> nveglglessink

2. What are the input and output of your sgie? Images or just tensors?

If it is images, set process_mode: PROCESS_MODE_CLIP_OBJECTS in the nvinferserver configuration file.

If the input is a tensor, refer to deepstream-ssd-parser and add post-processing to the src pad of the pgie to convert the objects to be classified into a tensor.

1. The output of my PGIE is here: PGIE.config.pbtxt.txt (the one I sent you before). Only tensors, no classIds.

2. The input of my SGIE is here: SGIE.config.pbtxt.txt. The input is a face image and the outputs are emotions (classifier).

3. deepstream-ssd-parser uses a cpp function as postprocess. Is it feasible to do it in pure Python?

You need to convert the output tensor from the pgie to class ids through post-processing, then add them to the object meta.

deepstream-ssd-parser uses Python for post-processing; please refer to pgie_src_pad_buffer_probe.
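To illustrate what that post-processing step looks like, here is a minimal sketch (not the actual sample code) of turning raw detector tensors into class ids plus pixel bboxes. The tensor layout below — per-box class scores of shape (N, num_classes) and normalized [x1, y1, x2, y2] boxes — is an assumption for illustration; adapt it to your model's real outputs. The `tensor_to_detections` name and threshold are hypothetical.

```python
import numpy as np

def tensor_to_detections(scores, boxes, frame_w, frame_h, threshold=0.5):
    """Convert raw detector tensors into (class_id, confidence, bbox) tuples.

    scores: (N, num_classes) per-box class scores (assumed layout)
    boxes:  (N, 4) normalized [x1, y1, x2, y2] in the 0..1 range
    Returns pixel-space bboxes as (left, top, width, height).
    """
    class_ids = scores.argmax(axis=1)
    confidences = scores.max(axis=1)
    keep = confidences >= threshold
    detections = []
    for cid, conf, (x1, y1, x2, y2) in zip(
            class_ids[keep], confidences[keep], boxes[keep]):
        left = int(x1 * frame_w)
        top = int(y1 * frame_h)
        width = int((x2 - x1) * frame_w)
        height = int((y2 - y1) * frame_h)
        detections.append((int(cid), float(conf), (left, top, width, height)))
    return detections

# In the actual pad probe (as in deepstream-ssd-parser), each detection is
# then attached to the frame meta via pyds.nvds_acquire_obj_meta_from_pool()
# and pyds.nvds_add_obj_meta_to_frame(), filling obj_meta.class_id and
# obj_meta.rect_params -- which is what PROCESS_MODE_CLIP_OBJECTS reads.
```

Once the object meta carries a class id and rect_params, the downstream sgie can filter on it with operate_on_class_ids, with no manual cropping needed.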

Hi @junshengy
1. What format/structure should the class ids and bboxes have, given my actual output tensors from the PGIE: PGIE.config.pbtxt.txt?
2. deepstream-ssd-parser uses Python (my mistake).
3. Given class ids + bbox tensors + an SGIE with PROCESS_MODE_CLIP_OBJECTS, will my SGIE operate on face crops only? Where does the cropping take place, in the PGIE or the SGIE?
4. Do I need to add metadata of type NVDS_CROP_IMAGE_META?

Thanks for your patience!

It depends on your model; I guess it should be face_bboxes.

The sgie will clip based on the bbox in the object meta, and the clipping happens inside nvinferserver.

nvinferserver is open source. Refer to bool GstNvInferServerImpl::shouldInferObject in /opt/nvidia/deepstream/deepstream/sources/gst-plugins/gst-nvinferserver/gstnvinferserver_impl.cpp
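For intuition, the per-object checks that function applies can be paraphrased in Python roughly as follows. This is a sketch of the logic, not the real C++ implementation; the function name and defaults mirror the input_control sample config shown earlier in the thread (gie id 1, face class 0, 64x64 minimum bbox).

```python
def should_infer_object(obj_class_id, obj_gie_id, obj_w, obj_h,
                        operate_on_gie_id=1, operate_on_class_ids=(0,),
                        min_width=64, min_height=64):
    """Rough Python paraphrase of nvinferserver's per-object filter.

    An object is clipped and inferred on only if it came from the
    expected upstream gie, carries an allowed class id, and its bbox
    passes the minimum size filter.
    """
    if operate_on_gie_id >= 0 and obj_gie_id != operate_on_gie_id:
        return False  # produced by a different upstream gie
    if operate_on_class_ids and obj_class_id not in operate_on_class_ids:
        return False  # not one of the classes this sgie operates on
    if obj_w < min_width or obj_h < min_height:
        return False  # bbox smaller than the configured filter
    return True
```

So as long as your pgie post-processing writes a face class id and a valid bbox into the object meta, the sgie will infer on the face crops only.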

Thank you very much @junshengy
Have a good day!

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.