Double-staged inference with interdependency

• Hardware Platform: Jetson Orin AGX
• DeepStream Version: 7.1
• JetPack Version: 6.0
• TensorRT Version: 8.6.2.3

Hello there,

My project goal is to perform an inference in two steps:

  • The first pass runs on a lower resolution (rescaled from the full-resolution frame) and identifies the object
  • The second pass runs on a crop of the full-resolution frame (the region of interest found by the first pass) and performs the final inference to extract details about the object

I’m having a hard time structuring this as a DeepStream pipeline and wanted some tips on the best path to take. My first idea was to structure it roughly as shown below:

appsrc.link(queue1)
queue1.link(nvvidconv1)
nvvidconv1.link(tee)

# PGIE
tee_srcpad_pgie = tee.get_request_pad('src_%u')
queue_pgie_sinkpad = queue_pgie.get_static_pad("sink")
tee_srcpad_pgie.link(queue_pgie_sinkpad)

# nvstreammux only exposes request sink pads (sink_%u), so link it explicitly
streammux1_sinkpad = streammux1.get_request_pad("sink_0")
queue_pgie_srcpad = queue_pgie.get_static_pad("src")
queue_pgie_srcpad.link(streammux1_sinkpad)
streammux1.link(pgie)
pgie.link(fakesink1) # PGIE output sink

# SGIE
tee_srcpad_sgie = tee.get_request_pad('src_%u')
queue_sgie_sinkpad = queue_sgie.get_static_pad("sink")
tee_srcpad_sgie.link(queue_sgie_sinkpad)

queue_sgie.link(videocrop)
videocrop.link(sgie)
sgie.link(fakesink2) # SGIE output sink

That is, the image coming out of nvvidconv1 is the original full-resolution one (for the PGIE it is downscaled via streammux1). The tricky part for me is that the crop sent to the SGIE depends on information produced by the PGIE step.

Does this path make sense? I hope the idea is clear; any comments or further examples are appreciated.
Thanks in advance.

Your case is a PGIE+SGIE case. Please refer to the deepstream-test2 sample in NVIDIA-AI-IOT/deepstream_python_apps (apps/deepstream-test2). There, the trafficcamnet model (the PGIE) works on the full frames to detect the car (object) bounding boxes, and the vehicletypenet model (an SGIE) works on the detected car objects to recognize the car type.
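For reference, the core of that sample is a single serial chain: nvinfer crops and scales each detected object internally before the SGIE runs, so no tee/videocrop branch is needed. A minimal sketch of that layout (the element and config-file names here are illustrative assumptions, and the source/streammux request-pad setup is omitted):

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)
pipeline = Gst.Pipeline.new("two-stage")

streammux = Gst.ElementFactory.make("nvstreammux", "mux")
pgie = Gst.ElementFactory.make("nvinfer", "pgie")  # detector on full frames
sgie = Gst.ElementFactory.make("nvinfer", "sgie")  # runs on PGIE objects
sink = Gst.ElementFactory.make("fakesink", "sink")

pgie.set_property("config-file-path", "pgie_config.txt")  # hypothetical path
sgie.set_property("config-file-path", "sgie_config.txt")  # hypothetical path

for element in (streammux, pgie, sgie, sink):
    pipeline.add(element)

# Serial chain: the SGIE receives the PGIE's object metadata on the same
# buffers and crops/scales each object to its own network resolution.
streammux.link(pgie)
pgie.link(sgie)
sgie.link(sink)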

This happens inside nvinfer when you configure the model as PGIE.

This happens inside nvinfer when you configure the model as SGIE.
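The split between the two behaviors is controlled in the nvinfer config files. A short excerpt showing the relevant keys (the IDs and file names are illustrative; see the Gst-nvinfer documentation below for the full list):

# pgie_config.txt (excerpt)
[property]
gie-unique-id=1
process-mode=1        # 1 = primary mode: infer on full frames

# sgie_config.txt (excerpt)
[property]
gie-unique-id=2
process-mode=2        # 2 = secondary mode: infer on detected objects
operate-on-gie-id=1   # only process objects produced by the PGIE above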

Please refer to Gst-nvinfer — DeepStream documentation

Hi @Fiona.Chen, thanks for the quick reply!

I’ll try implementing it and circle back here.

Just to be sure before I move on: in the second inference step I don’t want to simply run a new classification on the bounding box from the PGIE (like in the vehicle type example); I want to run further object detection on that cropped region at full resolution, as if I were going on to detect the wheels, the windshield, the mirrors, etc.

The suggested architecture should also work, right? Thanks!

An SGIE means inferencing on objects rather than on full frames. The output can be anything, since the postprocessing can be customized.
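So if the SGIE is configured as a detector (finding wheels, windshields, etc. inside each car object), its detections are added to the frame metadata as new objects whose parent is the PGIE object. A minimal pad-probe sketch for reading them with pyds (the unique id is an assumption that must match gie-unique-id in the SGIE config):

import pyds
from gi.repository import Gst

SGIE_UNIQUE_ID = 2  # assumption: matches gie-unique-id in the SGIE config

def sgie_src_pad_probe(pad, info, user_data):
    """Walk the batch metadata and print objects emitted by the SGIE."""
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(info.get_buffer()))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        l_obj = frame_meta.obj_meta_list
        while l_obj is not None:
            obj_meta = pyds.NvDsObjectMeta.cast(l_obj.data)
            # Objects produced by the SGIE carry its unique component id;
            # obj_meta.parent (if set) is the PGIE object they were found in.
            if obj_meta.unique_component_id == SGIE_UNIQUE_ID:
                print(obj_meta.obj_label, obj_meta.rect_params.left,
                      obj_meta.rect_params.top)
            l_obj = l_obj.next
        l_frame = l_frame.next
    return Gst.PadProbeReturn.OK

# Attach to the SGIE's src pad, e.g.:
# sgie.get_static_pad("src").add_probe(Gst.PadProbeType.BUFFER,
#                                      sgie_src_pad_probe, None)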