DeepStream input preprocessing

Given an image of size 600x800 (height x width)

The following configs
Streammux

enable-padding=1

PGIE

symmetric-padding=1
maintain-aspect-ratio=1
scaling-filter=0

and network input shape 608 x 1088 (height x width) will result in the following image (ignore color channel for now)

This basically resizes the input image, pads it, and puts it in the middle of a black image of size 608 x 1088. From my understanding, this DeepStream config is similar to the following Python code?

import cv2  # needed for cv2.resize and cv2.copyMakeBorder

def letterbox(img, height=608, width=1088, color=(0, 0, 0)):  # resize a rectangular image to a padded rectangular one
    shape = img.shape[:2]  # shape = [height, width]
    ratio = min(float(height)/shape[0], float(width)/shape[1])
    new_shape = (round(shape[1] * ratio), round(shape[0] * ratio)) # new_shape = [width, height]
    dw = (width - new_shape[0]) / 2  # width padding
    dh = (height - new_shape[1]) / 2  # height padding
    top, bottom = round(dh - 0.1), round(dh + 0.1)  # +/-0.1 splits any odd padding pixel between top/bottom
    left, right = round(dw - 0.1), round(dw + 0.1)  # +/-0.1 splits any odd padding pixel between left/right
    img = cv2.resize(img, new_shape, interpolation=cv2.INTER_AREA)  # resized, no border
    img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # padded rectangular
    return img, ratio, dw, dh
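To make the geometry concrete, here is a small walk-through (my own addition, not part of the original post) of the numbers the letterbox function produces for the 600x800 (HxW) example with a 608x1088 network input:

```python
# Walk through the letterbox() arithmetic for a 600x800 (HxW) input
# and a 608x1088 (HxW) network shape.
height, width = 608, 1088        # network input (HxW)
shape = (600, 800)               # original image (HxW)

ratio = min(height / shape[0], width / shape[1])  # min(608/600, 1088/800)
new_shape = (round(shape[1] * ratio), round(shape[0] * ratio))  # (W, H)
dw = (width - new_shape[0]) / 2  # horizontal padding per side
dh = (height - new_shape[1]) / 2 # vertical padding per side

print(ratio)      # ~1.0133: the height constraint wins
print(new_shape)  # (811, 608): resized image, width x height
print(dw, dh)     # 138.5 0.0: black bands only on the left/right
```

So the 800-wide image is scaled to 811x608 and centered with roughly 138/139 pixels of black band on each side, which is what the screenshot in the original post shows.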

Here are my questions

  1. Does the Python implementation above match the DeepStream preprocessing? I need to know this because I’d have to rescale the predictions back to the original input image of size 600x800 with
      float net_width = 1088.f, net_height = 608.f;
      float img_width = 800.f, img_height = 600.f;
      float gain = min(net_width / img_width, net_height / img_height);
      float pad_x = (net_width - img_width * gain) / 2;
      float pad_y = (net_height - img_height * gain) / 2;
      float x1 = (rect.x - pad_x) / gain;
      float y1 = (rect.y - pad_y) / gain;
      float x2 = (rect.x + rect.width - pad_x) / gain;
      float y2 = (rect.y + rect.height - pad_y) / gain;
      float width = x2 - x1;
      float height = y2 - y1;

If it’s not the same, can you point to an example of rescaling the predictions back to their original scale?
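As a sanity check on the C++ rescaling logic above, here is a Python sketch of the same inverse mapping (the function name `unletterbox` is my own, hypothetical) with a round-trip test: the full-image box, forward-mapped into network coordinates with the same gain/padding, should come back out as (0, 0, 800, 600):

```python
# Python equivalent of the C++ rescaling snippet: map a box from
# 1088x608 network coordinates back to the original 800x600 image.
def unletterbox(x, y, w, h, net_w=1088.0, net_h=608.0, img_w=800.0, img_h=600.0):
    gain = min(net_w / img_w, net_h / img_h)
    pad_x = (net_w - img_w * gain) / 2
    pad_y = (net_h - img_h * gain) / 2
    x1 = (x - pad_x) / gain
    y1 = (y - pad_y) / gain
    x2 = (x + w - pad_x) / gain
    y2 = (y + h - pad_y) / gain
    return x1, y1, x2 - x1, y2 - y1

# Round trip: forward-map the whole image into network coordinates,
# then invert. The result should be ~ (0, 0, 800, 600).
gain = min(1088 / 800, 608 / 600)
pad_x = (1088 - 800 * gain) / 2
bx, by, bw, bh = pad_x, 0.0, 800 * gain, 600 * gain
print(unletterbox(bx, by, bw, bh))  # ~ (0.0, 0.0, 800.0, 600.0)
```

Note that `pad_x` here is ~138.67 rather than the 138.5 from the letterbox code, because letterbox rounds the resized width to 811 before computing the padding; the sub-pixel difference is usually negligible for box rescaling.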

  2. In DeepStream, with the currently supported Streammux and PGIE configs, is it possible to pad but place the resized image at the top left instead of the middle, as below?

Environment
Architecture: x86_64
GPU: NVIDIA GeForce GTX 1650 Ti with Max-Q Design
NVIDIA GPU Driver: Driver Version: 495.29.05
DeepStream Version: 6.0 (running on docker image nvcr.io/nvidia/deepstream:6.0-devel)
TensorRT Version: v8001
Issue Type: Question

Thanks
Peeranat F.

Hi @peeranat85 , when you set enable-padding, it will preserve the input aspect ratio while scaling, padding with black bands. So when you want to rescale back, you can try to use the gstnvvideoconvert plugin, but you should calculate the crop coordinates.

So when you want to rescale back, you can try to use the gstnvvideoconvert plugin, but you should calculate the crop coordinates.

Can you point to any example? I need to know how it is rescaled.

Hi @yuweiw
May I ask for an update on this?

Hi @peeranat85 , since you didn’t scale the original picture, you just added the black bands around it. So, I think you can calculate the coordinates of each point from the 608x1088 picture. Then you can use the gstnvvideoconvert plugin and set the crop parameters. Could you try this way? Thanks
https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_plugin_gst-nvvideoconvert.html

@yuweiw Please, see my response below.

So, I think you can calculate the coordinates of each point from the 608x1088 picture.

How can I calculate the coordinates from 608x1088 image back to its original 600x800 image? I couldn’t find any example that shows how. Given the Streammux config

[streammux]
gpu-id=0
batch-size=1
# Muxer batch formation timeout, for e.g. 40 millisec. Should ideally be set
# based on the fastest source's framerate.
batched-push-timeout=40000
# Set muxer output width and height
width=1088
height=608
# enable-padding=1, to preserve the input aspect ratio while scaling by padding with black bands.
enable-padding=1

the image is padded to the center. From my understanding, it matches the letterbox function I wrote earlier?
If my understanding is right, then I can use the logic shown in my Question 1 in the main thread.

Btw, I want a way to rescale the coordinates output by DeepStream back to their original size (600x800), not by manually setting the crop properties as in the gstnvvideoconvert plugin. This is because the rescaled coordinates will be saved into a file and compared with plain PyTorch code for in-depth performance evaluation. Thanks

Hi @peeranat85 , yes, it’s similar to your letterbox code. Since we have added the black bands to the picture, it’s hard to rescale back if you don’t want to crop, because we have not integrated a black-edge detection and crop algorithm. So if you want to rescale back, you have to use the gstnvvideoconvert plugin to crop. We don’t have a similar example yet.
Also, you can try setting enable-padding=0; it may be easier to rescale back by yourself.
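For completeness, a sketch of what the suggestion above would look like (my own addition, with a hypothetical helper name): with enable-padding=0, streammux stretches the frame to the muxer resolution without preserving aspect ratio, so there is no padding offset and each axis can be rescaled back independently:

```python
# With enable-padding=0 the frame is stretched to net_w x net_h,
# so the inverse is a plain per-axis scale, no padding offset.
def rescale_no_padding(x, y, w, h, net_w=1088.0, net_h=608.0, img_w=800.0, img_h=600.0):
    sx = img_w / net_w  # horizontal scale back to the original image
    sy = img_h / net_h  # vertical scale back to the original image
    return x * sx, y * sy, w * sx, h * sy

# The full network frame maps back to ~ (0, 0, 800, 600).
print(rescale_no_padding(0, 0, 1088, 608))
```

The trade-off is that disabling padding distorts the aspect ratio seen by the network, which can change detection quality compared with letterboxed input.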

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.