Inference from very wide source

gschimmel · January 13, 2021, 2:06pm

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU)
Jetson AGX Xavier
• DeepStream Version
4.0
• JetPack Version (valid for Jetson only)
4.4
• TensorRT Version
7.1.3.0
• NVIDIA GPU Driver Version (valid for GPU only)
• Issue Type( questions, new requirements, bugs)

I’m using deepstream-app, deepstream-test4-app and different derivatives from those.

When I process a “normal” h264 file, 16:9 ratio, 1920x1080, it works great.

This is the mediainfo output for such file:

General
Complete name                            : una.h264
Format                                   : AVC
Format/Info                              : Advanced Video Codec
File size                                : 38.6 MiB

Video
Format                                   : AVC
Format/Info                              : Advanced Video Codec
Format profile                           : Baseline@L4
Format settings                          : 1 Ref Frames
Format settings, CABAC                   : No
Format settings, ReFrames                : 1 frame
Format settings, GOP                     : M=1, N=30
Width                                    : 1 920 pixels
Height                                   : 1 080 pixels
Display aspect ratio                     : 16:9
Color space                              : YUV
Chroma subsampling                       : 4:2:0
Bit depth                                : 8 bits
Scan type                                : Progressive

But what I have to process is a wider image, which is a stitch from 3 cameras, resulting in a 3840x720 image.

This is the mediainfo output of such file:

General
Complete name                            : ../filename.h264
Format                                   : AVC
Format/Info                              : Advanced Video Codec
File size                                : 103 MiB

Video
Format                                   : AVC
Format/Info                              : Advanced Video Codec
Format profile                           : Baseline@L5
Format settings                          : 1 Ref Frames
Format settings, CABAC                   : No
Format settings, ReFrames                : 1 frame
Format settings, GOP                     : M=1, N=30
Width                                    : 3 840 pixels
Height                                   : 720 pixels
Display aspect ratio                     : 5.333
Color space                              : YUV
Chroma subsampling                       : 4:2:0
Bit depth                                : 8 bits
Scan type                                : Progressive

Both files are similar in quality, resolution, etc. And composed of the same features.

But in the second case the inference is really poor. It only detects a couple of the bigger persons and nothing more. There are about 10-15 persons on the image, which are perfectly detected on the single image test.

So I guess my question here is: What do I have to tune in the inference config files in order to detect ok on the wider image? That am I doing wrong?

Thanks a lot.

• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

bcao · January 14, 2021, 1:14am

Hey, why you still need to use DS4.0, could you upgrade to latest DS5.0.1.
For your question, is it possible to input 3 single streams from camera to DS pipeline instead of stitch them to one stream, I guess the resize in nvstreammux or nvinfer will affact the accuracy.

gschimmel · January 14, 2021, 12:08pm

I have to use Jetpack 4.4.0 for an issue with my drivers, and for historical reasons I kept using DS4.0. I tried to run deepstream-test4 from DS5.0 but it gave a lot of errors so I abandoned it.

Is it much better to run DS5.0? Does it improve the accuracy in my case?

I know, but my use case is to first stitch and then to detect objects in this wide video.
Is not possible to accurately detect objects in this 3840x720 canvas? Is not that big of an image. Is it about the aspect ratio?

Can you tell me more about this resize? Which would be the ideal input size? Does it has to be 16:9?

Thanks a lot

gschimmel · January 14, 2021, 1:57pm

I would like to add some information here.

I activated the file output (and screen output also), and both of them shows me a very distorted 16:9 output. I suppose then that the network is having a hard time to detect these distorted “thin” persons.

Is it possible to avoid this resize? Can I tell the pipeline that I’m trying to process a 3840x720 image?

thanks

m.fruhner · January 18, 2021, 12:38pm

Hey, for me the following options helped:

In pgie config:

maintain-aspect-ratio=1

and for streammux:

enable-padding=1

Do you have these enabled?

gschimmel · January 18, 2021, 12:45pm

No I haven’t previously, but this weekend I’ve been trying those, in special maintain-aspect-ratio, and it seems to be just what I needed.

Thanks a lot for your time and effort.

gschimmel · January 29, 2021, 4:48pm

This improved the things really a lot.

But It still wont perform nearly as well as a single 1920x1080 or 3840x2160 frame. The small objects won’t get recognized.

Hints, anyone?

Thanks a lot

bcao · February 2, 2021, 8:39am

We use nearest interpolation by default, for DS5.0 later version, we can use scaling-filter to specify which interpolation algorithm to use, I think this should be helpful in your case. For DS4.0 , you need to change the source code.

/**
 * Specifies video interpolation methods.
 */
typedef enum
{
  /** Specifies Nearest Interpolation Method interpolation. */
  NvBufSurfTransformInter_Nearest = 0,
  /** Specifies Bilinear Interpolation Method interpolation. */
  NvBufSurfTransformInter_Bilinear,
  /** Specifies GPU-Cubic, VIC-5 Tap interpolation. */
  NvBufSurfTransformInter_Algo1,
  /** Specifies GPU-Super, VIC-10 Tap interpolation. */
  NvBufSurfTransformInter_Algo2,
  /** Specifies GPU-Lanzos, VIC-Smart interpolation. */
  NvBufSurfTransformInter_Algo3,
  /** Specifies GPU-Ignored, VIC-Nicest interpolation. */
  NvBufSurfTransformInter_Algo4,
  /** Specifies GPU-Nearest, VIC-Nearest interpolation. */
  NvBufSurfTransformInter_Default
} NvBufSurfTransform_Inter;

Keep in mind, we don’t maintain DS4.x version, so it’s better to upgrade to latest DS version

gschimmel · February 2, 2021, 1:49pm

Yes I plan to migrate to DS5.0 as soon as posible. But I had some problems trying so I’m now waiting for a new jetson.

Thanks a lot.

Topic		Replies	Views
Processing 4K Images with DeepStream on Jetson DeepStream SDK jetson-inference , gstreamer , jetson , deepstream	17	138	August 12, 2024
Pixel distortion in multiple RTSP sources with different frame rate using New Streammux DeepStream SDK	5	380	March 21, 2024
Ingesting a large frame and breaking/cropping it into multiple frames DeepStream SDK	16	824	October 12, 2021
Can deepstream handle higher resolutions than 1080p? DeepStream SDK	17	2126	October 12, 2021
New nvstreammux and deepstream-parallel-inference-app DeepStream SDK	3	170	July 9, 2024
Can not run inference with bitmap image in Deepstream 6.0, TRT8.2 and JP4.6.3? DeepStream SDK	21	340	October 14, 2024
About the resize method in nvvideoconvert/nvstreammux DeepStream SDK	8	2069	October 12, 2021
Collecting images with pyds.get_nvds_buf_surface DeepStream SDK nvbugs	23	3901	October 12, 2021
How to crop and resize different input sources separately? DeepStream SDK python , jetson , deepstream	11	755	January 16, 2024
Mask_param data interpret DeepStream SDK deepstream	33	67	March 4, 2025

Inference from very wide source

Related topics