FPS drop with a high-resolution video in a GStreamer pipeline

Hello,

I am using detectnet from jetson-inference (GitHub: dusty-nv/jetson-inference) on a stream received from an RTSP camera. I have modified the gstCamera files so that they create two streams: one at a lower resolution (640x360) and another at full resolution (3072x1728).
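One common way to get two resolutions from a single decode is to split the decoded stream with a tee and give each branch its own nvvidconv scaler and appsink. The sketch below only builds such a pipeline description string. It is hypothetical, not the author's (since removed) gstCamera code, and it assumes an H.264 MP4 input, the TX1-era omxh264dec decoder and I420 output on both branches; the appsink names "lowres" and "hires" are made up for illustration.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical: decode once, then tee into a low-res and a full-res branch.
// Each branch has its own queue (so one branch cannot stall the other),
// its own nvvidconv scaler and its own appsink.
std::string buildDualResPipeline(const std::string& file,
                                 int loW, int loH, int hiW, int hiH) {
    std::ostringstream ss;
    ss << "filesrc location=" << file
       << " ! qtdemux ! h264parse ! omxh264dec ! tee name=t "
       << "t. ! queue ! nvvidconv ! video/x-raw, format=I420, width=" << loW
       << ", height=" << loH << " ! appsink name=lowres "
       << "t. ! queue ! nvvidconv ! video/x-raw, format=I420, width=" << hiW
       << ", height=" << hiH << " ! appsink name=hires";
    return ss.str();
}
```

A string like this would typically be passed to gst_parse_launch(), with each appsink fetched by name via gst_bin_get_by_name(). Note that every buffer leaving nvvidconv into system memory is still a CPU-side copy, which is where the cost of the full-resolution branch shows up.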

I want to run OCR on the detected objects, so I need two separate frames. detectnet runs on the low-resolution frames, and I want to scale the detections up to the high-resolution frames and run OCR there.
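Scaling a detection from the low-res frame to the high-res frame is a per-axis multiplication. A minimal sketch, assuming both streams cover exactly the same field of view (no cropping or letterboxing); the Box type and function names are hypothetical, not jetson-inference API. For 640x360 to 3072x1728 the factor is 4.8 on both axes.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical box in pixel coordinates.
struct Box {
    double left, top, right, bottom;
};

// Map a box detected on a (srcW x srcH) frame onto a (dstW x dstH) frame.
// Multiplying before dividing keeps integer-valued results exact.
Box scaleBox(const Box& b, int srcW, int srcH, int dstW, int dstH) {
    return Box{b.left * dstW / srcW, b.top * dstH / srcH,
               b.right * dstW / srcW, b.bottom * dstH / srcH};
}

// Clamp a box to the frame so an OCR crop never reads outside the image.
Box clampBox(const Box& b, int w, int h) {
    auto clampTo = [](double v, double hi) { return std::min(std::max(v, 0.0), hi); };
    return Box{clampTo(b.left, w), clampTo(b.top, h),
               clampTo(b.right, w), clampTo(b.bottom, h)};
}
```

For example, a box (100, 50, 200, 100) on the 640x360 frame becomes (480, 240, 960, 480) on the 3072x1728 frame.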

To do this, I changed gstCamera.h so that it takes the width and height at creation.

To test it on video files, I have also written gstVideo files similar to gstCamera.

My problem is that even though I do not process the high-resolution video at all, the executable only runs at 18-20 FPS.

Is there a better way to tackle this so that the FPS does not drop? I really cannot use a lower-resolution feed, because the text I am trying to recognize is not clearly visible at lower resolutions.

EDIT: I have taken down the code. Contact me if you would like any help.

Hi bhargavK,
Do you run with jetson_clocks.sh? Also, do you see better performance at a lower resolution, 1920x1080 instead of 3072x1728?

Hi DaneLLL,

Thank you for your reply.

I did run with jetson_clocks.sh. Also, in case I did not mention it before, this is on a TX1.

There is a small improvement with 1920x1080: about 20-22 FPS.

Again, this is without detectnet running. detectnet runs at ~10 FPS when only the low-resolution stream is fed to it. But I want the hi-res data available too, in sync with my low-res stream.
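One way to keep the hi-res data "in sync" is to tag every frame with a frame number and keep recent hi-res frames in a small ring buffer, so the frame matching a low-res detection can be looked up by number afterwards. The sketch below is hypothetical (FrameRing is not part of jetson-inference) and single-threaded; in a real appsink setup the producer and consumer run on different threads and would need a mutex around push and find.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical hi-res frame slot.
struct Frame {
    std::uint64_t number = 0;
    std::vector<std::uint8_t> pixels;
    bool valid = false;
};

// Fixed-capacity ring indexed by frame number: with consecutive numbering,
// pushing frame n overwrites frame n - N, i.e. the oldest buffered frame.
template <std::size_t N>
class FrameRing {
public:
    void push(std::uint64_t number, std::vector<std::uint8_t> pixels) {
        Frame& slot = slots_[number % N];
        slot.number = number;
        slot.pixels = std::move(pixels);
        slot.valid = true;
    }
    // Returns the frame only if it has not been overwritten yet.
    const Frame* find(std::uint64_t number) const {
        const Frame& slot = slots_[number % N];
        return (slot.valid && slot.number == number) ? &slot : nullptr;
    }
private:
    std::array<Frame, N> slots_{};
};
```

The capacity N bounds how far detection can lag behind capture: if OCR asks for a frame more than N frames old, find() returns nullptr instead of silently handing back the wrong frame.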

Hi bhargavK,
Please check tegrastats to see where the bottleneck is.
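When comparing runs, it can help to pull the per-core CPU load, EMC (memory controller) load and GR3D (GPU) load out of each tegrastats line. The sketch below parses the line format shown in this thread only; tegrastats output differs between L4T releases, so treat the field names as an assumption for other versions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

struct TegraSample {
    std::vector<int> cpuPercent;  // one entry per core
    int gr3dPercent = -1;         // GPU load, -1 if absent
    int emcPercent = -1;          // memory controller load, -1 if absent
};

// Integer that directly follows `key`, e.g. "GR3D 33%@998" -> 33.
static int valueAfter(const std::string& line, const std::string& key) {
    const std::size_t pos = line.find(key);
    if (pos == std::string::npos) return -1;
    return std::atoi(line.c_str() + pos + key.size());
}

TegraSample parseTegrastats(const std::string& line) {
    TegraSample s;
    const std::size_t open = line.find("cpu [");
    const std::size_t close = line.find(']', open);
    if (open != std::string::npos && close != std::string::npos) {
        const std::string cores = line.substr(open + 5, close - open - 5);
        std::size_t start = 0;
        while (start < cores.size()) {  // "27%,2%,15%,1%"
            std::size_t comma = cores.find(',', start);
            if (comma == std::string::npos) comma = cores.size();
            s.cpuPercent.push_back(std::atoi(cores.c_str() + start));
            start = comma + 1;
        }
    }
    s.gr3dPercent = valueAfter(line, "GR3D ");
    s.emcPercent = valueAfter(line, "EMC ");
    return s;
}

double meanCpu(const TegraSample& s) {
    if (s.cpuPercent.empty()) return 0.0;
    double sum = 0.0;
    for (int v : s.cpuPercent) sum += v;
    return sum / s.cpuPercent.size();
}
```

Averaging meanCpu() and gr3dPercent over the "gst-video starts" to "gst-video stops" window makes it easier to see whether the CPU or the GPU is the busier unit.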

Sure, I think I had checked it. I am currently away from the Jetson and can't access it remotely, but I had noticed that CPU usage was very high when I ran both the high- and low-res videos in gst-video.

I will post them tomorrow when I get back to work.

Many thanks!

Hello,

Here is the information from tegrastats.

When I use only the 640x360 video and not the hi-res feed at all, I get 30 FPS and the following stats:

RAM 1476/3983MB (lfb 46x4MB) cpu [14%,8%,2%,0%]@1734 EMC 2%@1600 APE 25 GR3D 0%@998
RAM 1476/3983MB (lfb 46x4MB) cpu [16%,7%,2%,0%]@1734 EMC 2%@1600 APE 25 GR3D 0%@998
RAM 1476/3983MB (lfb 46x4MB) cpu [30%,4%,0%,0%]@1734 EMC 2%@1600 APE 25 GR3D 0%@998
RAM 1476/3983MB (lfb 46x4MB) cpu [27%,2%,15%,1%]@1734 EMC 2%@1600 APE 25 GR3D 33%@998
RAM 1476/3983MB (lfb 46x4MB) cpu [7%,18%,0%,2%]@1734 EMC 2%@1600 APE 25 GR3D 0%@998
RAM 1476/3983MB (lfb 46x4MB) cpu [2%,12%,1%,0%]@1734 EMC 2%@1600 APE 25 GR3D 0%@998
RAM 1617/3983MB (lfb 46x4MB) cpu [22%,10%,26%,11%]@1734 EMC 4%@1600 APE 25 NVDEC 716 GR3D 38%@998     <--- gst-video starts
RAM 1766/3983MB (lfb 46x4MB) cpu [31%,25%,43%,29%]@1734 EMC 11%@1600 APE 25 NVDEC 563 GR3D 31%@998
RAM 1771/3983MB (lfb 46x4MB) cpu [32%,24%,19%,28%]@1734 EMC 15%@1600 APE 25 NVDEC 716 GR3D 17%@998
RAM 1771/3983MB (lfb 46x4MB) cpu [26%,26%,22%,16%]@1734 EMC 18%@1600 APE 25 NVDEC 345 GR3D 54%@998
RAM 1776/3983MB (lfb 46x4MB) cpu [26%,31%,23%,28%]@1734 EMC 19%@1600 APE 25 NVDEC 716 GR3D 30%@998
RAM 1805/3983MB (lfb 46x4MB) cpu [25%,29%,41%,41%]@1734 EMC 21%@1600 APE 25 NVDEC 345 GR3D 32%@998
RAM 1806/3983MB (lfb 46x4MB) cpu [38%,28%,37%,28%]@1734 EMC 23%@1600 APE 25 NVDEC 345 GR3D 37%@998
RAM 1807/3983MB (lfb 46x4MB) cpu [35%,36%,26%,30%]@1734 EMC 24%@1600 APE 25 NVDEC 345 GR3D 25%@998
RAM 1807/3983MB (lfb 46x4MB) cpu [30%,27%,22%,18%]@1734 EMC 25%@1600 APE 25 NVDEC 345 GR3D 41%@998
RAM 1807/3983MB (lfb 46x4MB) cpu [31%,18%,32%,19%]@1734 EMC 25%@1600 APE 25 NVDEC 345 GR3D 48%@998
RAM 1807/3983MB (lfb 46x4MB) cpu [27%,18%,30%,29%]@1734 EMC 25%@1600 APE 25 NVDEC 345 GR3D 24%@998
RAM 1808/3983MB (lfb 46x4MB) cpu [19%,16%,31%,22%]@1734 EMC 25%@1600 APE 25 NVDEC 716 GR3D 14%@998    <--- gst-video stops
RAM 1484/3983MB (lfb 46x4MB) cpu [10%,14%,17%,14%]@1734 EMC 19%@1600 APE 25 GR3D 0%@998
RAM 1484/3983MB (lfb 46x4MB) cpu [4%,1%,6%,4%]@1734 EMC 14%@1600 APE 25 GR3D 0%@998
RAM 1484/3983MB (lfb 46x4MB) cpu [2%,0%,0%,3%]@1734 EMC 10%@1600 APE 25 GR3D 0%@998
RAM 1476/3983MB (lfb 46x4MB) cpu [2%,0%,0%,2%]@1734 EMC 7%@1600 APE 25 GR3D 0%@998
RAM 1476/3983MB (lfb 46x4MB) cpu [1%,1%,0%,1%]@1734 EMC 6%@1600 APE 25 GR3D 0%@998
RAM 1476/3983MB (lfb 46x4MB) cpu [3%,2%,1%,1%]@1734 EMC 5%@1600 APE 25 GR3D 0%@998

When I use the 3072x1728 hi-res feed along with the low-res one, I get ~18-20 FPS and the following stats:

RAM 1476/3983MB (lfb 46x4MB) cpu [0%,0%,0%,0%]@1734 EMC 2%@1600 APE 25 GR3D 0%@998
RAM 2008/3983MB (lfb 46x4MB) cpu [53%,36%,57%,63%]@1734 EMC 11%@1600 APE 25 NVDEC 716 GR3D 47%@998    <--- gst-video starts
RAM 2071/3983MB (lfb 46x4MB) cpu [22%,28%,54%,88%]@1734 EMC 17%@1600 APE 25 NVDEC 716 GR3D 69%@998
RAM 2078/3983MB (lfb 46x4MB) cpu [29%,68%,37%,55%]@1734 EMC 21%@1600 APE 25 NVDEC 345 GR3D 14%@998
RAM 2081/3983MB (lfb 46x4MB) cpu [54%,62%,52%,36%]@1734 EMC 24%@1600 APE 25 NVDEC 716 GR3D 29%@998
RAM 2087/3983MB (lfb 46x4MB) cpu [77%,50%,38%,52%]@1734 EMC 26%@1600 APE 25 NVDEC 716 GR3D 22%@998
RAM 2088/3983MB (lfb 46x4MB) cpu [56%,49%,46%,26%]@1734 EMC 25%@1600 APE 25 NVDEC 716 GR3D 8%@998
RAM 2089/3983MB (lfb 46x4MB) cpu [35%,21%,56%,57%]@1734 EMC 25%@1600 APE 25 NVDEC 716 GR3D 11%@998
RAM 2090/3983MB (lfb 46x4MB) cpu [22%,38%,56%,57%]@1734 EMC 25%@1600 APE 25 NVDEC 716 GR3D 8%@998
RAM 2091/3983MB (lfb 46x4MB) cpu [25%,52%,57%,44%]@1734 EMC 25%@1600 APE 25 NVDEC 272 GR3D 10%@998
RAM 2092/3983MB (lfb 46x4MB) cpu [46%,49%,46%,32%]@1734 EMC 25%@1600 APE 25 NVDEC 678 GR3D 7%@998
RAM 2087/3983MB (lfb 46x4MB) cpu [69%,35%,25%,44%]@1734 EMC 24%@1600 APE 25 NVDEC 716 GR3D 7%@998
RAM 2087/3983MB (lfb 46x4MB) cpu [58%,51%,43%,44%]@1734 EMC 24%@1600 APE 25 NVDEC 716 GR3D 9%@998
RAM 2087/3983MB (lfb 46x4MB) cpu [28%,27%,37%,75%]@1734 EMC 24%@1600 APE 25 NVDEC 716 GR3D 32%@998
RAM 2087/3983MB (lfb 46x4MB) cpu [23%,36%,51%,72%]@1734 EMC 24%@1600 APE 25 NVDEC 345 GR3D 5%@998
RAM 2087/3983MB (lfb 46x4MB) cpu [35%,58%,62%,28%]@1734 EMC 24%@1600 APE 25 NVDEC 345 GR3D 9%@998
RAM 2088/3983MB (lfb 46x4MB) cpu [58%,44%,47%,23%]@1734 EMC 23%@1600 APE 25 NVDEC 627 GR3D 31%@998
RAM 2088/3983MB (lfb 46x4MB) cpu [40%,62%,39%,42%]@1734 EMC 24%@1600 APE 25 NVDEC 716 GR3D 9%@998
RAM 2088/3983MB (lfb 46x4MB) cpu [18%,73%,37%,60%]@1734 EMC 24%@1600 APE 25 NVDEC 716 GR3D 9%@998
RAM 2088/3983MB (lfb 46x4MB) cpu [30%,53%,42%,54%]@1734 EMC 24%@1600 APE 25 NVDEC 716 GR3D 28%@998
RAM 2088/3983MB (lfb 46x4MB) cpu [60%,21%,59%,35%]@1734 EMC 24%@1600 APE 25 NVDEC 716 GR3D 27%@998
RAM 2087/3983MB (lfb 46x4MB) cpu [31%,15%,84%,54%]@1734 EMC 24%@1600 APE 25 NVDEC 716 GR3D 9%@998
RAM 2088/3983MB (lfb 46x4MB) cpu [42%,26%,57%,56%]@1734 EMC 24%@1600 APE 25 NVDEC 716 GR3D 34%@998
RAM 2088/3983MB (lfb 46x4MB) cpu [50%,23%,56%,57%]@1734 EMC 24%@1600 APE 25 NVDEC 716 GR3D 37%@998
RAM 2038/3983MB (lfb 46x4MB) cpu [45%,24%,49%,62%]@1734 EMC 24%@1600 APE 25 NVDEC 716 GR3D 35%@998    <--- gst-video ends
RAM 1478/3983MB (lfb 46x4MB) cpu [4%,10%,5%,20%]@1734 EMC 17%@1600 APE 25 GR3D 0%@998
RAM 1470/3983MB (lfb 46x4MB) cpu [0%,0%,0%,4%]@1734 EMC 13%@1600 APE 25 GR3D 0%@998
RAM 1470/3983MB (lfb 46x4MB) cpu [0%,0%,2%,3%]@1734 EMC 9%@1600 APE 25 GR3D 4%@998
RAM 1470/3983MB (lfb 46x4MB) cpu [0%,0%,2%,1%]@1734 EMC 7%@1600 APE 25 GR3D 0%@998
RAM 1470/3983MB (lfb 46x4MB) cpu [0%,1%,1%,1%]@1734 EMC 5%@1600 APE 25 GR3D 0%@998
RAM 1470/3983MB (lfb 46x4MB) cpu [0%,1%,2%,1%]@1734 EMC 4%@1600 APE 25 GR3D 4%@998

When I use the 1920x1080 hi-res feed along with the low-res one, I get ~20-24 FPS and the following stats:

RAM 1475/3983MB (lfb 46x4MB) cpu [0%,0%,0%,0%]@1734 EMC 5%@1600 APE 25 GR3D 0%@998
RAM 1475/3983MB (lfb 46x4MB) cpu [1%,4%,5%,0%]@1734 EMC 4%@1600 APE 25 GR3D 0%@998
RAM 1475/3983MB (lfb 46x4MB) cpu [5%,6%,3%,1%]@1734 EMC 4%@1600 APE 25 GR3D 0%@998
RAM 1475/3983MB (lfb 46x4MB) cpu [1%,1%,2%,0%]@1734 EMC 3%@1600 APE 25 GR3D 0%@998
RAM 1959/3983MB (lfb 46x4MB) cpu [65%,40%,40%,39%]@1734 EMC 11%@1600 APE 25 NVDEC 563 GR3D 51%@998    <--- gst-video starts
RAM 1966/3983MB (lfb 46x4MB) cpu [35%,57%,36%,28%]@1734 EMC 16%@1600 APE 25 NVDEC 396 GR3D 61%@998
RAM 1995/3983MB (lfb 46x4MB) cpu [35%,56%,34%,39%]@1734 EMC 21%@1600 APE 25 NVDEC 345 GR3D 4%@998
RAM 1999/3983MB (lfb 46x4MB) cpu [45%,40%,29%,29%]@1734 EMC 22%@1600 APE 25 NVDEC 716 GR3D 20%@998
RAM 2005/3983MB (lfb 46x4MB) cpu [36%,43%,30%,48%]@1734 EMC 24%@1600 APE 25 NVDEC 716 GR3D 19%@998
RAM 2006/3983MB (lfb 46x4MB) cpu [44%,32%,32%,47%]@1734 EMC 26%@1600 APE 25 NVDEC 345 GR3D 39%@998
RAM 2008/3983MB (lfb 46x4MB) cpu [27%,39%,50%,48%]@1734 EMC 27%@1600 APE 25 NVDEC 716 GR3D 16%@998
RAM 2008/3983MB (lfb 46x4MB) cpu [64%,41%,41%,26%]@1734 EMC 28%@1600 APE 25 NVDEC 716 GR3D 42%@998
RAM 2009/3983MB (lfb 46x4MB) cpu [46%,34%,41%,24%]@1734 EMC 29%@1600 APE 25 NVDEC 678 GR3D 31%@998
RAM 2009/3983MB (lfb 46x4MB) cpu [43%,28%,48%,44%]@1734 EMC 29%@1600 APE 25 NVDEC 345 GR3D 24%@998
RAM 2010/3983MB (lfb 46x4MB) cpu [52%,33%,34%,37%]@1734 EMC 29%@1600 APE 25 NVDEC 345 GR3D 20%@998
RAM 2011/3983MB (lfb 46x4MB) cpu [31%,39%,35%,23%]@1734 EMC 27%@1600 APE 25 NVDEC 345 GR3D 24%@998    <--- gst-video stops
RAM 1475/3983MB (lfb 46x4MB) cpu [22%,16%,14%,10%]@1734 EMC 21%@1600 APE 25 GR3D 0%@998
RAM 1475/3983MB (lfb 46x4MB) cpu [1%,6%,2%,2%]@1734 EMC 15%@1600 APE 25 GR3D 0%@998
RAM 1475/3983MB (lfb 46x4MB) cpu [1%,0%,0%,1%]@1734 EMC 10%@1600 APE 25 GR3D 0%@998
RAM 1475/3983MB (lfb 46x4MB) cpu [0%,0%,0%,0%]@1734 EMC 8%@1600 APE 25 GR3D 0%@998

I don’t see any major surprises here.

Sorry for starting this thread in the TX2 forum instead of the TX1 forum.

You can test this with the video file uploaded here: https://drive.google.com/file/d/1LVBfaFIk-4_csxgk30N_NRk5fdVwo0hA/view?usp=sharing

It can be run using:

./gst-video --input /path/to/test.mp4

Hi bhargavK,
The bottleneck should be memcpy. The flow is:
nvvidconv → (copy dmabuf to CPU gstbuf) → clockoverlay → appsink (copy CPU gstbuf to ring buffer)

This is fine at low resolution, but can hit the memcpy limit at high resolution.
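A quick calculation shows why the copies matter far more at high resolution. The sketch assumes I420 frames (1.5 bytes per pixel), two CPU copies per frame as in the flow above, and 30 FPS; the real pixel format in gst-video is not stated in the thread, so treat the absolute numbers as estimates (RGBA would be 8/3 times larger).

```cpp
#include <cassert>
#include <cstdint>

// Bytes in one I420 frame: full-size Y plane plus two quarter-size chroma planes.
constexpr std::uint64_t i420Bytes(std::uint64_t w, std::uint64_t h) {
    return w * h + 2 * ((w / 2) * (h / 2));
}

// Bytes copied by the CPU per second: frame size x copies per frame x fps.
constexpr std::uint64_t copyBytesPerSecond(std::uint64_t w, std::uint64_t h,
                                           std::uint64_t copiesPerFrame,
                                           std::uint64_t fps) {
    return i420Bytes(w, h) * copiesPerFrame * fps;
}
```

Under these assumptions, 640x360 needs about 20.7 MB/s (10^6 bytes) of copying, while 3072x1728 needs about 477.8 MB/s, 1920x1080 about 186.6 MB/s and 2560x1440 about 331.8 MB/s. Each memcpy both reads and writes, so the load on the memory bus is roughly double that. A 16-slot ring buffer of 3072x1728 frames also holds about 127 MB, against about 44 MB for 8 slots at 2560x1440.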

Please consider using the Multimedia API (MMAPI) instead, and refer to these two samples:

tegra_multimedia_api\samples\02_video_dec_cuda
tegra_multimedia_api\samples\04_video_dec_trt

Hi DaneLLL,

Thanks for your reply. I will look through the examples from MMAPI.

Thank you for your insights, DaneLLL. Based on them:

I reduced the resolution from 3072x1728 to 2560x1440 and the ring buffer size from 16 to 8. I also disabled the display, because I will ultimately be running the code remotely. Watching the frame numbers in my code, I find that on average ~1 frame is missed every 30 frames (hopefully that is a correct way to measure!). So I think I should be fine processing at this resolution.
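Counting missed frames from frame numbers works as long as the number is assigned upstream of the point where frames can be lost (for example when the frame enters the ring buffer), not by a counter in the consumer, which only sees the frames that survive. A minimal sketch of the gap count under that assumption; DropStats and countDrops are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct DropStats {
    std::uint64_t received = 0;  // frames the consumer actually saw
    std::uint64_t missed = 0;    // numbers skipped in the sequence
};

// Assumes frames are numbered consecutively at the producer and arrive in order.
DropStats countDrops(const std::vector<std::uint64_t>& seen) {
    DropStats s;
    for (std::size_t i = 0; i < seen.size(); ++i) {
        ++s.received;
        if (i > 0 && seen[i] > seen[i - 1] + 1)
            s.missed += seen[i] - seen[i - 1] - 1;  // size of the gap
    }
    return s;
}
```

The drop rate is then missed / (received + missed); ~1 missed frame per 30 corresponds to roughly 3%.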

Once again, thanks!