Nvdec usage reaches a maximum of 50% in deepstream application

PC: RTX 2070 Super
Deepstream: 6.2
Driver Version: 525.105.17
Docker image: deepstream:6.2-devel

I am working with the sample app deepstream-test3 in C++, with PERF_MODE enabled.
When checking NVDEC usage with “nvidia-smi dmon”, I noticed that it only reaches a maximum of 50%, even as the number of streams increases. I see this behavior on my RTX 2070 Super card.

When I test on an RTX 3060 Ti, I get 100% NVDEC utilization as expected.

I raised this issue in the DeepStream forums but was redirected here, since the sample app works fine on other cards (original thread: Nvdec usage only reaches a maximum of 50% in deepstream application).

I would appreciate any insight into why NVDEC usage caps out at 50%.

Hi.
Do you see this issue with the Video Codec SDK sample applications as well?
In the meantime I will try to see if we can reproduce this issue internally with the information you have provided.

Hi @mandar_godse
Sorry for the late reply. I am able to reproduce the same issue in the Video Codec SDK sample app as well.

My PC, driver version and docker image are as I originally mentioned above (Ubuntu 20.04 Host)

I downloaded SDK version 11.1.5 from here (https://developer.nvidia.com/video-codec-sdk-archive)
I installed the required dependencies (mainly ffmpeg) inside the docker container and updated the environment variables accordingly. After that, I was able to compile the sample apps by following the instructions in the SDK readme.

I tried to run the following app

Samples/build/AppDecode/AppDecPerf

with the command

./AppDecPerf -i /test.mp4

Following is the result

GPU in use: NVIDIA GeForce RTX 2070 SUPER
[INFO ][10:53:17] Media format: QuickTime / MOV (mov,mp4,m4a,3gp,3g2,mj2)
[INFO ][10:53:17] Media format: QuickTime / MOV (mov,mp4,m4a,3gp,3g2,mj2)
Session Initialization Time: 16 ms 
Session Initialization Time: 17 ms 
[INFO ][10:53:17] Video Input Information
        Codec        : AVC/H.264
        Frame rate   : 0/0 = -nan fps
        Sequence     : Progressive
        Coded size   : [2592, 1952]
        Display area : [0, 0, 2592, 1944]
        Chroma       : YUV 420
        Bit depth    : 8
Video Decoding Params:
        Num Surfaces : 2
        Crop         : [0, 0, 0, 0]
        Resize       : 2592x1952
        Deinterlace  : Weave

[INFO ][10:53:17] Video Input Information
        Codec        : AVC/H.264
        Frame rate   : 0/0 = -nan fps
        Sequence     : Progressive
        Coded size   : [2592, 1952]
        Display area : [0, 0, 2592, 1944]
        Chroma       : YUV 420
        Bit depth    : 8
Video Decoding Params:
        Num Surfaces : 2
        Crop         : [0, 0, 0, 0]
        Resize       : 2592x1952
        Deinterlace  : Weave

Session Deinitialization Time: 17 ms 
Session Deinitialization Time: 16 ms 
Total Frames Decoded=2340 FPS = 388.996

However, I see the same issue, with nvidia-smi dmon showing the following:

# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     19     46      -     6     11      0      0    405    300 
    0     20     46      -     1      9      0      0    405    300 
    0     45     47      -     5     11      0      0   6801   1605 
    0     70     49      -    10      6      0     19   6801   2010 
    0     72     49      -     5      7      0     50   6801   2010 
    0     72     49      -     5      7      0     50   6801   2010 
    0     71     49      -     5      7      0     50   6801   2010 
    0     73     49      -     5      7      0     50   6801   2010 
    0     73     49      -     5      7      0     50   6801   2010 
    0     64     49      -     5      5      0     30   7000   2010 
    0     23     47      -     0      0      0      0    810    510 
    0     21     47      -     1      4      0      0    810    360 

In this case I am running just a single instance of the app, and decoder utilization appears to cap at 50%. I have also tried running two instances of the app at the same time using tmux, with the same cap at 50%; in that case the FPS is half of what I initially got for a single instance.
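
To make the cap easier to compare between cards or runs, the dmon text can be reduced to a single peak value for the dec column. The following is only a minimal sketch in Python. It assumes the column layout shown in the output above (a "# gpu ..." header line naming the columns, one of which is "dec"), and the names SAMPLE, parse_dmon and peak_dec are hypothetical helpers, not part of any NVIDIA tool.

```python
# Sketch: find the peak NVDEC ("dec") utilization in captured
# `nvidia-smi dmon` text. Assumes the layout shown in this thread.

# A few rows copied from the dmon output posted above.
SAMPLE = """\
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0     45     47      -     5     11      0      0   6801   1605
    0     70     49      -    10      6      0     19   6801   2010
    0     72     49      -     5      7      0     50   6801   2010
    0     64     49      -     5      5      0     30   7000   2010
"""


def parse_dmon(text):
    """Return one {column_name: raw_string} dict per sample row."""
    columns = None
    rows = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("#"):
            fields = stripped.lstrip("#").split()
            # Only the header line beginning with "gpu" names the columns;
            # the other header line lists units and is ignored.
            if fields and fields[0] == "gpu":
                columns = fields
            continue
        fields = stripped.split()
        # Skip rows seen before a header or with an unexpected width.
        if columns is not None and len(fields) == len(columns):
            rows.append(dict(zip(columns, fields)))
    return rows


def peak_dec(text):
    """Highest 'dec' percentage seen, or None if there is no numeric sample."""
    values = [
        int(row["dec"])
        for row in parse_dmon(text)
        if row.get("dec", "").isdigit()  # "-" means the value is unavailable
    ]
    return max(values) if values else None


if __name__ == "__main__":
    print("peak dec utilization (%):", peak_dec(SAMPLE))
```

For a real run, the output could be captured with something like `nvidia-smi dmon > dmon.log` while the decode test is running, and the file contents passed to peak_dec in place of SAMPLE.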

Hi @mandar_godse
Is there any update on this issue?