QP4000 performance sub-optimal

the problem seems fairly basic : i’d like to create thumbnails from incoming video in the shortest time possible, and i’m trying to do this by offloading processing to an nvidia gpu.

while ffmpeg runs, i’m monitoring gpu usage with the nvidia-smi utility. gpu usage never goes above 15%, and the time to encode the thumbnails with the gpu is only 10% less than the time required without it. these performance levels are very disappointing.
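
A note on the monitoring step: the nvidia-smi -q report lists "Gpu" utilization separately from the "Encoder" and "Decoder" figures, so it may help to watch all of them while ffmpeg runs. Below is a minimal sketch, assuming nvidia-smi is on the PATH. The watch_gpu_utilization name and the NVIDIA_SMI override variable are illustrative only and not part of any tool.

```shell
# Hypothetical helper: poll the utilization data reported by nvidia-smi.
# NVIDIA_SMI is an illustrative override; it defaults to the real tool.
watch_gpu_utilization() {
    interval="${1:-1}"   # seconds between samples
    # -q -d UTILIZATION restricts the query report to utilization data
    # (including the Gpu, Memory, Encoder and Decoder figures);
    # -l repeats the report every $interval seconds.
    "${NVIDIA_SMI:-nvidia-smi}" -q -d UTILIZATION -l "$interval"
}

# Example (runs until interrupted with Ctrl-C):
#   watch_gpu_utilization 1
```

If the Decoder figure stays at 0 % during a run, that would suggest the video is still being decoded on the CPU.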

my question : am i going about this the wrong way (and if so, how should i go about it), or is this gpu performance ‘normal’/‘reasonable’ ?

SYSTEM INFORMATION

the machine is a desktop pc running windows 10, 8gb ram, intel i7-7700. the gpu is an nvidia quadro pro 4000 with cuda 11.4 installed. ffmpeg is version N-101372-gb5cb8c8767-g2fc309e699+4 (2021) running under mingw, with --enable-cuda --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-libnpp --enable-nvdec and --enable-nvenc .

a typical ffmpeg command line i’ve used is :

 1 ffmpeg -hide_banner \
 2     -init_hw_device cuda=cuda:0 -filter_hw_device cuda \
 3     -hwaccel_output_format cuda \
 4     -i "$infile" \
 5     -vf "hwupload_cuda,scale_npp=w=200:h=150:format=yuv420p:interp_algo=lanczos,fps=1/1,hwdownload,format=yuv420p" \
 6     -y "$outdir/%08d.png"

i’ve varied the above by adding cuda-related parameters suggested in posts i’ve read here on stackoverflow and in the nvidia transcoding guide, but haven’t been able to improve performance. adding any of -hwaccel cuda, -hwaccel cuvid or -hwaccel nvenc at the beginning of line 3 results in the error :

Impossible to convert between the formats supported by the filter 'graph 0 input from stream 0:0' and the filter 'auto_scaler_0'
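
For comparison, here is a sketch of a variant that requests hardware decoding explicitly. It rests on several assumptions: that -hwaccel_output_format cuda on its own does not switch on GPU decoding, so -hwaccel cuda is added; that hwupload_cuda can then be dropped because decoded frames would already be in GPU memory; and that the input is 8-bit video, so downloading as nv12 is appropriate. It has not been tested on the Quadro P4000 discussed here. The make_thumbs name and the FFMPEG override variable are illustrative only.

```shell
# Hypothetical helper: decode and scale on the GPU, then copy frames
# back to system memory for PNG encoding.
# FFMPEG is an illustrative override; it defaults to the real binary.
make_thumbs() {
    infile="$1"    # input video
    outdir="$2"    # existing directory for the PNG files
    # -hwaccel cuda requests GPU decoding; -hwaccel_output_format cuda
    # keeps the decoded frames in GPU memory for scale_npp.
    # hwdownload,format=nv12 copies each scaled frame back to the CPU,
    # and fps=1 then keeps one frame per second.
    "${FFMPEG:-ffmpeg}" -hide_banner \
        -hwaccel cuda -hwaccel_output_format cuda \
        -i "$infile" \
        -vf "scale_npp=w=200:h=150:interp_algo=lanczos,hwdownload,format=nv12,fps=1" \
        -y "$outdir/%08d.png"
}

# Example:
#   make_thumbs "$infile" "$outdir"
```

Even in this form the hwdownload copy and the PNG encoding still run on the CPU, so overall GPU utilization may remain low.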

any pointers appreciated !

Can you clarify what the GPU is? Quadro 4000 Pro doesn’t mean anything to me. I am aware of the Turing-based Quadro RTX 4000 and also the Fermi-based Quadro 4000 from about ten years ago. However, the latter is no longer supported by modern versions of CUDA (including CUDA 11.4).

What is the output of nvidia-smi -q?

hi @njuffa

many thanks for your reply.

i think the gpu is pascal-based.

here is the output of nvidia-smi -q that you asked for :

==============NVSMI LOG==============

Timestamp                                 : Thu Aug  5 15:55:26 2021
Driver Version                            : 471.41
CUDA Version                              : 11.4

Attached GPUs                             : 1
GPU 00000000:01:00.0
    Product Name                          : Quadro P4000
    Product Brand                         : Quadro
    Display Mode                          : Enabled
    Display Active                        : Enabled
    Persistence Mode                      : N/A
    MIG Mode
        Current                           : N/A
        Pending                           : N/A
    Accounting Mode                       : Disabled
    Accounting Mode Buffer Size           : 4000
    Driver Model
        Current                           : WDDM
        Pending                           : WDDM
    Serial Number                         : 0320518021395
    GPU UUID                              : GPU-e40fbe22-c38d-df0f-df28-63589735ce43
    Minor Number                          : N/A
    VBIOS Version                         : 86.04.56.00.0b
    MultiGPU Board                        : No
    Board ID                              : 0x100
    GPU Part Number                       : 900-5G410-2750-000
    Module ID                             : 0
    Inforom Version
        Image Version                     : G410.0501.00.03
        OEM Object                        : 1.1
        ECC Object                        : N/A
        Power Management Object           : N/A
    GPU Operation Mode
        Current                           : N/A
        Pending                           : N/A
    GSP Firmware Version                  : N/A
    GPU Virtualization Mode
        Virtualization Mode               : None
        Host VGPU Mode                    : N/A
    IBMNPU
        Relaxed Ordering Mode             : N/A
    PCI
        Bus                               : 0x01
        Device                            : 0x00
        Domain                            : 0x0000
        Device Id                         : 0x1BB110DE
        Bus Id                            : 00000000:01:00.0
        Sub System Id                     : 0x11A310DE
        GPU Link Info
            PCIe Generation
                Max                       : 3
                Current                   : 1
            Link Width
                Max                       : 16x
                Current                   : 16x
        Bridge Chip
            Type                          : N/A
            Firmware                      : N/A
        Replays Since Reset               : 0
        Replay Number Rollovers           : 0
        Tx Throughput                     : 4000 KB/s
        Rx Throughput                     : 66000 KB/s
    Fan Speed                             : 46 %
    Performance State                     : P8
    Clocks Throttle Reasons
        Idle                              : Active
        Applications Clocks Setting       : Not Active
        SW Power Cap                      : Not Active
        HW Slowdown                       : Not Active
            HW Thermal Slowdown           : Not Active
            HW Power Brake Slowdown       : Not Active
        Sync Boost                        : Not Active
        SW Thermal Slowdown               : Not Active
        Display Clock Setting             : Not Active
    FB Memory Usage
        Total                             : 8192 MiB
        Used                              : 955 MiB
        Free                              : 7237 MiB
    BAR1 Memory Usage
        Total                             : 256 MiB
        Used                              : 2 MiB
        Free                              : 254 MiB
    Compute Mode                          : Default
    Utilization
        Gpu                               : 24 %
        Memory                            : 24 %
        Encoder                           : 0 %
        Decoder                           : 0 %
    Encoder Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    FBC Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    Ecc Mode
        Current                           : N/A
        Pending                           : N/A
    ECC Errors
        Volatile
            Single Bit
                Device Memory             : N/A
                Register File             : N/A
                L1 Cache                  : N/A
                L2 Cache                  : N/A
                Texture Memory            : N/A
                Texture Shared            : N/A
                CBU                       : N/A
                Total                     : N/A
            Double Bit
                Device Memory             : N/A
                Register File             : N/A
                L1 Cache                  : N/A
                L2 Cache                  : N/A
                Texture Memory            : N/A
                Texture Shared            : N/A
                CBU                       : N/A
                Total                     : N/A
        Aggregate
            Single Bit
                Device Memory             : N/A
                Register File             : N/A
                L1 Cache                  : N/A
                L2 Cache                  : N/A
                Texture Memory            : N/A
                Texture Shared            : N/A
                CBU                       : N/A
                Total                     : N/A
            Double Bit
                Device Memory             : N/A
                Register File             : N/A
                L1 Cache                  : N/A
                L2 Cache                  : N/A
                Texture Memory            : N/A
                Texture Shared            : N/A
                CBU                       : N/A
                Total                     : N/A
    Retired Pages
        Single Bit ECC                    : N/A
        Double Bit ECC                    : N/A
        Pending Page Blacklist            : N/A
    Remapped Rows                         : N/A
    Temperature
        GPU Current Temp                  : 34 C
        GPU Shutdown Temp                 : 96 C
        GPU Slowdown Temp                 : 93 C
        GPU Max Operating Temp            : N/A
        GPU Target Temperature            : 83 C
        Memory Current Temp               : N/A
        Memory Max Operating Temp         : N/A
    Power Readings
        Power Management                  : Supported
        Power Draw                        : 12.10 W
        Power Limit                       : 105.00 W
        Default Power Limit               : 105.00 W
        Enforced Power Limit              : 105.00 W
        Min Power Limit                   : 60.00 W
        Max Power Limit                   : 105.00 W
    Clocks
        Graphics                          : 25 MHz
        SM                                : 25 MHz
        Memory                            : 405 MHz
        Video                             : 544 MHz
    Applications Clocks
        Graphics                          : 1202 MHz
        Memory                            : 3802 MHz
    Default Applications Clocks
        Graphics                          : 1202 MHz
        Memory                            : 3802 MHz
    Max Clocks
        Graphics                          : 1708 MHz
        SM                                : 1708 MHz
        Memory                            : 3802 MHz
        Video                             : 1544 MHz
    Max Customer Boost Clocks
        Graphics                          : 1708 MHz
    Clock Policy
        Auto Boost                        : N/A
        Auto Boost Default                : N/A
    Voltage
        Graphics                          : N/A
    Processes
        GPU instance ID                   : N/A
        Compute instance ID               : N/A
        Process ID                        : 1220
            Type                          : C+G
            Name                          : Insufficient Permissions
            Used GPU Memory               : Not available in WDDM driver model
        GPU instance ID                   : N/A
        Compute instance ID               : N/A
        Process ID                        : 6444
            Type                          : C+G
            Name                          : C:\Windows\explorer.exe
            Used GPU Memory               : Not available in WDDM driver model
        GPU instance ID                   : N/A
        Compute instance ID               : N/A
        Process ID                        : 6688
            Type                          : C+G
            Name                          : C:\Windows\SystemApps\MicrosoftWindows.Client.CBS_cw5n1h2txyewy\InputApp\TextInputHost.exe
            Used GPU Memory               : Not available in WDDM driver model
        GPU instance ID                   : N/A
        Compute instance ID               : N/A
        Process ID                        : 7272
            Type                          : C+G
            Name                          : C:\Program Files\WindowsApps\Microsoft.ZuneVideo_10.20112.10111.0_x64__8wekyb3d8bbwe\Video.UI.exe
            Used GPU Memory               : Not available in WDDM driver model
        GPU instance ID                   : N/A
        Compute instance ID               : N/A
        Process ID                        : 7356
            Type                          : C+G
            Name                          : C:\Windows\SystemApps\Microsoft.Windows.StartMenuExperienceHost_cw5n1h2txyewy\StartMenuExperienceHost.exe
            Used GPU Memory               : Not available in WDDM driver model
        GPU instance ID                   : N/A
        Compute instance ID               : N/A
        Process ID                        : 7716
            Type                          : C+G
            Name                          : C:\Windows\SystemApps\Microsoft.Windows.Search_cw5n1h2txyewy\SearchApp.exe
            Used GPU Memory               : Not available in WDDM driver model
        GPU instance ID                   : N/A
        Compute instance ID               : N/A
        Process ID                        : 8532
            Type                          : C+G
            Name                          : C:\Windows\SystemApps\Microsoft.LockApp_cw5n1h2txyewy\LockApp.exe
            Used GPU Memory               : Not available in WDDM driver model
        GPU instance ID                   : N/A
        Compute instance ID               : N/A
        Process ID                        : 10212
            Type                          : C+G
            Name                          : C:\Program Files (x86)\Microsoft\Edge\Application\msedge.exe
            Used GPU Memory               : Not available in WDDM driver model
        GPU instance ID                   : N/A
        Compute instance ID               : N/A
        Process ID                        : 10472
            Type                          : C+G
            Name                          : C:\Windows\SystemApps\ShellExperienceHost_cw5n1h2txyewy\ShellExperienceHost.exe
            Used GPU Memory               : Not available in WDDM driver model
        GPU instance ID                   : N/A
        Compute instance ID               : N/A
        Process ID                        : 12176
            Type                          : C+G
            Name                          : C:\Program Files\WindowsApps\Microsoft.Windows.Photos_2020.20120.4004.0_x64__8wekyb3d8bbwe\Microsoft.Photos.exe
            Used GPU Memory               : Not available in WDDM driver model
        GPU instance ID                   : N/A
        Compute instance ID               : N/A
        Process ID                        : 12256
            Type                          : C+G
            Name                          : C:\Windows\ImmersiveControlPanel\SystemSettings.exe
            Used GPU Memory               : Not available in WDDM driver model


Yes, a Quadro P4000 is a Pascal-based GPU and as such is supported by recent versions of CUDA. I have no experience with transcoding, but maybe precise knowledge of the GPU type allows someone else to recommend a further course of action.