Inference throughput vs Graphics Clock speed

Automatic boost issue?
Max Customer Boost Clocks
Graphics : 1440 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A

I am running the NGC RetinaNet model with an EfficientNet backbone on an A30.
With the applications graphics clock reported as 930 MHz, inference throughput is 813 fps.
However, setting the applications graphics clock explicitly with nvidia-smi -ac changes the inference throughput as follows:
210 MHz = 128 fps
360 MHz = 216 fps
930 MHz = 545 fps
1260 MHz = 721 fps
1440 MHz = 813 fps
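
As a quick sanity check on the numbers above, the following Python sketch computes throughput per MHz and interpolates which clock a given throughput would correspond to. The function names are hypothetical and the interpolation simply assumes throughput scales piecewise-linearly with clock between the measured points; it is only meant to illustrate the arithmetic, not to model the GPU.

```python
# Measurements copied from the list above: (applications graphics clock in MHz, fps).
measurements = [(210, 128), (360, 216), (930, 545), (1260, 721), (1440, 813)]


def fps_per_mhz(clock_mhz, fps):
    """Throughput normalised by clock; roughly constant if scaling is linear."""
    return fps / clock_mhz


def estimate_clock(target_fps, points):
    """Linearly interpolate the clock (MHz) that would yield target_fps.

    Assumption: throughput is piecewise-linear in clock between measured points.
    """
    pts = sorted(points)
    for (c0, f0), (c1, f1) in zip(pts, pts[1:]):
        if f0 <= target_fps <= f1:
            return c0 + (target_fps - f0) * (c1 - c0) / (f1 - f0)
    raise ValueError("target_fps is outside the measured range")


for clock, fps in measurements:
    print(f"{clock:5d} MHz  {fps:4d} fps  {fps_per_mhz(clock, fps):.3f} fps/MHz")

print(f"813 fps corresponds to about {estimate_clock(813, measurements):.0f} MHz")
```

The ratio stays between roughly 0.56 and 0.61 fps/MHz across the whole range, so a result of 813 fps is what one would expect from a clock near 1440 MHz rather than 930 MHz.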

Then resetting the clocks with nvidia-smi -rac gives:

Applications Clocks
Graphics : 930 MHz
Memory : 1215 MHz

but the inference throughput then shows as 813 fps. Any ideas as to why the result is both ~813 fps and ~545 fps with the clock reported as 930 MHz? Is there a bug in the displayed clock speeds, or is there something I don’t understand about the A30 clocks (maybe it is boosting automatically)?

Sorry for the late response. Our team will investigate and provide suggestions soon. Thanks.

Can you use “nvidia-smi dmon” to monitor the device status while you run the above cases to compare the difference before and after “nvidia-smi -rac”?

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU)
• DeepStream Version
• JetPack Version (valid for Jetson only)
• TensorRT Version
• NVIDIA GPU Driver Version (valid for GPU only)
• Issue Type( questions, new requirements, bugs)
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

Hi Fiona,
Thank you very much for your comments. Running nvidia-smi dmon was the right test! It illustrates the issue exactly (output shown below). The result shows that the A30 is autoboosting. When the clock is limited to 930 MHz with nvidia-smi -ac, the result is 544 fps. After nvidia-smi -rac, the A30 boosts from its default applications clock of 930 MHz to 1440 MHz under load, and the inference throughput is 807 fps.

So at this point the question becomes: is the A30 supposed to autoboost?

A30 GPU, DS 6.0.1, RetinaNet
NVIDIA-SMI 495.29.05 Driver Version: 495.29.05 CUDA Version: 11.5
GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC

sudo nvidia-smi -ac 1215,930
Applications clocks set to "(MEM 1215, SM 930)" for GPU 00000000:C4:00.0
All done.
dell@nvidia100:~$ nvidia-smi dmon
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    26    31    31     0     0     0     0  1215   210
    0    26    31    31     0     0     0     0  1215   210
    0    26    31    31     0     0     0     0  1215   210
    0    26    31    31     0     0     0     0  1215   210
    0    26    31    31     0     0     0     0  1215   210
    0    26    31    31     0     0     0     0  1215   210
    0    26    31    31     0     0     0     0  1215   210
    0    29    31    31     9     0     0     0  1215   930
    0    64    32    33    40     8     0     0  1215   930
    0    64    33    35   100    19     0     0  1215   930
    0    60    33    35   100    19     0     0  1215   930
    0    60    33    35   100    19     0     0  1215   930
    0    29    31    32     0     0     0     0  1215   930
    0    26    31    32     0     0     0     0  1215   240
    0    26    31    31     0     0     0     0  1215   210
    0    26    31    31     0     0     0     0  1215   210
    0    26    31    31     0     0     0     0  1215   210
    0    26    31    31     0     0     0     0  1215   210
    0    26    31    31     0     0     0     0  1215   210
^C    0    26    31    31     0     0     0     0  1215   210
dell@nvidia100:~$ sudo  nvidia-smi -rac
All done.
dell@nvidia100:~$ nvidia-smi dmon
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    26    32    32     0     0     0     0  1215   210
    0    26    32    32     0     0     0     0  1215   210
    0    27    32    32     0     0     0     0  1215   210
    0    26    32    32     0     0     0     0  1215   210
    0    26    32    32     0     0     0     0  1215   210
    0    26    32    32     0     0     0     0  1215   210
    0    26    32    32     0     0     0     0  1215   210
    0    29    32    32     6     0     0     0  1215   930
    0    50    32    33     5     0     0     0  1215   930
    0   135    38    37   100    28     0     0  1215  1440
    0   147    39    38   100    28     0     0  1215  1440
    0   136    39    38   100    28     0     0  1215  1440
    0    46    34    34     0     0     0     0  1215  1440
    0    46    34    33     0     0     0     0  1215  1440
    0    27    33    33     0     0     0     0  1215   240
    0    27    33    33     0     0     0     0  1215   210
    0    27    33    33     0     0     0     0  1215   210
    0    27    33    33     0     0     0     0  1215   210
    0    27    32    33     0     0     0     0  1215   210
    0    27    32    33     0     0     0     0  1215   210
    0    27    32    33     0     0     0     0  1215   210
    0    27    32    33     0     0     0     0  1215   210
    0    27    32    33     0     0     0     0  1215   210
^C    0    27    32    33     0     0     0     0  1215   210
dell@nvidia100:~$
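
For anyone who wants to compare two dmon captures without reading them by eye, here is a minimal Python sketch that parses the text and reports the peak pclk seen while the SM is busy. The helper names and the 50% utilisation threshold are assumptions chosen for illustration; the sample rows are copied from the two captures above.

```python
def parse_dmon(text):
    """Parse `nvidia-smi dmon` text into a list of dicts keyed by column name."""
    columns, rows = None, []
    for line in text.splitlines():
        line = line.replace("^C", "").strip()
        if not line:
            continue
        if line.startswith("#"):
            if columns is None:
                columns = line.lstrip("#").split()  # first header line: names
            continue  # second header line holds units only
        values = line.split()
        if columns and len(values) == len(columns):
            # Non-numeric cells (e.g. "-") are stored as None.
            rows.append({c: int(v) if v.isdigit() else None
                         for c, v in zip(columns, values)})
    return rows


def peak_pclk_under_load(rows, min_sm=50):
    """Highest pclk (MHz) among samples whose SM utilisation is >= min_sm."""
    busy = [r["pclk"] for r in rows
            if (r["sm"] or 0) >= min_sm and r["pclk"] is not None]
    return max(busy) if busy else None


HEADER = """# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
"""
capped = HEADER + """    0    26    31    31     0     0     0     0  1215   210
    0    64    33    35   100    19     0     0  1215   930
    0    60    33    35   100    19     0     0  1215   930
"""
after_reset = HEADER + """    0    29    32    32     6     0     0     0  1215   930
    0   135    38    37   100    28     0     0  1215  1440
    0   147    39    38   100    28     0     0  1215  1440
"""

print("capped with -ac:", peak_pclk_under_load(parse_dmon(capped)), "MHz")
print("after -rac:     ", peak_pclk_under_load(parse_dmon(after_reset)), "MHz")
```

Filtering on SM utilisation matters because the idle samples sit at 210 MHz in both captures and would otherwise hide the difference.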

This output shows the result of running the same command twice, once with the clock capped by -ac and once after -rac:

root@553843d5e7cd:/opt/nvidia/deepstream/deepstream-6.0/samples/configs/deepstream-app-triton# /usr/src/tensorrt/bin/trtexec --batch=8 --loadEngine=/opt/nvidia/deepstream/deepstream-6.0/samples/trtis_model_repo/retinanet_resnet18_mod/1/saved.engine

However, nvidia-smi -q, run after -rac, still reports an applications graphics clock of 930 MHz:

dell@nvidia100:~$ nvidia-smi -q

==============NVSMI LOG==============

Timestamp                                 : Wed Apr 20 17:42:54 2022
Driver Version                            : 495.29.05
CUDA Version                              : 11.5

Attached GPUs                             : 1
GPU 00000000:C4:00.0
    Product Name                          : NVIDIA A30
    Product Brand                         : NVIDIA
    Product Architecture                  : Ampere
    Display Mode                          : Enabled
    Display Active                        : Disabled
    Persistence Mode                      : Enabled
    MIG Mode
        Current                           : Disabled
        Pending                           : Disabled
    Accounting Mode                       : Disabled
    Accounting Mode Buffer Size           : 4000
    Driver Model
        Current                           : N/A
        Pending                           : N/A
    Serial Number                         : 1321021039728
    GPU UUID                              : GPU-8cd9376a-269a-57a3-64f4-a63744f7ae51
    Minor Number                          : 0
    VBIOS Version                         : 92.00.58.00.01
    MultiGPU Board                        : No
    Board ID                              : 0xc400
    GPU Part Number                       : 900-21001-0040-000
    Module ID                             : 0
    Inforom Version
        Image Version                     : 1001.0205.00.01
        OEM Object                        : 2.0
        ECC Object                        : 6.16
        Power Management Object           : N/A
    GPU Operation Mode
        Current                           : N/A
        Pending                           : N/A
    GSP Firmware Version                  : N/A
    GPU Virtualization Mode
        Virtualization Mode               : None
        Host VGPU Mode                    : N/A
    IBMNPU
        Relaxed Ordering Mode             : N/A
    PCI
        Bus                               : 0xC4
        Device                            : 0x00
        Domain                            : 0x0000
        Device Id                         : 0x20B710DE
        Bus Id                            : 00000000:C4:00.0
        Sub System Id                     : 0x153210DE
        GPU Link Info
            PCIe Generation
                Max                       : 4
                Current                   : 4
            Link Width
                Max                       : 16x
                Current                   : 16x
        Bridge Chip
            Type                          : N/A
            Firmware                      : N/A
        Replays Since Reset               : 0
        Replay Number Rollovers           : 0
        Tx Throughput                     : 0 KB/s
        Rx Throughput                     : 0 KB/s
    Fan Speed                             : N/A
    Performance State                     : P0
    Clocks Throttle Reasons
        Idle                              : Active
        Applications Clocks Setting       : Not Active
        SW Power Cap                      : Not Active
        HW Slowdown                       : Not Active
            HW Thermal Slowdown           : Not Active
            HW Power Brake Slowdown       : Not Active
        Sync Boost                        : Not Active
        SW Thermal Slowdown               : Not Active
        Display Clock Setting             : Not Active
    FB Memory Usage
        Total                             : 24258 MiB
        Used                              : 0 MiB
        Free                              : 24258 MiB
    BAR1 Memory Usage
        Total                             : 32768 MiB
        Used                              : 1 MiB
        Free                              : 32767 MiB
    Compute Mode                          : Default
    Utilization
        Gpu                               : 0 %
        Memory                            : 0 %
        Encoder                           : 0 %
        Decoder                           : 0 %
    Encoder Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    FBC Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    Ecc Mode
        Current                           : Enabled
        Pending                           : Enabled
    ECC Errors
        Volatile
            SRAM Correctable              : 0
            SRAM Uncorrectable            : 0
            DRAM Correctable              : 0
            DRAM Uncorrectable            : 0
        Aggregate
            SRAM Correctable              : 0
            SRAM Uncorrectable            : 0
            DRAM Correctable              : 0
            DRAM Uncorrectable            : 0
    Retired Pages
        Single Bit ECC                    : N/A
        Double Bit ECC                    : N/A
        Pending Page Blacklist            : N/A
    Remapped Rows
        Correctable Error                 : 0
        Uncorrectable Error               : 0
        Pending                           : No
        Remapping Failure Occurred        : No
        Bank Remap Availability Histogram
            Max                           : 384 bank(s)
            High                          : 0 bank(s)
            Partial                       : 0 bank(s)
            Low                           : 0 bank(s)
            None                          : 0 bank(s)
    Temperature
        GPU Current Temp                  : 30 C
        GPU Shutdown Temp                 : 100 C
        GPU Slowdown Temp                 : 97 C
        GPU Max Operating Temp            : 90 C
        GPU Target Temperature            : N/A
        Memory Current Temp               : 30 C
        Memory Max Operating Temp         : 95 C
    Power Readings
        Power Management                  : Supported
        Power Draw                        : 26.62 W
        Power Limit                       : 165.00 W
        Default Power Limit               : 165.00 W
        Enforced Power Limit              : 165.00 W
        Min Power Limit                   : 100.00 W
        Max Power Limit                   : 165.00 W
    Clocks
        Graphics                          : 210 MHz
        SM                                : 210 MHz
        Memory                            : 1215 MHz
        Video                             : 585 MHz
    Applications Clocks
        Graphics                          : 930 MHz
        Memory                            : 1215 MHz
    Default Applications Clocks
        Graphics                          : 930 MHz
        Memory                            : 1215 MHz
    Max Clocks
        Graphics                          : 1440 MHz
        SM                                : 1440 MHz
        Memory                            : 1215 MHz
        Video                             : 1305 MHz
    Max Customer Boost Clocks
        Graphics                          : 1440 MHz
    Clock Policy
        Auto Boost                        : N/A
        Auto Boost Default                : N/A
    Voltage
        Graphics                          : 681.250 mV
    Processes                             : None

dell@nvidia100:~$
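
The full -q dump is long, so a narrower query may be easier to compare between runs. The sketch below is one possible way to do that from Python using nvidia-smi's --query-gpu option with the clocks.sm, clocks.applications.graphics and clocks.max.sm fields. The helper names are hypothetical, and the sample CSV lines are assumptions built from the values in this thread (210 MHz idle, 1440 MHz under load after -rac), not captured output.

```python
import subprocess

# Fields passed to `nvidia-smi --query-gpu`; values come back in MHz.
FIELDS = ["clocks.sm", "clocks.applications.graphics", "clocks.max.sm"]


def parse_clock_csv(line):
    """Parse one `--format=csv,noheader,nounits` line into a dict.

    Cells that are not plain integers (e.g. "[N/A]") become None.
    """
    values = [v.strip() for v in line.split(",")]
    return {f: int(v) if v.isdigit() else None for f, v in zip(FIELDS, values)}


def query_clocks():
    """Run nvidia-smi and return one dict per GPU (requires an NVIDIA driver)."""
    cmd = ["nvidia-smi", f"--query-gpu={','.join(FIELDS)}",
           "--format=csv,noheader,nounits"]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return [parse_clock_csv(l) for l in out.splitlines() if l.strip()]


def is_boosting(clocks):
    """True when the current SM clock exceeds the applications clock setting."""
    sm, app = clocks["clocks.sm"], clocks["clocks.applications.graphics"]
    return sm is not None and app is not None and sm > app


# Assumed sample lines based on the numbers reported in this thread.
idle = parse_clock_csv("210, 930, 1440")
loaded = parse_clock_csv("1440, 930, 1440")
print("idle boosting:  ", is_boosting(idle))
print("loaded boosting:", is_boosting(loaded))
```

On a machine with the driver installed, calling query_clocks() while trtexec is running and passing each result to is_boosting() would show whether the SM clock is above the applications clock at that moment.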