nvidia-smi slow during process listing (not persistence-related)

Hello,
I am using a DGX-1 server with 8 V100 cards. According to the sysadmin, everything is in its pre-configured state. The issue is that when I run nvidia-smi, it gets stuck for a few seconds during the process-listing phase. I know about the issue with driver initialization and the persistence daemon, but I don't think that is what's happening here: the persistence daemon is running and persistence mode is enabled on all GPUs. In addition, the following command

watch -n 0.1 nvidia-smi --query-gpu=index,temperature.gpu,utilization.gpu,memory.total,memory.used --format=csv

completes very quickly, since it skips the process listing.
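To narrow this down, one option is to time the full listing against the GPU-only query above and against a query that asks only for per-process data (`--query-compute-apps`). Below is a minimal Python sketch, assuming nvidia-smi is on PATH; the `time_command` helper, the labels and the choice of three runs are illustrative only and not part of any NVIDIA tool.

```python
import shutil
import subprocess
import sys
import time

# Field list copied from the watch command above.
GPU_FIELDS = "index,temperature.gpu,utilization.gpu,memory.total,memory.used"

COMMANDS = {
    # Default summary view, which includes the "Processes" table.
    "full nvidia-smi": ["nvidia-smi"],
    # GPU-only query: no per-process information is requested.
    "query-gpu only": ["nvidia-smi", f"--query-gpu={GPU_FIELDS}", "--format=csv"],
    # Process-only query, to check whether process enumeration alone is slow.
    "query-compute-apps": [
        "nvidia-smi",
        "--query-compute-apps=pid,process_name,used_memory",
        "--format=csv",
    ],
}


def time_command(cmd, runs=3):
    """Run cmd `runs` times; return the wall-clock duration of each run in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded; only the elapsed time is of interest.
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        timings.append(time.perf_counter() - start)
    return timings


def main():
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found on PATH", file=sys.stderr)
        return
    for label, cmd in COMMANDS.items():
        t = time_command(cmd)
        # Report fastest and slowest run so an intermittent stall stays visible.
        print(f"{label:>20}: min {min(t):6.2f} s, max {max(t):6.2f} s")


if __name__ == "__main__":
    main()
```

If `query-compute-apps` is about as slow as the full listing while `query-gpu only` stays fast, that would point at process enumeration rather than driver initialization as the source of the delay.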

What could be causing this? I have attached the output of nvidia-smi -q below:

==============NVSMI LOG==============

Timestamp                           : Mon Mar  9 12:03:48 2020
Driver Version                      : 440.33.01
CUDA Version                        : 10.2

Attached GPUs                       : 8
GPU 00000000:06:00.0
    Product Name                    : Tesla V100-SXM2-32GB-LS
    Product Brand                   : Tesla
    Display Mode                    : Enabled
    Display Active                  : Disabled
    Persistence Mode                : Enabled
    Accounting Mode                 : Disabled
    Accounting Mode Buffer Size     : 4000
    Driver Model
        Current                     : N/A
        Pending                     : N/A
    Serial Number                   : 0324518187498
    GPU UUID                        : GPU-fedbf316-485a-dbc0-bbf0-1f37942a7f5e
    Minor Number                    : 0
    VBIOS Version                   : 88.00.72.00.03
    MultiGPU Board                  : No
    Board ID                        : 0x600
    GPU Part Number                 : 692-2G503-0280-200
    Inforom Version
        Image Version               : G503.0280.00.02
        OEM Object                  : 1.1
        ECC Object                  : 5.0
        Power Management Object     : N/A
    GPU Operation Mode
        Current                     : N/A
        Pending                     : N/A
    GPU Virtualization Mode
        Virtualization Mode         : None
        Host VGPU Mode              : N/A
    IBMNPU
        Relaxed Ordering Mode       : N/A
    PCI
        Bus                         : 0x06
        Device                      : 0x00
        Domain                      : 0x0000
        Device Id                   : 0x1DB510DE
        Bus Id                      : 00000000:06:00.0
        Sub System Id               : 0x130810DE
        GPU Link Info
            PCIe Generation
                Max                 : 3
                Current             : 3
            Link Width
                Max                 : 16x
                Current             : 16x
        Bridge Chip
            Type                    : N/A
            Firmware                : N/A
        Replays Since Reset         : 0
        Replay Number Rollovers     : 0
        Tx Throughput               : 0 KB/s
        Rx Throughput               : 0 KB/s
    Fan Speed                       : N/A
    Performance State               : P0
    Clocks Throttle Reasons
        Idle                        : Active
        Applications Clocks Setting : Not Active
        SW Power Cap                : Not Active
        HW Slowdown                 : Not Active
            HW Thermal Slowdown     : Not Active
            HW Power Brake Slowdown : Not Active
        Sync Boost                  : Not Active
        SW Thermal Slowdown         : Not Active
        Display Clock Setting       : Not Active
    FB Memory Usage
        Total                       : 32510 MiB
        Used                        : 0 MiB
        Free                        : 32510 MiB
    BAR1 Memory Usage
        Total                       : 32768 MiB
        Used                        : 2 MiB
        Free                        : 32766 MiB
    Compute Mode                    : Default
    Utilization
        Gpu                         : 0 %
        Memory                      : 0 %
        Encoder                     : 0 %
        Decoder                     : 0 %
    Encoder Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    FBC Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    Ecc Mode
        Current                     : Enabled
        Pending                     : Enabled
    ECC Errors
        Volatile
            Single Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : 0
            Double Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : 0
                Total               : 0
        Aggregate
            Single Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : 0
            Double Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : 0
                Total               : 0
    Retired Pages
        Single Bit ECC              : 0
        Double Bit ECC              : 0
        Pending Page Blacklist      : No
    Temperature
        GPU Current Temp            : 30 C
        GPU Shutdown Temp           : 90 C
        GPU Slowdown Temp           : 87 C
        GPU Max Operating Temp      : 83 C
        Memory Current Temp         : 29 C
        Memory Max Operating Temp   : 85 C
    Power Readings
        Power Management            : Supported
        Power Draw                  : 43.75 W
        Power Limit                 : 163.00 W
        Default Power Limit         : 163.00 W
        Enforced Power Limit        : 163.00 W
        Min Power Limit             : 100.00 W
        Max Power Limit             : 250.00 W
    Clocks
        Graphics                    : 135 MHz
        SM                          : 135 MHz
        Memory                      : 810 MHz
        Video                       : 555 MHz
    Applications Clocks
        Graphics                    : 817 MHz
        Memory                      : 810 MHz
    Default Applications Clocks
        Graphics                    : 817 MHz
        Memory                      : 810 MHz
    Max Clocks
        Graphics                    : 1440 MHz
        SM                          : 1440 MHz
        Memory                      : 810 MHz
        Video                       : 1290 MHz
    Max Customer Boost Clocks
        Graphics                    : 1440 MHz
    Clock Policy
        Auto Boost                  : N/A
        Auto Boost Default          : N/A
    Processes                       : None

GPU 00000000:07:00.0
    Product Name                    : Tesla V100-SXM2-32GB-LS
    Product Brand                   : Tesla
    Display Mode                    : Enabled
    Display Active                  : Disabled
    Persistence Mode                : Enabled
    Accounting Mode                 : Disabled
    Accounting Mode Buffer Size     : 4000
    Driver Model
        Current                     : N/A
        Pending                     : N/A
    Serial Number                   : 0324518189940
    GPU UUID                        : GPU-15e33744-f3d1-43df-0baa-d0e66f17a34d
    Minor Number                    : 1
    VBIOS Version                   : 88.00.72.00.03
    MultiGPU Board                  : No
    Board ID                        : 0x700
    GPU Part Number                 : 692-2G503-0280-200
    Inforom Version
        Image Version               : G503.0280.00.02
        OEM Object                  : 1.1
        ECC Object                  : 5.0
        Power Management Object     : N/A
    GPU Operation Mode
        Current                     : N/A
        Pending                     : N/A
    GPU Virtualization Mode
        Virtualization Mode         : None
        Host VGPU Mode              : N/A
    IBMNPU
        Relaxed Ordering Mode       : N/A
    PCI
        Bus                         : 0x07
        Device                      : 0x00
        Domain                      : 0x0000
        Device Id                   : 0x1DB510DE
        Bus Id                      : 00000000:07:00.0
        Sub System Id               : 0x130810DE
        GPU Link Info
            PCIe Generation
                Max                 : 3
                Current             : 3
            Link Width
                Max                 : 16x
                Current             : 16x
        Bridge Chip
            Type                    : N/A
            Firmware                : N/A
        Replays Since Reset         : 0
        Replay Number Rollovers     : 0
        Tx Throughput               : 5000 KB/s
        Rx Throughput               : 17000 KB/s
    Fan Speed                       : N/A
    Performance State               : P0
    Clocks Throttle Reasons
        Idle                        : Not Active
        Applications Clocks Setting : Not Active
        SW Power Cap                : Active
        HW Slowdown                 : Not Active
            HW Thermal Slowdown     : Not Active
            HW Power Brake Slowdown : Not Active
        Sync Boost                  : Not Active
        SW Thermal Slowdown         : Not Active
        Display Clock Setting       : Not Active
    FB Memory Usage
        Total                       : 32510 MiB
        Used                        : 21929 MiB
        Free                        : 10581 MiB
    BAR1 Memory Usage
        Total                       : 32768 MiB
        Used                        : 8 MiB
        Free                        : 32760 MiB
    Compute Mode                    : Default
    Utilization
        Gpu                         : 100 %
        Memory                      : 54 %
        Encoder                     : 0 %
        Decoder                     : 0 %
    Encoder Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    FBC Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    Ecc Mode
        Current                     : Enabled
        Pending                     : Enabled
    ECC Errors
        Volatile
            Single Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : 0
            Double Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : 0
                Total               : 0
        Aggregate
            Single Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : 0
            Double Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : 0
                Total               : 0
    Retired Pages
        Single Bit ECC              : 0
        Double Bit ECC              : 0
        Pending Page Blacklist      : No
    Temperature
        GPU Current Temp            : 43 C
        GPU Shutdown Temp           : 90 C
        GPU Slowdown Temp           : 87 C
        GPU Max Operating Temp      : 83 C
        Memory Current Temp         : 42 C
        Memory Max Operating Temp   : 85 C
    Power Readings
        Power Management            : Supported
        Power Draw                  : 64.39 W
        Power Limit                 : 163.00 W
        Default Power Limit         : 163.00 W
        Enforced Power Limit        : 163.00 W
        Min Power Limit             : 100.00 W
        Max Power Limit             : 250.00 W
    Clocks
        Graphics                    : 1222 MHz
        SM                          : 1222 MHz
        Memory                      : 810 MHz
        Video                       : 1095 MHz
    Applications Clocks
        Graphics                    : 817 MHz
        Memory                      : 810 MHz
    Default Applications Clocks
        Graphics                    : 817 MHz
        Memory                      : 810 MHz
    Max Clocks
        Graphics                    : 1440 MHz
        SM                          : 1440 MHz
        Memory                      : 810 MHz
        Video                       : 1290 MHz
    Max Customer Boost Clocks
        Graphics                    : 1440 MHz
    Clock Policy
        Auto Boost                  : N/A
        Auto Boost Default          : N/A
    Processes
        Process ID                  : 68784
            Type                    : C
            Name                    : python
            Used GPU Memory         : 21909 MiB

GPU 00000000:0A:00.0
    Product Name                    : Tesla V100-SXM2-32GB-LS
    Product Brand                   : Tesla
    Display Mode                    : Enabled
    Display Active                  : Disabled
    Persistence Mode                : Enabled
    Accounting Mode                 : Disabled
    Accounting Mode Buffer Size     : 4000
    Driver Model
        Current                     : N/A
        Pending                     : N/A
    Serial Number                   : 0324518188082
    GPU UUID                        : GPU-9a00dcb4-1698-ed08-b956-0ef0ff5266f0
    Minor Number                    : 2
    VBIOS Version                   : 88.00.72.00.03
    MultiGPU Board                  : No
    Board ID                        : 0xa00
    GPU Part Number                 : 692-2G503-0280-200
    Inforom Version
        Image Version               : G503.0280.00.02
        OEM Object                  : 1.1
        ECC Object                  : 5.0
        Power Management Object     : N/A
    GPU Operation Mode
        Current                     : N/A
        Pending                     : N/A
    GPU Virtualization Mode
        Virtualization Mode         : None
        Host VGPU Mode              : N/A
    IBMNPU
        Relaxed Ordering Mode       : N/A
    PCI
        Bus                         : 0x0A
        Device                      : 0x00
        Domain                      : 0x0000
        Device Id                   : 0x1DB510DE
        Bus Id                      : 00000000:0A:00.0
        Sub System Id               : 0x130810DE
        GPU Link Info
            PCIe Generation
                Max                 : 3
                Current             : 3
            Link Width
                Max                 : 16x
                Current             : 16x
        Bridge Chip
            Type                    : N/A
            Firmware                : N/A
        Replays Since Reset         : 0
        Replay Number Rollovers     : 0
        Tx Throughput               : 702000 KB/s
        Rx Throughput               : 4321000 KB/s
    Fan Speed                       : N/A
    Performance State               : P0
    Clocks Throttle Reasons
        Idle                        : Not Active
        Applications Clocks Setting : Not Active
        SW Power Cap                : Not Active
        HW Slowdown                 : Not Active
            HW Thermal Slowdown     : Not Active
            HW Power Brake Slowdown : Not Active
        Sync Boost                  : Not Active
        SW Thermal Slowdown         : Not Active
        Display Clock Setting       : Not Active
    FB Memory Usage
        Total                       : 32510 MiB
        Used                        : 22547 MiB
        Free                        : 9963 MiB
    BAR1 Memory Usage
        Total                       : 32768 MiB
        Used                        : 8 MiB
        Free                        : 32760 MiB
    Compute Mode                    : Default
    Utilization
        Gpu                         : 100 %
        Memory                      : 10 %
        Encoder                     : 0 %
        Decoder                     : 0 %
    Encoder Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    FBC Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    Ecc Mode
        Current                     : Enabled
        Pending                     : Enabled
    ECC Errors
        Volatile
            Single Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : 0
            Double Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : 0
                Total               : 0
        Aggregate
            Single Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : 0
            Double Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : 0
                Total               : 0
    Retired Pages
        Single Bit ECC              : 0
        Double Bit ECC              : 0
        Pending Page Blacklist      : No
    Temperature
        GPU Current Temp            : 47 C
        GPU Shutdown Temp           : 90 C
        GPU Slowdown Temp           : 87 C
        GPU Max Operating Temp      : 83 C
        Memory Current Temp         : 47 C
        Memory Max Operating Temp   : 85 C
    Power Readings
        Power Management            : Supported
        Power Draw                  : 152.48 W
        Power Limit                 : 163.00 W
        Default Power Limit         : 163.00 W
        Enforced Power Limit        : 163.00 W
        Min Power Limit             : 100.00 W
        Max Power Limit             : 250.00 W
    Clocks
        Graphics                    : 1080 MHz
        SM                          : 1080 MHz
        Memory                      : 810 MHz
        Video                       : 967 MHz
    Applications Clocks
        Graphics                    : 817 MHz
        Memory                      : 810 MHz
    Default Applications Clocks
        Graphics                    : 817 MHz
        Memory                      : 810 MHz
    Max Clocks
        Graphics                    : 1440 MHz
        SM                          : 1440 MHz
        Memory                      : 810 MHz
        Video                       : 1290 MHz
    Max Customer Boost Clocks
        Graphics                    : 1440 MHz
    Clock Policy
        Auto Boost                  : N/A
        Auto Boost Default          : N/A
    Processes
        Process ID                  : 74548
            Type                    : C
            Name                    : python
            Used GPU Memory         : 22521 MiB

GPU 00000000:0B:00.0
    Product Name                    : Tesla V100-SXM2-32GB-LS
    Product Brand                   : Tesla
    Display Mode                    : Enabled
    Display Active                  : Disabled
    Persistence Mode                : Enabled
    Accounting Mode                 : Disabled
    Accounting Mode Buffer Size     : 4000
    Driver Model
        Current                     : N/A
        Pending                     : N/A
    Serial Number                   : 0324518188568
    GPU UUID                        : GPU-aef7b265-fced-d500-14f2-a8bdb4b81651
    Minor Number                    : 3
    VBIOS Version                   : 88.00.72.00.03
    MultiGPU Board                  : No
    Board ID                        : 0xb00
    GPU Part Number                 : 692-2G503-0280-200
    Inforom Version
        Image Version               : G503.0280.00.02
        OEM Object                  : 1.1
        ECC Object                  : 5.0
        Power Management Object     : N/A
    GPU Operation Mode
        Current                     : N/A
        Pending                     : N/A
    GPU Virtualization Mode
        Virtualization Mode         : None
        Host VGPU Mode              : N/A
    IBMNPU
        Relaxed Ordering Mode       : N/A
    PCI
        Bus                         : 0x0B
        Device                      : 0x00
        Domain                      : 0x0000
        Device Id                   : 0x1DB510DE
        Bus Id                      : 00000000:0B:00.0
        Sub System Id               : 0x130810DE
        GPU Link Info
            PCIe Generation
                Max                 : 3
                Current             : 3
            Link Width
                Max                 : 16x
                Current             : 16x
        Bridge Chip
            Type                    : N/A
            Firmware                : N/A
        Replays Since Reset         : 0
        Replay Number Rollovers     : 0
        Tx Throughput               : 0 KB/s
        Rx Throughput               : 0 KB/s
    Fan Speed                       : N/A
    Performance State               : P0
    Clocks Throttle Reasons
        Idle                        : Active
        Applications Clocks Setting : Not Active
        SW Power Cap                : Not Active
        HW Slowdown                 : Not Active
            HW Thermal Slowdown     : Not Active
            HW Power Brake Slowdown : Not Active
        Sync Boost                  : Not Active
        SW Thermal Slowdown         : Not Active
        Display Clock Setting       : Not Active
    FB Memory Usage
        Total                       : 32510 MiB
        Used                        : 0 MiB
        Free                        : 32510 MiB
    BAR1 Memory Usage
        Total                       : 32768 MiB
        Used                        : 2 MiB
        Free                        : 32766 MiB
    Compute Mode                    : Default
    Utilization
        Gpu                         : 0 %
        Memory                      : 0 %
        Encoder                     : 0 %
        Decoder                     : 0 %
    Encoder Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    FBC Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    Ecc Mode
        Current                     : Enabled
        Pending                     : Enabled
    ECC Errors
        Volatile
            Single Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : 0
            Double Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : 0
                Total               : 0
        Aggregate
            Single Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : 0
            Double Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : 0
                Total               : 0
    Retired Pages
        Single Bit ECC              : 0
        Double Bit ECC              : 0
        Pending Page Blacklist      : No
    Temperature
        GPU Current Temp            : 31 C
        GPU Shutdown Temp           : 90 C
        GPU Slowdown Temp           : 87 C
        GPU Max Operating Temp      : 83 C
        Memory Current Temp         : 29 C
        Memory Max Operating Temp   : 85 C
    Power Readings
        Power Management            : Supported
        Power Draw                  : 40.79 W
        Power Limit                 : 163.00 W
        Default Power Limit         : 163.00 W
        Enforced Power Limit        : 163.00 W
        Min Power Limit             : 100.00 W
        Max Power Limit             : 250.00 W
    Clocks
        Graphics                    : 135 MHz
        SM                          : 135 MHz
        Memory                      : 810 MHz
        Video                       : 555 MHz
    Applications Clocks
        Graphics                    : 817 MHz
        Memory                      : 810 MHz
    Default Applications Clocks
        Graphics                    : 817 MHz
        Memory                      : 810 MHz
    Max Clocks
        Graphics                    : 1440 MHz
        SM                          : 1440 MHz
        Memory                      : 810 MHz
        Video                       : 1290 MHz
    Max Customer Boost Clocks
        Graphics                    : 1440 MHz
    Clock Policy
        Auto Boost                  : N/A
        Auto Boost Default          : N/A
    Processes                       : None

GPU 00000000:85:00.0
    Product Name                    : Tesla V100-SXM2-32GB-LS
    Product Brand                   : Tesla
    Display Mode                    : Enabled
    Display Active                  : Disabled
    Persistence Mode                : Enabled
    Accounting Mode                 : Disabled
    Accounting Mode Buffer Size     : 4000
    Driver Model
        Current                     : N/A
        Pending                     : N/A
    Serial Number                   : 0324518190041
    GPU UUID                        : GPU-61373a02-b673-63c0-7ae1-5224d26985ab
    Minor Number                    : 4
    VBIOS Version                   : 88.00.72.00.03
    MultiGPU Board                  : No
    Board ID                        : 0x8500
    GPU Part Number                 : 692-2G503-0280-200
    Inforom Version
        Image Version               : G503.0280.00.02
        OEM Object                  : 1.1
        ECC Object                  : 5.0
        Power Management Object     : N/A
    GPU Operation Mode
        Current                     : N/A
        Pending                     : N/A
    GPU Virtualization Mode
        Virtualization Mode         : None
        Host VGPU Mode              : N/A
    IBMNPU
        Relaxed Ordering Mode       : N/A
    PCI
        Bus                         : 0x85
        Device                      : 0x00
        Domain                      : 0x0000
        Device Id                   : 0x1DB510DE
        Bus Id                      : 00000000:85:00.0
        Sub System Id               : 0x130810DE
        GPU Link Info
            PCIe Generation
                Max                 : 3
                Current             : 3
            Link Width
                Max                 : 16x
                Current             : 16x
        Bridge Chip
            Type                    : N/A
            Firmware                : N/A
        Replays Since Reset         : 0
        Replay Number Rollovers     : 0
        Tx Throughput               : 0 KB/s
        Rx Throughput               : 0 KB/s
    Fan Speed                       : N/A
    Performance State               : P0
    Clocks Throttle Reasons
        Idle                        : Active
        Applications Clocks Setting : Not Active
        SW Power Cap                : Not Active
        HW Slowdown                 : Not Active
            HW Thermal Slowdown     : Not Active
            HW Power Brake Slowdown : Not Active
        Sync Boost                  : Not Active
        SW Thermal Slowdown         : Not Active
        Display Clock Setting       : Not Active
    FB Memory Usage
        Total                       : 32510 MiB
        Used                        : 0 MiB
        Free                        : 32510 MiB
    BAR1 Memory Usage
        Total                       : 32768 MiB
        Used                        : 2 MiB
        Free                        : 32766 MiB
    Compute Mode                    : Default
    Utilization
        Gpu                         : 0 %
        Memory                      : 0 %
        Encoder                     : 0 %
        Decoder                     : 0 %
    Encoder Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    FBC Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    Ecc Mode
        Current                     : Enabled
        Pending                     : Enabled
    ECC Errors
        Volatile
            Single Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : 0
            Double Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : 0
                Total               : 0
        Aggregate
            Single Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : 0
            Double Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : 0
                Total               : 0
    Retired Pages
        Single Bit ECC              : 0
        Double Bit ECC              : 0
        Pending Page Blacklist      : No
    Temperature
        GPU Current Temp            : 34 C
        GPU Shutdown Temp           : 90 C
        GPU Slowdown Temp           : 87 C
        GPU Max Operating Temp      : 83 C
        Memory Current Temp         : 32 C
        Memory Max Operating Temp   : 85 C
    Power Readings
        Power Management            : Supported
        Power Draw                  : 42.23 W
        Power Limit                 : 163.00 W
        Default Power Limit         : 163.00 W
        Enforced Power Limit        : 163.00 W
        Min Power Limit             : 100.00 W
        Max Power Limit             : 250.00 W
    Clocks
        Graphics                    : 135 MHz
        SM                          : 135 MHz
        Memory                      : 810 MHz
        Video                       : 555 MHz
    Applications Clocks
        Graphics                    : 817 MHz
        Memory                      : 810 MHz
    Default Applications Clocks
        Graphics                    : 817 MHz
        Memory                      : 810 MHz
    Max Clocks
        Graphics                    : 1440 MHz
        SM                          : 1440 MHz
        Memory                      : 810 MHz
        Video                       : 1290 MHz
    Max Customer Boost Clocks
        Graphics                    : 1440 MHz
    Clock Policy
        Auto Boost                  : N/A
        Auto Boost Default          : N/A
    Processes                       : None

GPU 00000000:86:00.0
    Product Name                    : Tesla V100-SXM2-32GB-LS
    Product Brand                   : Tesla
    Display Mode                    : Enabled
    Display Active                  : Disabled
    Persistence Mode                : Enabled
    Accounting Mode                 : Disabled
    Accounting Mode Buffer Size     : 4000
    Driver Model
        Current                     : N/A
        Pending                     : N/A
    Serial Number                   : 0324518188499
    GPU UUID                        : GPU-b0d0da4f-2735-5b1e-2913-6b8358b3eaf9
    Minor Number                    : 5
    VBIOS Version                   : 88.00.72.00.03
    MultiGPU Board                  : No
    Board ID                        : 0x8600
    GPU Part Number                 : 692-2G503-0280-200
    Inforom Version
        Image Version               : G503.0280.00.02
        OEM Object                  : 1.1
        ECC Object                  : 5.0
        Power Management Object     : N/A
    GPU Operation Mode
        Current                     : N/A
        Pending                     : N/A
    GPU Virtualization Mode
        Virtualization Mode         : None
        Host VGPU Mode              : N/A
    IBMNPU
        Relaxed Ordering Mode       : N/A
    PCI
        Bus                         : 0x86
        Device                      : 0x00
        Domain                      : 0x0000
        Device Id                   : 0x1DB510DE
        Bus Id                      : 00000000:86:00.0
        Sub System Id               : 0x130810DE
        GPU Link Info
            PCIe Generation
                Max                 : 3
                Current             : 3
            Link Width
                Max                 : 16x
                Current             : 16x
        Bridge Chip
            Type                    : N/A
            Firmware                : N/A
        Replays Since Reset         : 0
        Replay Number Rollovers     : 0
        Tx Throughput               : 4000 KB/s
        Rx Throughput               : 11000 KB/s
    Fan Speed                       : N/A
    Performance State               : P0
    Clocks Throttle Reasons
        Idle                        : Not Active
        Applications Clocks Setting : Not Active
        SW Power Cap                : Active
        HW Slowdown                 : Not Active
            HW Thermal Slowdown     : Not Active
            HW Power Brake Slowdown : Not Active
        Sync Boost                  : Not Active
        SW Thermal Slowdown         : Not Active
        Display Clock Setting       : Not Active
    FB Memory Usage
        Total                       : 32510 MiB
        Used                        : 15649 MiB
        Free                        : 16861 MiB
    BAR1 Memory Usage
        Total                       : 32768 MiB
        Used                        : 8 MiB
        Free                        : 32760 MiB
    Compute Mode                    : Default
    Utilization
        Gpu                         : 100 %
        Memory                      : 20 %
        Encoder                     : 0 %
        Decoder                     : 0 %
    Encoder Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    FBC Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    Ecc Mode
        Current                     : Enabled
        Pending                     : Enabled
    ECC Errors
        Volatile
            Single Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : 0
            Double Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : 0
                Total               : 0
        Aggregate
            Single Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : 0
            Double Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : 0
                Total               : 0
    Retired Pages
        Single Bit ECC              : 0
        Double Bit ECC              : 0
        Pending Page Blacklist      : No
    Temperature
        GPU Current Temp            : 46 C
        GPU Shutdown Temp           : 90 C
        GPU Slowdown Temp           : 87 C
        GPU Max Operating Temp      : 83 C
        Memory Current Temp         : 48 C
        Memory Max Operating Temp   : 85 C
    Power Readings
        Power Management            : Supported
        Power Draw                  : 70.23 W
        Power Limit                 : 163.00 W
        Default Power Limit         : 163.00 W
        Enforced Power Limit        : 163.00 W
        Min Power Limit             : 100.00 W
        Max Power Limit             : 250.00 W
    Clocks
        Graphics                    : 1102 MHz
        SM                          : 1102 MHz
        Memory                      : 810 MHz
        Video                       : 990 MHz
    Applications Clocks
        Graphics                    : 817 MHz
        Memory                      : 810 MHz
    Default Applications Clocks
        Graphics                    : 817 MHz
        Memory                      : 810 MHz
    Max Clocks
        Graphics                    : 1440 MHz
        SM                          : 1440 MHz
        Memory                      : 810 MHz
        Video                       : 1290 MHz
    Max Customer Boost Clocks
        Graphics                    : 1440 MHz
    Clock Policy
        Auto Boost                  : N/A
        Auto Boost Default          : N/A
    Processes
        Process ID                  : 63882
            Type                    : C
            Name                    : python
            Used GPU Memory         : 15629 MiB

GPU 00000000:89:00.0
    Product Name                    : Tesla V100-SXM2-32GB-LS
    Product Brand                   : Tesla
    Display Mode                    : Enabled
    Display Active                  : Disabled
    Persistence Mode                : Enabled
    Accounting Mode                 : Disabled
    Accounting Mode Buffer Size     : 4000
    Driver Model
        Current                     : N/A
        Pending                     : N/A
    Serial Number                   : 0324218156880
    GPU UUID                        : GPU-e33e76d1-b23d-f012-f587-691d1f76271d
    Minor Number                    : 6
    VBIOS Version                   : 88.00.72.00.03
    MultiGPU Board                  : No
    Board ID                        : 0x8900
    GPU Part Number                 : 692-2G503-0280-200
    Inforom Version
        Image Version               : G503.0280.00.02
        OEM Object                  : 1.1
        ECC Object                  : 5.0
        Power Management Object     : N/A
    GPU Operation Mode
        Current                     : N/A
        Pending                     : N/A
    GPU Virtualization Mode
        Virtualization Mode         : None
        Host VGPU Mode              : N/A
    IBMNPU
        Relaxed Ordering Mode       : N/A
    PCI
        Bus                         : 0x89
        Device                      : 0x00
        Domain                      : 0x0000
        Device Id                   : 0x1DB510DE
        Bus Id                      : 00000000:89:00.0
        Sub System Id               : 0x130810DE
        GPU Link Info
            PCIe Generation
                Max                 : 3
                Current             : 3
            Link Width
                Max                 : 16x
                Current             : 16x
        Bridge Chip
            Type                    : N/A
            Firmware                : N/A
        Replays Since Reset         : 0
        Replay Number Rollovers     : 0
        Tx Throughput               : 0 KB/s
        Rx Throughput               : 0 KB/s
    Fan Speed                       : N/A
    Performance State               : P0
    Clocks Throttle Reasons
        Idle                        : Not Active
        Applications Clocks Setting : Not Active
        SW Power Cap                : Not Active
        HW Slowdown                 : Not Active
            HW Thermal Slowdown     : Not Active
            HW Power Brake Slowdown : Not Active
        Sync Boost                  : Not Active
        SW Thermal Slowdown         : Not Active
        Display Clock Setting       : Not Active
    FB Memory Usage
        Total                       : 32510 MiB
        Used                        : 24837 MiB
        Free                        : 7673 MiB
    BAR1 Memory Usage
        Total                       : 32768 MiB
        Used                        : 8 MiB
        Free                        : 32760 MiB
    Compute Mode                    : Default
    Utilization
        Gpu                         : 0 %
        Memory                      : 0 %
        Encoder                     : 0 %
        Decoder                     : 0 %
    Encoder Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    FBC Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    Ecc Mode
        Current                     : Enabled
        Pending                     : Enabled
    ECC Errors
        Volatile
            Single Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : 0
            Double Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : 0
                Total               : 0
        Aggregate
            Single Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : 0
            Double Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : 0
                Total               : 0
    Retired Pages
        Single Bit ECC              : 0
        Double Bit ECC              : 0
        Pending Page Blacklist      : No
    Temperature
        GPU Current Temp            : 42 C
        GPU Shutdown Temp           : 90 C
        GPU Slowdown Temp           : 87 C
        GPU Max Operating Temp      : 83 C
        Memory Current Temp         : 42 C
        Memory Max Operating Temp   : 85 C
    Power Readings
        Power Management            : Supported
        Power Draw                  : 107.20 W
        Power Limit                 : 163.00 W
        Default Power Limit         : 163.00 W
        Enforced Power Limit        : 163.00 W
        Min Power Limit             : 100.00 W
        Max Power Limit             : 250.00 W
    Clocks
        Graphics                    : 1177 MHz
        SM                          : 1177 MHz
        Memory                      : 810 MHz
        Video                       : 1057 MHz
    Applications Clocks
        Graphics                    : 817 MHz
        Memory                      : 810 MHz
    Default Applications Clocks
        Graphics                    : 817 MHz
        Memory                      : 810 MHz
    Max Clocks
        Graphics                    : 1440 MHz
        SM                          : 1440 MHz
        Memory                      : 810 MHz
        Video                       : 1290 MHz
    Max Customer Boost Clocks
        Graphics                    : 1440 MHz
    Clock Policy
        Auto Boost                  : N/A
        Auto Boost Default          : N/A
    Processes
        Process ID                  : 55362
            Type                    : C
            Name                    : python
            Used GPU Memory         : 24825 MiB

GPU 00000000:8A:00.0
    Product Name                    : Tesla V100-SXM2-32GB-LS
    Product Brand                   : Tesla
    Display Mode                    : Enabled
    Display Active                  : Disabled
    Persistence Mode                : Enabled
    Accounting Mode                 : Disabled
    Accounting Mode Buffer Size     : 4000
    Driver Model
        Current                     : N/A
        Pending                     : N/A
    Serial Number                   : 0324518189937
    GPU UUID                        : GPU-44e3f42a-a2e9-4b23-d742-862c08620867
    Minor Number                    : 7
    VBIOS Version                   : 88.00.72.00.03
    MultiGPU Board                  : No
    Board ID                        : 0x8a00
    GPU Part Number                 : 692-2G503-0280-200
    Inforom Version
        Image Version               : G503.0280.00.02
        OEM Object                  : 1.1
        ECC Object                  : 5.0
        Power Management Object     : N/A
    GPU Operation Mode
        Current                     : N/A
        Pending                     : N/A
    GPU Virtualization Mode
        Virtualization Mode         : None
        Host VGPU Mode              : N/A
    IBMNPU
        Relaxed Ordering Mode       : N/A
    PCI
        Bus                         : 0x8A
        Device                      : 0x00
        Domain                      : 0x0000
        Device Id                   : 0x1DB510DE
        Bus Id                      : 00000000:8A:00.0
        Sub System Id               : 0x130810DE
        GPU Link Info
            PCIe Generation
                Max                 : 3
                Current             : 3
            Link Width
                Max                 : 16x
                Current             : 16x
        Bridge Chip
            Type                    : N/A
            Firmware                : N/A
        Replays Since Reset         : 0
        Replay Number Rollovers     : 0
        Tx Throughput               : 444000 KB/s
        Rx Throughput               : 20000 KB/s
    Fan Speed                       : N/A
    Performance State               : P0
    Clocks Throttle Reasons
        Idle                        : Not Active
        Applications Clocks Setting : Not Active
        SW Power Cap                : Active
        HW Slowdown                 : Not Active
            HW Thermal Slowdown     : Not Active
            HW Power Brake Slowdown : Not Active
        Sync Boost                  : Not Active
        SW Thermal Slowdown         : Not Active
        Display Clock Setting       : Not Active
    FB Memory Usage
        Total                       : 32510 MiB
        Used                        : 25649 MiB
        Free                        : 6861 MiB
    BAR1 Memory Usage
        Total                       : 32768 MiB
        Used                        : 8 MiB
        Free                        : 32760 MiB
    Compute Mode                    : Default
    Utilization
        Gpu                         : 13 %
        Memory                      : 3 %
        Encoder                     : 0 %
        Decoder                     : 0 %
    Encoder Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    FBC Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    Ecc Mode
        Current                     : Enabled
        Pending                     : Enabled
    ECC Errors
        Volatile
            Single Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : 0
            Double Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : 0
                Total               : 0
        Aggregate
            Single Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : 0
            Double Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : 0
                Total               : 0
    Retired Pages
        Single Bit ECC              : 0
        Double Bit ECC              : 0
        Pending Page Blacklist      : No
    Temperature
        GPU Current Temp            : 41 C
        GPU Shutdown Temp           : 90 C
        GPU Slowdown Temp           : 87 C
        GPU Max Operating Temp      : 83 C
        Memory Current Temp         : 41 C
        Memory Max Operating Temp   : 85 C
    Power Readings
        Power Management            : Supported
        Power Draw                  : 56.99 W
        Power Limit                 : 163.00 W
        Default Power Limit         : 163.00 W
        Enforced Power Limit        : 163.00 W
        Min Power Limit             : 100.00 W
        Max Power Limit             : 250.00 W
    Clocks
        Graphics                    : 1200 MHz
        SM                          : 1200 MHz
        Memory                      : 810 MHz
        Video                       : 1080 MHz
    Applications Clocks
        Graphics                    : 817 MHz
        Memory                      : 810 MHz
    Default Applications Clocks
        Graphics                    : 817 MHz
        Memory                      : 810 MHz
    Max Clocks
        Graphics                    : 1440 MHz
        SM                          : 1440 MHz
        Memory                      : 810 MHz
        Video                       : 1290 MHz
    Max Customer Boost Clocks
        Graphics                    : 1440 MHz
    Clock Policy
        Auto Boost                  : N/A
        Auto Boost Default          : N/A
    Processes
        Process ID                  : 55362
            Type                    : C
            Name                    : python
            Used GPU Memory         : 25637 MiB

Same issue here, also with a new DGX-1 we received yesterday. My NVSMI log is identical, including the driver and CUDA versions.

Any tips would be appreciated.

You could refer to the link below to get more insight.

I encountered a very similar issue, and my initial approach was to reinstall the NVIDIA compute utilities using the command:

sudo apt install --reinstall nvidia-compute-utils-xxx

where ‘xxx’ represents the version number of your NVIDIA driver. Unfortunately, this is not a permanent fix: the issue returns after every reboot, so the package has to be reinstalled each time.
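Before reinstalling after a reboot, it may help to confirm that the slowdown is back and that it is confined to the process listing. One way is to time a GPU-only query against a process query. The sketch below assumes `nvidia-smi` is on `PATH`. The `--query-gpu` and `--query-compute-apps` modes exist in `nvidia-smi`, but the field lists and helper names here are illustrative choices, not anything prescribed in this thread.

```python
import shutil
import subprocess
import sys
import time

# Illustrative field lists; confirm them with `nvidia-smi --help-query-gpu`
# and `nvidia-smi --help-query-compute-apps` on your driver version.
GPU_QUERY = ["nvidia-smi", "--query-gpu=index,utilization.gpu,memory.used", "--format=csv"]
APPS_QUERY = ["nvidia-smi", "--query-compute-apps=gpu_bus_id,pid,process_name,used_memory",
              "--format=csv"]


def time_command(cmd, runs=3):
    """Run cmd `runs` times and return the wall-clock duration of each run in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard output; check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=True)
        durations.append(time.perf_counter() - start)
    return durations


def main():
    """Print min/max timings for the GPU query and the process query."""
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found on PATH", file=sys.stderr)
        return
    for label, cmd in (("GPU query", GPU_QUERY), ("process query", APPS_QUERY)):
        try:
            times = time_command(cmd)
        except subprocess.CalledProcessError as err:
            print(f"{label}: nvidia-smi exited with code {err.returncode}", file=sys.stderr)
            continue
        print(f"{label}: min {min(times):.2f} s, max {max(times):.2f} s")


if __name__ == "__main__":
    main()
```

If only the process query is slow right after a reboot and becomes fast again after the reinstall, that at least narrows where the delay comes from. It does not explain the cause.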
