nvidia-smi shows my H100s running at full load (100% GPU utilization), but power draw is only ~110 W against a 700 W power limit. Is this normal?
Some more details (full output of nvidia-smi -q for all 8 GPUs):
root@node13:~# nvidia-smi -q
==============NVSMI LOG==============
Timestamp : Tue Jul 16 11:05:16 2024
Driver Version : 535.154.05
CUDA Version : 12.2
Attached GPUs : 8
GPU 00000000:18:00.0
Product Name : NVIDIA H100 80GB HBM3
Product Brand : NVIDIA
Product Architecture : Hopper
Display Mode : Enabled
Display Active : Disabled
Persistence Mode : Enabled
Addressing Mode : None
MIG Mode
Current : Disabled
Pending : Disabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : ******
GPU UUID : ******
Minor Number : 0
VBIOS Version : 96.00.89.00.01
MultiGPU Board : No
Board ID : ******
Board Part Number : ******
GPU Part Number : ******
FRU Part Number : N/A
Module ID : 2
Inforom Version
Image Version : G520.0200.00.05
OEM Object : 2.1
ECC Object : 7.16
Power Management Object : N/A
Inforom BBX Object Flush
Latest Timestamp : 2024/07/14 16:14:10.701
Latest Duration : 92485 us
GPU Operation Mode
Current : N/A
Pending : N/A
GSP Firmware Version : 535.154.05
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
GPU Reset Status
Reset Required : No
Drain and Reset Recommended : No
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x18
Device : 0x00
Domain : 0x0000
Device Id : 0x233010DE
Bus Id : 00000000:18:00.0
Sub System Id : 0x16C110DE
GPU Link Info
PCIe Generation
Max : 5
Current : 5
Device Current : 5
Device Max : 5
Host Max : 5
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 235292 KB/s
Rx Throughput : 1554507 KB/s
Atomic Caps Inbound : N/A
Atomic Caps Outbound : N/A
Fan Speed : N/A
Performance State : P0
Clocks Event Reasons
Idle : Not Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 81559 MiB
Reserved : 551 MiB
Used : 66839 MiB
Free : 14168 MiB
BAR1 Memory Usage
Total : 131072 MiB
Used : 5 MiB
Free : 131067 MiB
Conf Compute Protected Memory Usage
Total : 0 MiB
Used : 0 MiB
Free : 0 MiB
Compute Mode : Default
Utilization
Gpu : 100 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
JPEG : 0 %
OFA : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
ECC Mode
Current : Enabled
Pending : Enabled
ECC Errors
Volatile
SRAM Correctable : 0
SRAM Uncorrectable : 0
DRAM Correctable : 0
DRAM Uncorrectable : 0
Aggregate
SRAM Correctable : 0
SRAM Uncorrectable : 0
DRAM Correctable : 0
DRAM Uncorrectable : 0
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending Page Blacklist : N/A
Remapped Rows
Correctable Error : 0
Uncorrectable Error : 0
Pending : No
Remapping Failure Occurred : No
Bank Remap Availability Histogram
Max : 2560 bank(s)
High : 0 bank(s)
Partial : 0 bank(s)
Low : 0 bank(s)
None : 0 bank(s)
Temperature
GPU Current Temp : 28 C
GPU T.Limit Temp : 58 C
GPU Shutdown T.Limit Temp : -8 C
GPU Slowdown T.Limit Temp : -2 C
GPU Max Operating T.Limit Temp : 0 C
GPU Target Temperature : N/A
Memory Current Temp : 37 C
Memory Max Operating T.Limit Temp : 0 C
GPU Power Readings
Power Draw : 110.69 W
Current Power Limit : 700.00 W
Requested Power Limit : 700.00 W
Default Power Limit : 700.00 W
Min Power Limit : 200.00 W
Max Power Limit : 700.00 W
Module Power Readings
Power Draw : N/A
Current Power Limit : N/A
Requested Power Limit : N/A
Default Power Limit : N/A
Min Power Limit : N/A
Max Power Limit : N/A
Clocks
Graphics : 1980 MHz
SM : 1980 MHz
Memory : 2619 MHz
Video : 1545 MHz
Applications Clocks
Graphics : 1980 MHz
Memory : 2619 MHz
Default Applications Clocks
Graphics : 1980 MHz
Memory : 2619 MHz
Deferred Clocks
Memory : N/A
Max Clocks
Graphics : 1980 MHz
SM : 1980 MHz
Memory : 2619 MHz
Video : 1545 MHz
Max Customer Boost Clocks
Graphics : 1980 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Voltage
Graphics : 925.000 mV
Fabric
State : Completed
Status : Success
Processes
GPU instance ID : N/A
Compute instance ID : N/A
Process ID : 8464
Type : C
Name : /home/user/anaconda3/envs/llamafactory/bin/python
Used GPU Memory : 66830 MiB
GPU 00000000:2A:00.0
Product Name : NVIDIA H100 80GB HBM3
Product Brand : NVIDIA
Product Architecture : Hopper
Display Mode : Enabled
Display Active : Disabled
Persistence Mode : Enabled
Addressing Mode : None
MIG Mode
Current : Disabled
Pending : Disabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : ******
GPU UUID : ******
Minor Number : 1
VBIOS Version : 96.00.89.00.01
MultiGPU Board : No
Board ID : ******
Board Part Number : ******
GPU Part Number : ******
FRU Part Number : N/A
Module ID : 4
Inforom Version
Image Version : G520.0200.00.05
OEM Object : 2.1
ECC Object : 7.16
Power Management Object : N/A
Inforom BBX Object Flush
Latest Timestamp : 2024/07/14 16:14:10.586
Latest Duration : 94166 us
GPU Operation Mode
Current : N/A
Pending : N/A
GSP Firmware Version : 535.154.05
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
GPU Reset Status
Reset Required : No
Drain and Reset Recommended : No
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x2A
Device : 0x00
Domain : 0x0000
Device Id : 0x233010DE
Bus Id : 00000000:2A:00.0
Sub System Id : 0x16C110DE
GPU Link Info
PCIe Generation
Max : 5
Current : 5
Device Current : 5
Device Max : 5
Host Max : 5
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 4
Replay Number Rollovers : 0
Tx Throughput : 1160101 KB/s
Rx Throughput : 11340863 KB/s
Atomic Caps Inbound : N/A
Atomic Caps Outbound : N/A
Fan Speed : N/A
Performance State : P0
Clocks Event Reasons
Idle : Not Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 81559 MiB
Reserved : 551 MiB
Used : 71401 MiB
Free : 9606 MiB
BAR1 Memory Usage
Total : 131072 MiB
Used : 1877 MiB
Free : 129195 MiB
Conf Compute Protected Memory Usage
Total : 0 MiB
Used : 0 MiB
Free : 0 MiB
Compute Mode : Default
Utilization
Gpu : 100 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
JPEG : 0 %
OFA : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
ECC Mode
Current : Enabled
Pending : Enabled
ECC Errors
Volatile
SRAM Correctable : 0
SRAM Uncorrectable : 0
DRAM Correctable : 0
DRAM Uncorrectable : 0
Aggregate
SRAM Correctable : 0
SRAM Uncorrectable : 0
DRAM Correctable : 0
DRAM Uncorrectable : 0
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending Page Blacklist : N/A
Remapped Rows
Correctable Error : 0
Uncorrectable Error : 0
Pending : No
Remapping Failure Occurred : No
Bank Remap Availability Histogram
Max : 2560 bank(s)
High : 0 bank(s)
Partial : 0 bank(s)
Low : 0 bank(s)
None : 0 bank(s)
Temperature
GPU Current Temp : 31 C
GPU T.Limit Temp : 55 C
GPU Shutdown T.Limit Temp : -8 C
GPU Slowdown T.Limit Temp : -2 C
GPU Max Operating T.Limit Temp : 0 C
GPU Target Temperature : N/A
Memory Current Temp : 36 C
Memory Max Operating T.Limit Temp : 0 C
GPU Power Readings
Power Draw : 119.92 W
Current Power Limit : 700.00 W
Requested Power Limit : 700.00 W
Default Power Limit : 700.00 W
Min Power Limit : 200.00 W
Max Power Limit : 700.00 W
Module Power Readings
Power Draw : N/A
Current Power Limit : N/A
Requested Power Limit : N/A
Default Power Limit : N/A
Min Power Limit : N/A
Max Power Limit : N/A
Clocks
Graphics : 1980 MHz
SM : 1980 MHz
Memory : 2619 MHz
Video : 1545 MHz
Applications Clocks
Graphics : 1980 MHz
Memory : 2619 MHz
Default Applications Clocks
Graphics : 1980 MHz
Memory : 2619 MHz
Deferred Clocks
Memory : N/A
Max Clocks
Graphics : 1980 MHz
SM : 1980 MHz
Memory : 2619 MHz
Video : 1545 MHz
Max Customer Boost Clocks
Graphics : 1980 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Voltage
Graphics : 935.000 mV
Fabric
State : Completed
Status : Success
Processes
GPU instance ID : N/A
Compute instance ID : N/A
Process ID : 8465
Type : C
Name : /home/user/anaconda3/envs/llamafactory/bin/python
Used GPU Memory : 71392 MiB
GPU 00000000:3A:00.0
Product Name : NVIDIA H100 80GB HBM3
Product Brand : NVIDIA
Product Architecture : Hopper
Display Mode : Enabled
Display Active : Disabled
Persistence Mode : Enabled
Addressing Mode : None
MIG Mode
Current : Disabled
Pending : Disabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : ******
GPU UUID : ******
Minor Number : 2
VBIOS Version : 96.00.89.00.01
MultiGPU Board : No
Board ID : ******
Board Part Number : ******
GPU Part Number : ******
FRU Part Number : N/A
Module ID : 1
Inforom Version
Image Version : G520.0200.00.05
OEM Object : 2.1
ECC Object : 7.16
Power Management Object : N/A
Inforom BBX Object Flush
Latest Timestamp : 2024/07/14 16:14:10.023
Latest Duration : 91434 us
GPU Operation Mode
Current : N/A
Pending : N/A
GSP Firmware Version : 535.154.05
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
GPU Reset Status
Reset Required : No
Drain and Reset Recommended : No
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x3A
Device : 0x00
Domain : 0x0000
Device Id : 0x233010DE
Bus Id : 00000000:3A:00.0
Sub System Id : 0x16C110DE
GPU Link Info
PCIe Generation
Max : 5
Current : 5
Device Current : 5
Device Max : 5
Host Max : 5
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 1
Replay Number Rollovers : 0
Tx Throughput : 208089 KB/s
Rx Throughput : 1505625 KB/s
Atomic Caps Inbound : N/A
Atomic Caps Outbound : N/A
Fan Speed : N/A
Performance State : P0
Clocks Event Reasons
Idle : Not Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 81559 MiB
Reserved : 551 MiB
Used : 70917 MiB
Free : 10090 MiB
BAR1 Memory Usage
Total : 131072 MiB
Used : 5 MiB
Free : 131067 MiB
Conf Compute Protected Memory Usage
Total : 0 MiB
Used : 0 MiB
Free : 0 MiB
Compute Mode : Default
Utilization
Gpu : 100 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
JPEG : 0 %
OFA : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
ECC Mode
Current : Enabled
Pending : Enabled
ECC Errors
Volatile
SRAM Correctable : 0
SRAM Uncorrectable : 0
DRAM Correctable : 0
DRAM Uncorrectable : 0
Aggregate
SRAM Correctable : 0
SRAM Uncorrectable : 0
DRAM Correctable : 0
DRAM Uncorrectable : 0
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending Page Blacklist : N/A
Remapped Rows
Correctable Error : 0
Uncorrectable Error : 0
Pending : No
Remapping Failure Occurred : No
Bank Remap Availability Histogram
Max : 2560 bank(s)
High : 0 bank(s)
Partial : 0 bank(s)
Low : 0 bank(s)
None : 0 bank(s)
Temperature
GPU Current Temp : 32 C
GPU T.Limit Temp : 55 C
GPU Shutdown T.Limit Temp : -8 C
GPU Slowdown T.Limit Temp : -2 C
GPU Max Operating T.Limit Temp : 0 C
GPU Target Temperature : N/A
Memory Current Temp : 40 C
Memory Max Operating T.Limit Temp : 0 C
GPU Power Readings
Power Draw : 112.36 W
Current Power Limit : 700.00 W
Requested Power Limit : 700.00 W
Default Power Limit : 700.00 W
Min Power Limit : 200.00 W
Max Power Limit : 700.00 W
Module Power Readings
Power Draw : N/A
Current Power Limit : N/A
Requested Power Limit : N/A
Default Power Limit : N/A
Min Power Limit : N/A
Max Power Limit : N/A
Clocks
Graphics : 1980 MHz
SM : 1980 MHz
Memory : 2619 MHz
Video : 1545 MHz
Applications Clocks
Graphics : 1980 MHz
Memory : 2619 MHz
Default Applications Clocks
Graphics : 1980 MHz
Memory : 2619 MHz
Deferred Clocks
Memory : N/A
Max Clocks
Graphics : 1980 MHz
SM : 1980 MHz
Memory : 2619 MHz
Video : 1545 MHz
Max Customer Boost Clocks
Graphics : 1980 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Voltage
Graphics : 935.000 mV
Fabric
State : Completed
Status : Success
Processes
GPU instance ID : N/A
Compute instance ID : N/A
Process ID : 8466
Type : C
Name : /home/user/anaconda3/envs/llamafactory/bin/python
Used GPU Memory : 70908 MiB
GPU 00000000:5D:00.0
Product Name : NVIDIA H100 80GB HBM3
Product Brand : NVIDIA
Product Architecture : Hopper
Display Mode : Enabled
Display Active : Disabled
Persistence Mode : Enabled
Addressing Mode : None
MIG Mode
Current : Disabled
Pending : Disabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : ******
GPU UUID : ******
Minor Number : 3
VBIOS Version : 96.00.89.00.01
MultiGPU Board : No
Board ID : ******
Board Part Number : ******
GPU Part Number : ******
FRU Part Number : N/A
Module ID : 3
Inforom Version
Image Version : G520.0200.00.05
OEM Object : 2.1
ECC Object : 7.16
Power Management Object : N/A
Inforom BBX Object Flush
Latest Timestamp : 2024/07/14 16:14:16.050
Latest Duration : 128204 us
GPU Operation Mode
Current : N/A
Pending : N/A
GSP Firmware Version : 535.154.05
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
GPU Reset Status
Reset Required : No
Drain and Reset Recommended : No
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x5D
Device : 0x00
Domain : 0x0000
Device Id : 0x233010DE
Bus Id : 00000000:5D:00.0
Sub System Id : 0x16C110DE
GPU Link Info
PCIe Generation
Max : 5
Current : 5
Device Current : 5
Device Max : 5
Host Max : 5
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 317777 KB/s
Rx Throughput : 1984734 KB/s
Atomic Caps Inbound : N/A
Atomic Caps Outbound : N/A
Fan Speed : N/A
Performance State : P0
Clocks Event Reasons
Idle : Not Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 81559 MiB
Reserved : 551 MiB
Used : 77299 MiB
Free : 3708 MiB
BAR1 Memory Usage
Total : 131072 MiB
Used : 2085 MiB
Free : 128987 MiB
Conf Compute Protected Memory Usage
Total : 0 MiB
Used : 0 MiB
Free : 0 MiB
Compute Mode : Default
Utilization
Gpu : 100 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
JPEG : 0 %
OFA : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
ECC Mode
Current : Enabled
Pending : Enabled
ECC Errors
Volatile
SRAM Correctable : 0
SRAM Uncorrectable : 0
DRAM Correctable : 0
DRAM Uncorrectable : 0
Aggregate
SRAM Correctable : 0
SRAM Uncorrectable : 0
DRAM Correctable : 0
DRAM Uncorrectable : 0
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending Page Blacklist : N/A
Remapped Rows
Correctable Error : 0
Uncorrectable Error : 0
Pending : No
Remapping Failure Occurred : No
Bank Remap Availability Histogram
Max : 2560 bank(s)
High : 0 bank(s)
Partial : 0 bank(s)
Low : 0 bank(s)
None : 0 bank(s)
Temperature
GPU Current Temp : 28 C
GPU T.Limit Temp : 59 C
GPU Shutdown T.Limit Temp : -8 C
GPU Slowdown T.Limit Temp : -2 C
GPU Max Operating T.Limit Temp : 0 C
GPU Target Temperature : N/A
Memory Current Temp : 36 C
Memory Max Operating T.Limit Temp : 0 C
GPU Power Readings
Power Draw : 113.84 W
Current Power Limit : 700.00 W
Requested Power Limit : 700.00 W
Default Power Limit : 700.00 W
Min Power Limit : 200.00 W
Max Power Limit : 700.00 W
Module Power Readings
Power Draw : N/A
Current Power Limit : N/A
Requested Power Limit : N/A
Default Power Limit : N/A
Min Power Limit : N/A
Max Power Limit : N/A
Clocks
Graphics : 1980 MHz
SM : 1980 MHz
Memory : 2619 MHz
Video : 1545 MHz
Applications Clocks
Graphics : 1980 MHz
Memory : 2619 MHz
Default Applications Clocks
Graphics : 1980 MHz
Memory : 2619 MHz
Deferred Clocks
Memory : N/A
Max Clocks
Graphics : 1980 MHz
SM : 1980 MHz
Memory : 2619 MHz
Video : 1545 MHz
Max Customer Boost Clocks
Graphics : 1980 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Voltage
Graphics : 935.000 mV
Fabric
State : Completed
Status : Success
Processes
GPU instance ID : N/A
Compute instance ID : N/A
Process ID : 8467
Type : C
Name : /home/user/anaconda3/envs/llamafactory/bin/python
Used GPU Memory : 77290 MiB
GPU 00000000:9A:00.0
Product Name : NVIDIA H100 80GB HBM3
Product Brand : NVIDIA
Product Architecture : Hopper
Display Mode : Enabled
Display Active : Disabled
Persistence Mode : Enabled
Addressing Mode : None
MIG Mode
Current : Disabled
Pending : Disabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : ******
GPU UUID : ******
Minor Number : 4
VBIOS Version : 96.00.89.00.01
MultiGPU Board : No
Board ID : ******
Board Part Number : ******
GPU Part Number : ******
FRU Part Number : N/A
Module ID : 6
Inforom Version
Image Version : G520.0200.00.05
OEM Object : 2.1
ECC Object : 7.16
Power Management Object : N/A
Inforom BBX Object Flush
Latest Timestamp : 2024/07/14 16:14:10.688
Latest Duration : 92496 us
GPU Operation Mode
Current : N/A
Pending : N/A
GSP Firmware Version : 535.154.05
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
GPU Reset Status
Reset Required : No
Drain and Reset Recommended : No
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x9A
Device : 0x00
Domain : 0x0000
Device Id : 0x233010DE
Bus Id : 00000000:9A:00.0
Sub System Id : 0x16C110DE
GPU Link Info
PCIe Generation
Max : 5
Current : 5
Device Current : 5
Device Max : 5
Host Max : 5
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 6
Replay Number Rollovers : 0
Tx Throughput : 1387984 KB/s
Rx Throughput : 13858941 KB/s
Atomic Caps Inbound : N/A
Atomic Caps Outbound : N/A
Fan Speed : N/A
Performance State : P0
Clocks Event Reasons
Idle : Not Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 81559 MiB
Reserved : 551 MiB
Used : 80645 MiB
Free : 362 MiB
BAR1 Memory Usage
Total : 131072 MiB
Used : 5 MiB
Free : 131067 MiB
Conf Compute Protected Memory Usage
Total : 0 MiB
Used : 0 MiB
Free : 0 MiB
Compute Mode : Default
Utilization
Gpu : 100 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
JPEG : 0 %
OFA : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
ECC Mode
Current : Enabled
Pending : Enabled
ECC Errors
Volatile
SRAM Correctable : 0
SRAM Uncorrectable : 0
DRAM Correctable : 0
DRAM Uncorrectable : 0
Aggregate
SRAM Correctable : 0
SRAM Uncorrectable : 0
DRAM Correctable : 0
DRAM Uncorrectable : 0
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending Page Blacklist : N/A
Remapped Rows
Correctable Error : 0
Uncorrectable Error : 0
Pending : No
Remapping Failure Occurred : No
Bank Remap Availability Histogram
Max : 2560 bank(s)
High : 0 bank(s)
Partial : 0 bank(s)
Low : 0 bank(s)
None : 0 bank(s)
Temperature
GPU Current Temp : 27 C
GPU T.Limit Temp : 59 C
GPU Shutdown T.Limit Temp : -8 C
GPU Slowdown T.Limit Temp : -2 C
GPU Max Operating T.Limit Temp : 0 C
GPU Target Temperature : N/A
Memory Current Temp : 36 C
Memory Max Operating T.Limit Temp : 0 C
GPU Power Readings
Power Draw : 109.72 W
Current Power Limit : 700.00 W
Requested Power Limit : 700.00 W
Default Power Limit : 700.00 W
Min Power Limit : 200.00 W
Max Power Limit : 700.00 W
Module Power Readings
Power Draw : N/A
Current Power Limit : N/A
Requested Power Limit : N/A
Default Power Limit : N/A
Min Power Limit : N/A
Max Power Limit : N/A
Clocks
Graphics : 1980 MHz
SM : 1980 MHz
Memory : 2619 MHz
Video : 1545 MHz
Applications Clocks
Graphics : 1980 MHz
Memory : 2619 MHz
Default Applications Clocks
Graphics : 1980 MHz
Memory : 2619 MHz
Deferred Clocks
Memory : N/A
Max Clocks
Graphics : 1980 MHz
SM : 1980 MHz
Memory : 2619 MHz
Video : 1545 MHz
Max Customer Boost Clocks
Graphics : 1980 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Voltage
Graphics : 925.000 mV
Fabric
State : Completed
Status : Success
Processes
GPU instance ID : N/A
Compute instance ID : N/A
Process ID : 8469
Type : C
Name : /home/user/anaconda3/envs/llamafactory/bin/python
Used GPU Memory : 80636 MiB
GPU 00000000:AB:00.0
Product Name : NVIDIA H100 80GB HBM3
Product Brand : NVIDIA
Product Architecture : Hopper
Display Mode : Enabled
Display Active : Disabled
Persistence Mode : Enabled
Addressing Mode : None
MIG Mode
Current : Disabled
Pending : Disabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : ******
GPU UUID : ******
Minor Number : 5
VBIOS Version : 96.00.89.00.01
MultiGPU Board : No
Board ID : ******
Board Part Number : ******
GPU Part Number : ******
FRU Part Number : N/A
Module ID : 8
Inforom Version
Image Version : G520.0200.00.05
OEM Object : 2.1
ECC Object : 7.16
Power Management Object : N/A
Inforom BBX Object Flush
Latest Timestamp : 2024/07/14 16:14:19.040
Latest Duration : 89437 us
GPU Operation Mode
Current : N/A
Pending : N/A
GSP Firmware Version : 535.154.05
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
GPU Reset Status
Reset Required : No
Drain and Reset Recommended : No
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0xAB
Device : 0x00
Domain : 0x0000
Device Id : 0x233010DE
Bus Id : 00000000:AB:00.0
Sub System Id : 0x16C110DE
GPU Link Info
PCIe Generation
Max : 5
Current : 5
Device Current : 5
Device Max : 5
Host Max : 5
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 5
Replay Number Rollovers : 0
Tx Throughput : 1169238 KB/s
Rx Throughput : 11799937 KB/s
Atomic Caps Inbound : N/A
Atomic Caps Outbound : N/A
Fan Speed : N/A
Performance State : P0
Clocks Event Reasons
Idle : Not Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 81559 MiB
Reserved : 551 MiB
Used : 54637 MiB
Free : 26370 MiB
BAR1 Memory Usage
Total : 131072 MiB
Used : 1877 MiB
Free : 129195 MiB
Conf Compute Protected Memory Usage
Total : 0 MiB
Used : 0 MiB
Free : 0 MiB
Compute Mode : Default
Utilization
Gpu : 100 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
JPEG : 0 %
OFA : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
ECC Mode
Current : Enabled
Pending : Enabled
ECC Errors
Volatile
SRAM Correctable : 0
SRAM Uncorrectable : 0
DRAM Correctable : 0
DRAM Uncorrectable : 0
Aggregate
SRAM Correctable : 0
SRAM Uncorrectable : 0
DRAM Correctable : 0
DRAM Uncorrectable : 0
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending Page Blacklist : N/A
Remapped Rows
Correctable Error : 0
Uncorrectable Error : 0
Pending : No
Remapping Failure Occurred : No
Bank Remap Availability Histogram
Max : 2560 bank(s)
High : 0 bank(s)
Partial : 0 bank(s)
Low : 0 bank(s)
None : 0 bank(s)
Temperature
GPU Current Temp : 31 C
GPU T.Limit Temp : 56 C
GPU Shutdown T.Limit Temp : -8 C
GPU Slowdown T.Limit Temp : -2 C
GPU Max Operating T.Limit Temp : 0 C
GPU Target Temperature : N/A
Memory Current Temp : 37 C
Memory Max Operating T.Limit Temp : 0 C
GPU Power Readings
Power Draw : 112.70 W
Current Power Limit : 700.00 W
Requested Power Limit : 700.00 W
Default Power Limit : 700.00 W
Min Power Limit : 200.00 W
Max Power Limit : 700.00 W
Module Power Readings
Power Draw : N/A
Current Power Limit : N/A
Requested Power Limit : N/A
Default Power Limit : N/A
Min Power Limit : N/A
Max Power Limit : N/A
Clocks
Graphics : 1980 MHz
SM : 1980 MHz
Memory : 2619 MHz
Video : 1545 MHz
Applications Clocks
Graphics : 1980 MHz
Memory : 2619 MHz
Default Applications Clocks
Graphics : 1980 MHz
Memory : 2619 MHz
Deferred Clocks
Memory : N/A
Max Clocks
Graphics : 1980 MHz
SM : 1980 MHz
Memory : 2619 MHz
Video : 1545 MHz
Max Customer Boost Clocks
Graphics : 1980 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Voltage
Graphics : 935.000 mV
Fabric
State : Completed
Status : Success
Processes
GPU instance ID : N/A
Compute instance ID : N/A
Process ID : 8470
Type : C
Name : /home/user/anaconda3/envs/llamafactory/bin/python
Used GPU Memory : 54628 MiB
GPU 00000000:BA:00.0
Product Name : NVIDIA H100 80GB HBM3
Product Brand : NVIDIA
Product Architecture : Hopper
Display Mode : Enabled
Display Active : Disabled
Persistence Mode : Enabled
Addressing Mode : None
MIG Mode
Current : Disabled
Pending : Disabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : ******
GPU UUID : ******
Minor Number : 6
VBIOS Version : 96.00.89.00.01
MultiGPU Board : No
Board ID : ******
Board Part Number : ******
GPU Part Number : ******
FRU Part Number : N/A
Module ID : 5
Inforom Version
Image Version : G520.0200.00.05
OEM Object : 2.1
ECC Object : 7.16
Power Management Object : N/A
Inforom BBX Object Flush
Latest Timestamp : 2024/07/14 16:14:11.744
Latest Duration : 125255 us
GPU Operation Mode
Current : N/A
Pending : N/A
GSP Firmware Version : 535.154.05
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
GPU Reset Status
Reset Required : No
Drain and Reset Recommended : No
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0xBA
Device : 0x00
Domain : 0x0000
Device Id : 0x233010DE
Bus Id : 00000000:BA:00.0
Sub System Id : 0x16C110DE
GPU Link Info
PCIe Generation
Max : 5
Current : 5
Device Current : 5
Device Max : 5
Host Max : 5
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 263601 KB/s
Rx Throughput : 1608136 KB/s
Atomic Caps Inbound : N/A
Atomic Caps Outbound : N/A
Fan Speed : N/A
Performance State : P0
Clocks Event Reasons
Idle : Not Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 81559 MiB
Reserved : 551 MiB
Used : 75633 MiB
Free : 5374 MiB
BAR1 Memory Usage
Total : 131072 MiB
Used : 5 MiB
Free : 131067 MiB
Conf Compute Protected Memory Usage
Total : 0 MiB
Used : 0 MiB
Free : 0 MiB
Compute Mode : Default
Utilization
Gpu : 100 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
JPEG : 0 %
OFA : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
ECC Mode
Current : Enabled
Pending : Enabled
ECC Errors
Volatile
SRAM Correctable : 0
SRAM Uncorrectable : 0
DRAM Correctable : 0
DRAM Uncorrectable : 0
Aggregate
SRAM Correctable : 0
SRAM Uncorrectable : 0
DRAM Correctable : 0
DRAM Uncorrectable : 0
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending Page Blacklist : N/A
Remapped Rows
Correctable Error : 0
Uncorrectable Error : 0
Pending : No
Remapping Failure Occurred : No
Bank Remap Availability Histogram
Max : 2560 bank(s)
High : 0 bank(s)
Partial : 0 bank(s)
Low : 0 bank(s)
None : 0 bank(s)
Temperature
GPU Current Temp : 31 C
GPU T.Limit Temp : 56 C
GPU Shutdown T.Limit Temp : -8 C
GPU Slowdown T.Limit Temp : -2 C
GPU Max Operating T.Limit Temp : 0 C
GPU Target Temperature : N/A
Memory Current Temp : 37 C
Memory Max Operating T.Limit Temp : 0 C
GPU Power Readings
Power Draw : 115.33 W
Current Power Limit : 700.00 W
Requested Power Limit : 700.00 W
Default Power Limit : 700.00 W
Min Power Limit : 200.00 W
Max Power Limit : 700.00 W
Module Power Readings
Power Draw : N/A
Current Power Limit : N/A
Requested Power Limit : N/A
Default Power Limit : N/A
Min Power Limit : N/A
Max Power Limit : N/A
Clocks
Graphics : 1980 MHz
SM : 1980 MHz
Memory : 2619 MHz
Video : 1545 MHz
Applications Clocks
Graphics : 1980 MHz
Memory : 2619 MHz
Default Applications Clocks
Graphics : 1980 MHz
Memory : 2619 MHz
Deferred Clocks
Memory : N/A
Max Clocks
Graphics : 1980 MHz
SM : 1980 MHz
Memory : 2619 MHz
Video : 1545 MHz
Max Customer Boost Clocks
Graphics : 1980 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Voltage
Graphics : 935.000 mV
Fabric
State : Completed
Status : Success
Processes
GPU instance ID : N/A
Compute instance ID : N/A
Process ID : 8471
Type : C
Name : /home/user/anaconda3/envs/llamafactory/bin/python
Used GPU Memory : 75624 MiB
GPU 00000000:DB:00.0
Product Name : NVIDIA H100 80GB HBM3
Product Brand : NVIDIA
Product Architecture : Hopper
Display Mode : Enabled
Display Active : Disabled
Persistence Mode : Enabled
Addressing Mode : None
MIG Mode
Current : Disabled
Pending : Disabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : ******
GPU UUID : ******
Minor Number : 7
VBIOS Version : 96.00.89.00.01
MultiGPU Board : No
Board ID : ******
Board Part Number : ******
GPU Part Number : ******
FRU Part Number : N/A
Module ID : 7
Inforom Version
Image Version : G520.0200.00.05
OEM Object : 2.1
ECC Object : 7.16
Power Management Object : N/A
Inforom BBX Object Flush
Latest Timestamp : 2024/07/14 16:14:10.912
Latest Duration : 126188 us
GPU Operation Mode
Current : N/A
Pending : N/A
GSP Firmware Version : 535.154.05
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
GPU Reset Status
Reset Required : No
Drain and Reset Recommended : No
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0xDB
Device : 0x00
Domain : 0x0000
Device Id : 0x233010DE
Bus Id : 00000000:DB:00.0
Sub System Id : 0x16C110DE
GPU Link Info
PCIe Generation
Max : 5
Current : 5
Device Current : 5
Device Max : 5
Host Max : 5
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 11
Replay Number Rollovers : 0
Tx Throughput : 1812390 KB/s
Rx Throughput : 1477725 KB/s
Atomic Caps Inbound : N/A
Atomic Caps Outbound : N/A
Fan Speed : N/A
Performance State : P0
Clocks Event Reasons
Idle : Not Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 81559 MiB
Reserved : 551 MiB
Used : 73259 MiB
Free : 7748 MiB
BAR1 Memory Usage
Total : 131072 MiB
Used : 2085 MiB
Free : 128987 MiB
Conf Compute Protected Memory Usage
Total : 0 MiB
Used : 0 MiB
Free : 0 MiB
Compute Mode : Default
Utilization
Gpu : 100 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
JPEG : 0 %
OFA : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
ECC Mode
Current : Enabled
Pending : Enabled
ECC Errors
Volatile
SRAM Correctable : 0
SRAM Uncorrectable : 0
DRAM Correctable : 0
DRAM Uncorrectable : 0
Aggregate
SRAM Correctable : 0
SRAM Uncorrectable : 0
DRAM Correctable : 0
DRAM Uncorrectable : 0
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending Page Blacklist : N/A
Remapped Rows
Correctable Error : 0
Uncorrectable Error : 0
Pending : No
Remapping Failure Occurred : No
Bank Remap Availability Histogram
Max : 2560 bank(s)
High : 0 bank(s)
Partial : 0 bank(s)
Low : 0 bank(s)
None : 0 bank(s)
Temperature
GPU Current Temp : 29 C
GPU T.Limit Temp : 58 C
GPU Shutdown T.Limit Temp : -8 C
GPU Slowdown T.Limit Temp : -2 C
GPU Max Operating T.Limit Temp : 0 C
GPU Target Temperature : N/A
Memory Current Temp : 35 C
Memory Max Operating T.Limit Temp : 0 C
GPU Power Readings
Power Draw : 113.51 W
Current Power Limit : 700.00 W
Requested Power Limit : 700.00 W
Default Power Limit : 700.00 W
Min Power Limit : 200.00 W
Max Power Limit : 700.00 W
Module Power Readings
Power Draw : N/A
Current Power Limit : N/A
Requested Power Limit : N/A
Default Power Limit : N/A
Min Power Limit : N/A
Max Power Limit : N/A
Clocks
Graphics : 1980 MHz
SM : 1980 MHz
Memory : 2619 MHz
Video : 1545 MHz
Applications Clocks
Graphics : 1980 MHz
Memory : 2619 MHz
Default Applications Clocks
Graphics : 1980 MHz
Memory : 2619 MHz
Deferred Clocks
Memory : N/A
Max Clocks
Graphics : 1980 MHz
SM : 1980 MHz
Memory : 2619 MHz
Video : 1545 MHz
Max Customer Boost Clocks
Graphics : 1980 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Voltage
Graphics : 925.000 mV
Fabric
State : Completed
Status : Success
Processes
GPU instance ID : N/A
Compute instance ID : N/A
Process ID : 8472
Type : C
Name : /home/user/anaconda3/envs/llamafactory/bin/python
Used GPU Memory : 73250 MiB
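
For anyone who wants to compare the same numbers without reading the full dump, here is a minimal sketch that reduces each GPU to utilization, power draw, and power limit. It assumes the CSV layout produced by `nvidia-smi --query-gpu=index,utilization.gpu,power.draw,power.limit --format=csv,noheader,nounits`. The helper names (`parse_gpu_csv`, `query_gpus`) are my own and purely illustrative, and the sample values are copied from the first two GPUs in the log above.

```python
import csv
import io
import subprocess

# Fields requested from nvidia-smi; the order must match the unpacking below.
QUERY_FIELDS = "index,utilization.gpu,power.draw,power.limit"


def parse_gpu_csv(text):
    """Parse csv,noheader,nounits output into one dict per GPU."""
    rows = []
    for rec in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not rec:
            continue  # skip blank lines
        index, util, draw, limit = rec
        rows.append({
            "index": int(index),
            "util_pct": float(util),
            "power_w": float(draw),
            "limit_w": float(limit),
            # Fraction of the power limit actually being drawn.
            "power_frac": float(draw) / float(limit),
        })
    return rows


def query_gpus():
    """Run nvidia-smi on the local machine and parse its output (hypothetical helper)."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}",
         "--format=csv,noheader,nounits"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_gpu_csv(out)


if __name__ == "__main__":
    # Sample values taken from GPU 0 and GPU 1 in the log above.
    sample = "0, 100, 110.69, 700.00\n1, 100, 119.92, 700.00\n"
    for gpu in parse_gpu_csv(sample):
        print(f"GPU {gpu['index']}: {gpu['util_pct']:.0f}% util, "
              f"{gpu['power_w']:.2f} W of {gpu['limit_w']:.0f} W "
              f"({gpu['power_frac']:.1%} of limit)")
```

On the node itself, calling `query_gpus()` instead of parsing the sample string should give the live values. Running it in a loop would show whether the power draw stays near 110 W or only dips there between bursts.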