Hi all,
I'm looking for help with my setup.
I have a brand-new SuperMicro server with four RTX 2080 Ti cards,
running Ubuntu 18.04, CUDA 10.0 and NVIDIA driver 410.79 (I also tried driver version 415).
The issue is that GPU utilization stays at about 60-70% and the cards sit in performance state P2.
As far as I can see, “SW Power Cap : Active” is always reported on all GPUs.
I also tried a 2080 Ti from another vendor and got the same “SW Power Cap” problem; changing the power limit didn't help either.
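To watch the SW Power Cap flag together with the P-state, utilization, power draw, power limit and SM clock on all four cards at once, one option is a small Python wrapper around `nvidia-smi --query-gpu`. This is a minimal sketch, assuming `nvidia-smi` is on the PATH; the field names are the ones listed by `nvidia-smi --help-query-gpu`, while the helper names (`parse_query_output`, `query_gpus`) are made up for illustration:

```python
import subprocess
import sys
from typing import Dict, List

# Query fields as listed by `nvidia-smi --help-query-gpu`.
FIELDS = [
    "index",
    "pstate",
    "utilization.gpu",
    "power.draw",
    "power.limit",
    "clocks.sm",
    "clocks_throttle_reasons.sw_power_cap",
]


def parse_query_output(text: str) -> List[Dict[str, str]]:
    """Turn `--format=csv,noheader,nounits` output into one dict per GPU."""
    rows = []
    for line in text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        rows.append(dict(zip(FIELDS, values)))
    return rows


def query_gpus() -> List[Dict[str, str]]:
    """Run nvidia-smi once and parse its CSV output (needs the NVIDIA driver)."""
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        check=True, stdout=subprocess.PIPE, universal_newlines=True,
    ).stdout
    return parse_query_output(out)


if __name__ == "__main__":
    try:
        gpus = query_gpus()
    except (OSError, subprocess.CalledProcessError) as err:
        print(f"could not run nvidia-smi: {err}", file=sys.stderr)
        gpus = []
    for gpu in gpus:
        print(
            f"GPU {gpu['index']}: {gpu['pstate']}, util {gpu['utilization.gpu']} %, "
            f"{gpu['power.draw']} / {gpu['power.limit']} W, SM {gpu['clocks.sm']} MHz, "
            f"SW power cap: {gpu['clocks_throttle_reasons.sw_power_cap']}"
        )
```

Running it repeatedly while the training job is busy (or passing the same query to `nvidia-smi` with its own `-l <seconds>` loop option) shows whether the cap is hit on every card at the same time.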
The server hardware is top-end, including two 2200 W PSUs.
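A quick back-of-the-envelope check on the power budget, using the per-card limits from the nvidia-smi output (250 W default, 280 W max). It assumes the two 2200 W PSUs run as a redundant 1+1 pair, so only one PSU's capacity is counted; that is an assumption, so adjust `PSU_CAPACITY_W` if they share the load:

```python
# Rough power-budget check for the 4x RTX 2080 Ti server described above.
NUM_GPUS = 4
DEFAULT_LIMIT_W = 250.0  # "Default Power Limit" from nvidia-smi
MAX_LIMIT_W = 280.0      # "Max Power Limit" from nvidia-smi
PSU_CAPACITY_W = 2200.0  # ASSUMPTION: one PSU usable (1+1 redundancy)


def gpu_budget(num_gpus: int, per_gpu_w: float) -> float:
    """Combined board power limit of num_gpus cards at per_gpu_w each."""
    return num_gpus * per_gpu_w


def headroom(psu_w: float, gpu_w: float) -> float:
    """Watts left over for CPUs, memory, fans, drives, etc."""
    return psu_w - gpu_w


if __name__ == "__main__":
    for label, limit in (("default", DEFAULT_LIMIT_W), ("max", MAX_LIMIT_W)):
        total = gpu_budget(NUM_GPUS, limit)
        print(f"{label}: GPUs {total:.0f} W, "
              f"headroom {headroom(PSU_CAPACITY_W, total):.0f} W")
```

Even with every card at the 280 W maximum, the combined GPU limit stays well below a single PSU's rating, which is consistent with the PSUs not being the bottleneck.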
Here is some nvidia-smi output:
==============NVSMI LOG==============

Timestamp : Thu Feb 28 15:11:57 2019
Driver Version : 410.79
CUDA Version : 10.0

Attached GPUs : 1
GPU 00000000:86:00.0
    Product Name : GeForce RTX 2080 Ti
    Product Brand : GeForce
    Display Mode : Disabled
    Display Active : Disabled
    Persistence Mode : Enabled
    Accounting Mode : Disabled
    Accounting Mode Buffer Size : 4000
    Driver Model
        Current : N/A
        Pending : N/A
    Serial Number : N/A
    GPU UUID : GPU-aa9365b0-01d8-884f-08fc-62515a8a21a8
    Minor Number : 0
    VBIOS Version : 90.02.17.00.C9
    MultiGPU Board : No
    Board ID : 0x8600
    GPU Part Number : N/A
    Inforom Version
        Image Version : G001.0000.02.04
        OEM Object : 1.1
        ECC Object : N/A
        Power Management Object : N/A
    GPU Operation Mode
        Current : N/A
        Pending : N/A
    GPU Virtualization Mode
        Virtualization mode : None
    IBMNPU
        Relaxed Ordering Mode : N/A
    PCI
        Bus : 0x86
        Device : 0x00
        Domain : 0x0000
        Device Id : 0x1E0410DE
        Bus Id : 00000000:86:00.0
        Sub System Id : 0x12AE10DE
        GPU Link Info
            PCIe Generation
                Max : 3
                Current : 3
            Link Width
                Max : 16x
                Current : 16x
        Bridge Chip
            Type : N/A
            Firmware : N/A
        Replays since reset : 0
        Tx Throughput : 14000 KB/s
        Rx Throughput : 45000 KB/s
    Fan Speed : 61 %
    Performance State : P2
    Clocks Throttle Reasons
        Idle : Not Active
        Applications Clocks Setting : Not Active
        SW Power Cap : Active
        HW Slowdown : Not Active
            HW Thermal Slowdown : Not Active
            HW Power Brake Slowdown : Not Active
        Sync Boost : Not Active
        SW Thermal Slowdown : Not Active
        Display Clock Setting : Not Active
    FB Memory Usage
        Total : 10989 MiB
        Used : 10868 MiB
        Free : 121 MiB
    BAR1 Memory Usage
        Total : 256 MiB
        Used : 8 MiB
        Free : 248 MiB
    Compute Mode : Default
    Utilization
        Gpu : 69 %
        Memory : 53 %
        Encoder : 0 %
        Decoder : 0 %
    Encoder Stats
        Active Sessions : 0
        Average FPS : 0
        Average Latency : 0
    FBC Stats
        Active Sessions : 0
        Average FPS : 0
        Average Latency : 0
    Ecc Mode
        Current : N/A
        Pending : N/A
    ECC Errors
        Volatile
            SRAM Correctable : N/A
            SRAM Uncorrectable : N/A
            DRAM Correctable : N/A
            DRAM Uncorrectable : N/A
        Aggregate
            SRAM Correctable : N/A
            SRAM Uncorrectable : N/A
            DRAM Correctable : N/A
            DRAM Uncorrectable : N/A
    Retired Pages
        Single Bit ECC : N/A
        Double Bit ECC : N/A
        Pending : N/A
    Temperature
        GPU Current Temp : 83 C
        GPU Shutdown Temp : 94 C
        GPU Slowdown Temp : 91 C
        GPU Max Operating Temp : 89 C
        Memory Current Temp : N/A
        Memory Max Operating Temp : N/A
    Power Readings
        Power Management : Supported
        Power Draw : 227.85 W
        Power Limit : 250.00 W
        Default Power Limit : 250.00 W
        Enforced Power Limit : 250.00 W
        Min Power Limit : 100.00 W
        Max Power Limit : 280.00 W
    Clocks
        Graphics : 1770 MHz
        SM : 1770 MHz
        Memory : 6800 MHz
        Video : 1635 MHz
    Applications Clocks
        Graphics : N/A
        Memory : N/A
    Default Applications Clocks
        Graphics : N/A
        Memory : N/A
    Max Clocks
        Graphics : 2100 MHz
        SM : 2100 MHz
        Memory : 7000 MHz
        Video : 1950 MHz
    Max Customer Boost Clocks
        Graphics : N/A
    Clock Policy
        Auto Boost : N/A
        Auto Boost Default : N/A
    Processes
        Process ID : 4540
            Type : C
            Name : /usr/bin/python3
            Used GPU Memory : 10857 MiB
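When all you have is a full `nvidia-smi -q` dump like the one above (one per GPU, or several concatenated), the throttle flags can be pulled out with a short parser. This is a hypothetical helper, not part of any NVIDIA tool; it assumes the section header reads exactly "Clocks Throttle Reasons", as it does with driver 410.79, and it ignores indentation so it also works on flattened copy/pasted output:

```python
import sys
from typing import Dict, List

SECTION = "Clocks Throttle Reasons"  # ASSUMPTION: header text as in driver 410.x
FLAG_VALUES = ("Active", "Not Active")


def throttle_reasons(log: str) -> List[Dict[str, str]]:
    """Return one {reason: state} dict per 'Clocks Throttle Reasons' section.

    Indentation is ignored. A section ends at the first line that is not a
    'Key : Active' or 'Key : Not Active' pair.
    """
    sections: List[Dict[str, str]] = []
    current = None
    for raw in log.splitlines():
        line = raw.strip()
        if line == SECTION:
            current = {}
            sections.append(current)
            continue
        if current is None:
            continue
        key, sep, value = line.partition(":")
        value = value.strip()
        if sep and value in FLAG_VALUES:
            current[key.strip()] = value
        else:
            current = None  # left the throttle-reasons section
    return sections


def active(section: Dict[str, str]) -> List[str]:
    """Names of the throttle reasons currently reported as Active."""
    return [name for name, state in section.items() if state == "Active"]


if __name__ == "__main__":
    if sys.stdin.isatty():
        print("usage: nvidia-smi -q | python3 throttle_reasons.py")
    else:
        for i, section in enumerate(throttle_reasons(sys.stdin.read())):
            print(f"section {i}: active = {', '.join(active(section)) or 'none'}")
```

The script name `throttle_reasons.py` in the usage line is only an example; save the sketch under any name and pipe `nvidia-smi -q` into it.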
Any ideas why the power cap kicks in and reduces performance?
I'd appreciate any help.
Regards,
Ilya
nvidia-bug-report.log.gz (1.14 MB)