GTX1070 performance issue

Dear all,

I’m setting up a CUDA environment on my laptop to learn deep learning. It has a GTX 1070 GPU. First I installed Ubuntu 14.04 on the laptop, then NVIDIA driver 375.66 (from the .run file downloaded from NVIDIA’s website). But the graphics clock always stays low (67 MHz), even while my deep learning task is running on the GPU. I think this issue comes from high power draw, but I couldn’t find a solution. Details follow.

While the deep learning task is running, the GPU’s “performance” report is as follows.

$ nvidia-smi -q -d performance

==============NVSMI LOG==============

Timestamp : Tue May 16 10:28:58 2017
Driver Version : 375.66

Attached GPUs : 1
GPU 0000:01:00.0
    Performance State : P2
    Clocks Throttle Reasons
        Idle : Not Active
        Applications Clocks Setting : Not Active
        SW Power Cap : Active
        HW Slowdown : Active
        Sync Boost : Not Active
        Unknown : Not Active
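
For anyone who wants to watch these flags while a job runs, here is a minimal Python sketch that pulls the active throttle reasons out of text in the format shown above. The function name `active_throttle_reasons` and the `SAMPLE` string are illustrative, and the parsing assumes the `Label : Value` layout printed by driver 375.66; other driver versions may use different labels.

```python
from typing import List


def active_throttle_reasons(report: str) -> List[str]:
    """Return the labels under 'Clocks Throttle Reasons' whose value is 'Active'."""
    reasons = []
    in_section = False
    for raw in report.splitlines():
        line = raw.strip()  # tolerate both indented and flattened output
        if line == "Clocks Throttle Reasons":
            in_section = True
            continue
        if not in_section:
            continue
        if ":" not in line:
            # A blank line or the next section header ends the block.
            break
        label, _, value = line.partition(":")
        if value.strip() == "Active":
            reasons.append(label.strip())
    return reasons


# Sample copied from the report above.
SAMPLE = """\
Performance State : P2
Clocks Throttle Reasons
Idle : Not Active
Applications Clocks Setting : Not Active
SW Power Cap : Active
HW Slowdown : Active
Sync Boost : Not Active
Unknown : Not Active
"""

print(active_throttle_reasons(SAMPLE))  # ['SW Power Cap', 'HW Slowdown']
```

To use it on a live system, the text could be captured with `subprocess.run(["nvidia-smi", "-q", "-d", "performance"], capture_output=True, text=True).stdout` and passed to the function.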

And this is part of the “nvidia-smi -q” output:

Power Readings
    Power Management            : N/A
    Power Draw                  : 465.84 W
    Power Limit                 : N/A
    Default Power Limit         : N/A
    Enforced Power Limit        : N/A
    Min Power Limit             : N/A
    Max Power Limit             : N/A

The reported power draw is too high, because the AC adapter has a capacity of only 240 W. The reported draw is always higher than the power the adapter can supply. I think this is the reason why “SW Power Cap” is active. Hence the GPU’s clock becomes very low (67 MHz) even while a “heavy” task is running on the GPU.
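
The comparison above can be made mechanical. This is a minimal Python sketch, assuming the `Power Draw : <number> W` line format shown in the output above. The function names and the `ADAPTER_CAPACITY_W` constant are illustrative; the 240 W value is the adapter rating mentioned in this post.

```python
import re
from typing import Optional

ADAPTER_CAPACITY_W = 240.0  # AC adapter rating quoted in the post (assumption for this laptop)


def reported_power_draw_w(report: str) -> Optional[float]:
    """Extract the 'Power Draw' reading in watts, or None if it is missing or N/A."""
    match = re.search(r"Power Draw\s*:\s*(\d+(?:\.\d+)?)\s*W", report)
    return float(match.group(1)) if match else None


def exceeds_supply(draw_w: float, supply_w: float = ADAPTER_CAPACITY_W) -> bool:
    """True when the reported draw is more than the power source can deliver."""
    return draw_w > supply_w


# Sample copied from the 'Power Readings' block above.
SAMPLE = """\
Power Readings
    Power Management            : N/A
    Power Draw                  : 465.84 W
    Power Limit                 : N/A
"""

draw = reported_power_draw_w(SAMPLE)
print(draw, exceeds_supply(draw))  # 465.84 True
```

A reading that exceeds what the adapter can deliver cannot be a real sustained draw, so a `True` result here points at the reported value rather than at the workload.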

But why does the power draw always read so high? Am I using the wrong driver?

Cheers,
Kimiaki

P.S.

The full output of “nvidia-smi -q” follows.

$ nvidia-smi -q
==============NVSMI LOG==============

Timestamp : Tue May 16 15:54:06 2017
Driver Version : 375.66

Attached GPUs : 1
GPU 0000:01:00.0
    Product Name : GeForce GTX 1070
    Product Brand : GeForce
    Display Mode : Enabled
    Display Active : Enabled
    Persistence Mode : Disabled
    Accounting Mode : Disabled
    Accounting Mode Buffer Size : 1920
    Driver Model
        Current : N/A
        Pending : N/A
    Serial Number : N/A
    GPU UUID : GPU-c0bfb5d3-6df3-fa0b-329b-afd4bb6decc5
    Minor Number : 0
    VBIOS Version : 86.04.54.00.09
    MultiGPU Board : No
    Board ID : 0x100
    GPU Part Number : N/A
    Inforom Version
        Image Version : N/A
        OEM Object : N/A
        ECC Object : N/A
        Power Management Object : N/A
    GPU Operation Mode
        Current : N/A
        Pending : N/A
    GPU Virtualization Mode
        Virtualization mode : None
    PCI
        Bus : 0x01
        Device : 0x00
        Domain : 0x0000
        Device Id : 0x1BE110DE
        Bus Id : 0000:01:00.0
        Sub System Id : 0x07C01028
        GPU Link Info
            PCIe Generation
                Max : 3
                Current : 3
            Link Width
                Max : 16x
                Current : 8x
        Bridge Chip
            Type : N/A
            Firmware : N/A
        Replays since reset : 0
        Tx Throughput : 3000 KB/s
        Rx Throughput : 7000 KB/s
    Fan Speed : N/A
    Performance State : P2
    Clocks Throttle Reasons
        Idle : Not Active
        Applications Clocks Setting : Not Active
        SW Power Cap : Active
        HW Slowdown : Active
        Sync Boost : Not Active
        Unknown : Not Active
    FB Memory Usage
        Total : 8108 MiB
        Used : 3325 MiB
        Free : 4783 MiB
    BAR1 Memory Usage
        Total : 256 MiB
        Used : 5 MiB
        Free : 251 MiB
    Compute Mode : Default
    Utilization
        Gpu : 100 %
        Memory : 7 %
        Encoder : 0 %
        Decoder : 0 %
    Encoder Stats
        Active Sessions : 0
        Average FPS : 0
        Average Latency : 0 ms
    Ecc Mode
        Current : N/A
        Pending : N/A
    ECC Errors
        Volatile
            Single Bit
                Device Memory : N/A
                Register File : N/A
                L1 Cache : N/A
                L2 Cache : N/A
                Texture Memory : N/A
                Texture Shared : N/A
                Total : N/A
            Double Bit
                Device Memory : N/A
                Register File : N/A
                L1 Cache : N/A
                L2 Cache : N/A
                Texture Memory : N/A
                Texture Shared : N/A
                Total : N/A
        Aggregate
            Single Bit
                Device Memory : N/A
                Register File : N/A
                L1 Cache : N/A
                L2 Cache : N/A
                Texture Memory : N/A
                Texture Shared : N/A
                Total : N/A
            Double Bit
                Device Memory : N/A
                Register File : N/A
                L1 Cache : N/A
                L2 Cache : N/A
                Texture Memory : N/A
                Texture Shared : N/A
                Total : N/A
    Retired Pages
        Single Bit ECC : N/A
        Double Bit ECC : N/A
        Pending : N/A
    Temperature
        GPU Current Temp : 53 C
        GPU Shutdown Temp : 99 C
        GPU Slowdown Temp : 94 C
    Power Readings
        Power Management : N/A
        Power Draw : 476.04 W
        Power Limit : N/A
        Default Power Limit : N/A
        Enforced Power Limit : N/A
        Min Power Limit : N/A
        Max Power Limit : N/A
    Clocks
        Graphics : 69 MHz
        SM : 69 MHz
        Memory : 3802 MHz
        Video : 658 MHz
    Applications Clocks
        Graphics : N/A
        Memory : N/A
    Default Applications Clocks
        Graphics : N/A
        Memory : N/A
    Max Clocks
        Graphics : 1911 MHz
        SM : 1911 MHz
        Memory : 4004 MHz
        Video : 1708 MHz
    Clock Policy
        Auto Boost : N/A
        Auto Boost Default : N/A
    Processes
        Process ID : 1166
            Type : G
            Name : /usr/bin/X
            Used GPU Memory : 41 MiB
        Process ID : 2656
            Type : G
            Name : nvidia-settings
            Used GPU Memory : 0 MiB
        Process ID : 7537
            Type : C
            Name : ./darknet
            Used GPU Memory : 3279 MiB
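
To put a number on how far the graphics clock sits below its maximum, here is a rough Python sketch that reads a value from a named section of this kind of report. The helper names `section_value` and `mhz` are hypothetical, and the parser makes a simplifying assumption: any non-blank line without a colon is treated as a section header, which fits the "Clocks" and "Max Clocks" blocks above but is not a general nvidia-smi parser.

```python
from typing import Optional


def section_value(report: str, section: str, key: str) -> Optional[str]:
    """Return the value of `key` found under the most recent header equal to `section`."""
    header = None
    for raw in report.splitlines():
        line = raw.strip()  # works for indented and flattened output alike
        if not line:
            continue
        if ":" not in line:
            header = line  # assumption: a line without a colon is a section header
            continue
        label, _, value = line.partition(":")
        if header == section and label.strip() == key:
            return value.strip()
    return None


def mhz(text: str) -> float:
    """Convert a string such as '69 MHz' to a number."""
    return float(text.split()[0])


# Sample copied from the full report above.
SAMPLE = """\
Clocks
Graphics : 69 MHz
SM : 69 MHz
Memory : 3802 MHz
Video : 658 MHz
Applications Clocks
Graphics : N/A
Memory : N/A
Max Clocks
Graphics : 1911 MHz
SM : 1911 MHz
"""

current_mhz = mhz(section_value(SAMPLE, "Clocks", "Graphics"))
max_mhz = mhz(section_value(SAMPLE, "Max Clocks", "Graphics"))
print(round(100 * current_mhz / max_mhz, 1))  # 3.6 (percent of the maximum graphics clock)
```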

P.S. again

I checked the power draw of my machine with the powerstat command. While the GPU is training my deep learning network, the power draw reported by powerstat is about 60 W. By contrast, the machine draws 33 W when the deep learning task is NOT running on the GPU. powerstat can only run on battery power, so I unplugged the power cable from my laptop before running it. It is clear that the power draw value reported by nvidia-smi is too high.
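
The arithmetic behind that conclusion, as a small Python sketch. The function names are illustrative, and the numbers are the powerstat and nvidia-smi readings quoted in this thread. Treating the load-minus-idle difference as GPU power is only a rough proxy, since the CPU also works harder during training and the laptop may limit the GPU while on battery.

```python
def load_delta_w(idle_w: float, load_w: float) -> float:
    """Extra whole-system power under load; a rough upper-level proxy for GPU power."""
    return load_w - idle_w


def overreport_factor(reported_w: float, system_w: float) -> float:
    """How many times larger the reported GPU draw is than the whole system's draw."""
    return reported_w / system_w


IDLE_W = 33.0           # powerstat, no GPU task running
LOAD_W = 60.0           # powerstat, training on the GPU
SMI_REPORTED_W = 476.04  # 'Power Draw' from nvidia-smi -q

print(load_delta_w(IDLE_W, LOAD_W))                         # 27.0
print(round(overreport_factor(SMI_REPORTED_W, LOAD_W), 1))  # 7.9
```

Even without the proxy, the GPU cannot draw more than the whole machine does, so a reading almost eight times the system total has to be a reporting error.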

This is on a laptop?

Did the laptop originally ship to you with Windows installed?

My machine is a laptop (Alienware 15 R3), and of course it shipped with Windows. But I installed Ubuntu 14.04 across the whole SSD, so Windows was deleted.

I suspect reported power draw values and clock rates to be wrong. But you’re already on the latest driver. That’s strange.

See if Alienware offers a BIOS upgrade for the machine.

I’m using driver 375.66, which was downloaded from NVIDIA’s website. It was released on 2017.5.4. I think this is the latest one, as you said. But I had overlooked the BIOS. I will check that.

http://www.nvidia.com/download/driverResults.aspx/118290/en-us

The BIOS version is currently 1.0.8 (released on 28 Nov 2016), but a new version was released on 24 Apr 2017.

new version:
http://www.dell.com/support/home/us/en/19/Drivers/DriversDetails?driverId=FTKFN&fileId=3673656917&osCode=WT64A&productCode=alienware-15-laptop&languageCode=en&categoryId=BI

v1.0.8
http://www.dell.com/support/home/us/en/04/drivers/DriversDetails?productCode=alienware-15-laptop&driverId=FKHH1

I will try to update the BIOS. Thanks.

I updated the system BIOS to the new version (1.0.9). But the power draw is still high (475 W) and the GPU graphics clock is still 67 MHz. CUDA and the NVIDIA driver were NOT reinstalled.

I checked the output of dmesg, but it was not helpful.

$ dmesg | grep nvidia
[ 1.898530] nvidia: module license ‘NVIDIA’ taints kernel.
[ 1.903720] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[ 1.909388] nvidia-nvlink: Nvlink Core is being initialized, major device number 251
[ 1.910429] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 375.66 Mon May 1 14:33:30 PDT 2017
[ 1.911100] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[ 2.742641] nvidia 0000:01:00.0: irq 150 for MSI/MSI-X
[ 3.713886] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 249
[ 3.722601] nvidia-modeset: Allocated GPU:0 (GPU-c0bfb5d3-6df3-fa0b-329b-afd4bb6decc5) @ PCI:0000:01:00.0

I gave up. I found instructions for installing Ubuntu 16 on an Alienware 13, so I will install Ubuntu 16.04 (I am currently using 14.04).

https://github.com/andrewwakeling/alienware-13-r3-ubuntu-16.04
https://orech.github.io/ubuntu/install/gtx1070/nvidia/cuda/gtx/pascal/linux/2017/01/22/welcome-to-jekyll.html

As cbuchner1 says, the reported power draw of 475 watts for a GTX 1070 at power state P2 is impossibly high. So something isn’t right here, but I can’t tell what it could be. It sounds like you are using NVIDIA’s generic driver. These drivers may not work with every laptop; some laptops require special modified drivers provided via the laptop vendor. No idea whether that applies to your specific machine.