Help with GTX 1060 3GB on Ubuntu Linux (using too much power)

I originally posted this in the nvidia.geforce.com community and they referred me here, so I will post the question/problem I am having below. Any help will be greatly appreciated.

Ok, I feel there is something wrong with this card that I picked up off eBay. It is a GTX 1060 3GB 03G-P4-6162-KR.

I have installed this in my Linux box, replacing the GTX 750 Ti that I originally had in it. It fires up just fine, however the power usage is what is putting me in panic mode :D The card is using 394 watts! How in the hell, when 120 is the max it should use? I am not sure if this is some driver issue in Linux, or if there is something wrong with the card. How can I test or check this, or are there any other steps I can take? I have included the nvidia-smi and power stats below showing what I am seeing from this card. I have tested a 1050 Ti and a 1070 on this box and those worked perfectly (btw, I only tested those after seeing these results with the 1060). This 1060 3GB card on the other hand, well, stats below. Any help is appreciated.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.73       Driver Version: 410.73       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...  Off  | 00000000:01:00.0  On |                  N/A |
| 57%   29C    P5   394W / 120W |    273MiB /  3018MiB |      4%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1140      G   /usr/lib/xorg/Xorg                            18MiB |
|    0      1174      G   /usr/bin/gnome-shell                          49MiB |
|    0      1344      G   /usr/lib/xorg/Xorg                            94MiB |
|    0      1478      G   /usr/bin/gnome-shell                         107MiB |
|    0      1822      G   /usr/bin/nvidia-settings                       0MiB |
+-----------------------------------------------------------------------------+

And here is the power listing

==============NVSMI LOG==============

Timestamp                           : Mon Nov 19 13:27:03 2018
Driver Version                      : 410.73
CUDA Version                        : 10.0

Attached GPUs                       : 1
GPU 00000000:01:00.0
    Power Readings
        Power Management            : Supported
        Power Draw                  : 393.40 W
        Power Limit                 : 120.00 W
        Default Power Limit         : 120.00 W
        Enforced Power Limit        : 120.00 W
        Min Power Limit             : 60.00 W
        Max Power Limit             : 140.00 W
    Power Samples
        Duration                    : 12.29 sec
        Number of Samples           : 119
        Max                         : 394.46 W
        Min                         : 392.17 W
        Avg                         : 393.88 W

Since it’s staying at 29°C, this is just a bogus display or a mighty powerful cooling system. Either a driver bug or a defective sensor. Try reverting to the 390 driver to see if the values change.
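The reasoning above can be written down as a rough plausibility check: a reported draw well above the card's Max Power Limit, combined with a near-idle temperature, points at the reading rather than at real power consumption. This is only a sketch. The function names and the 40°C "idle" threshold are assumptions made for illustration, and the sample text is copied from the power listing posted above.

```python
import re

# Lines copied from the `nvidia-smi -q -d POWER` output posted in this thread.
SAMPLE = """\
Power Draw : 393.40 W
Power Limit : 120.00 W
Max Power Limit : 140.00 W
"""


def parse_watts(text, label):
    """Return the wattage for a field whose label starts the line, or None.

    Anchoring at the line start keeps "Power Limit" from matching
    "Max Power Limit" or "Default Power Limit".
    """
    pattern = rf"^[ \t]*{re.escape(label)}[ \t]*:[ \t]*([\d.]+)[ \t]*W[ \t]*$"
    match = re.search(pattern, text, re.MULTILINE)
    return float(match.group(1)) if match else None


def looks_implausible(draw_w, max_limit_w, temp_c, idle_temp_c=40.0):
    """Heuristic (assumed threshold): over the max limit yet still cool."""
    return draw_w > max_limit_w and temp_c < idle_temp_c


draw = parse_watts(SAMPLE, "Power Draw")
max_limit = parse_watts(SAMPLE, "Max Power Limit")
# 29 is the temperature (in C) shown in the nvidia-smi table above.
print(looks_implausible(draw, max_limit, temp_c=29))
```

A check like this cannot tell a driver bug from a defective sensor. It only says the number is not believable as real power draw.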

Ok thanks I will give that a try. I figured it has to be driver related as the temp has stayed cool.

Ok, still having the same results. Below are the steps that I performed:

1: sudo apt-get purge 'nvidia*'
2: reboot
3: ubuntu-drivers devices

Got the following information:

== /sys/devices/pci0000:00/0000:00:02.0/0000:01:00.0 ==
modalias : pci:v000010DEd00001C02sv00003842sd00006162bc03sc00i00
vendor : NVIDIA Corporation
model : GP106 [GeForce GTX 1060 3GB]
driver : nvidia-driver-410 - third-party free
driver : nvidia-driver-415 - third-party free recommended
driver : nvidia-driver-396 - third-party free
driver : nvidia-driver-390 - third-party free
driver : xserver-xorg-video-nouveau - distro free builtin

Since I was originally on 410, I ran sudo ubuntu-drivers autoinstall so that the latest driver (415) got installed, then rebooted. Ran nvidia-smi and got the same power issue.
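For anyone repeating this, the driver list printed by ubuntu-drivers devices can be picked apart with a few lines of Python to see which package is flagged as recommended. This is a sketch only. The function name is made up for illustration, and the sample text is the driver lines from the output above.

```python
import re

# Driver lines copied from the `ubuntu-drivers devices` output in this thread.
SAMPLE = """\
driver : nvidia-driver-410 - third-party free
driver : nvidia-driver-415 - third-party free recommended
driver : nvidia-driver-396 - third-party free
driver : nvidia-driver-390 - third-party free
driver : xserver-xorg-video-nouveau - distro free builtin
"""


def list_drivers(text):
    """Return (package_name, is_recommended) for each 'driver :' line."""
    drivers = []
    for line in text.splitlines():
        match = re.match(r"\s*driver\s*:\s*(\S+)\s+-\s+(.*)$", line)
        if match:
            name = match.group(1)
            flags = match.group(2).split()
            drivers.append((name, "recommended" in flags))
    return drivers


for name, recommended in list_drivers(SAMPLE):
    print(name, "(recommended)" if recommended else "")
```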

So after repeating the above steps:

4: sudo apt install nvidia-390
5: reboot system

6: nvidia-smi; here are the stats again:

Mon Nov 19 15:14:49 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.87                 Driver Version: 390.87                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...  Off  | 00000000:01:00.0  On |                  N/A |
| 50%   27C    P8   393W / 120W |    301MiB /  3018MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2857      G   /usr/lib/xorg/Xorg                            18MiB |
|    0      2895      G   /usr/bin/gnome-shell                          49MiB |
|    0      3067      G   /usr/lib/xorg/Xorg                           107MiB |
|    0      3200      G   /usr/bin/gnome-shell                         121MiB |
|    0      3983      G   /usr/lib/firefox/firefox                       2MiB |
+-----------------------------------------------------------------------------+

Ran command: nvidia-smi -i 0 -q -d POWER

==============NVSMI LOG==============

Timestamp                           : Mon Nov 19 15:15:57 2018
Driver Version                      : 390.87

Attached GPUs                       : 1
GPU 00000000:01:00.0
    Power Readings
        Power Management            : Supported
        Power Draw                  : 393.74 W
        Power Limit                 : 120.00 W
        Default Power Limit         : 120.00 W
        Enforced Power Limit        : 120.00 W
        Min Power Limit             : 60.00 W
        Max Power Limit             : 140.00 W
    Power Samples
        Duration                    : 16.18 sec
        Number of Samples           : 119
        Max                         : 394.37 W
        Min                         : 392.25 W
        Avg                         : 393.73 W

I don’t know what else to do. I believe the CUDA libraries are 10.0, but I am not sure if that’s the problem or not.

Looks like a defective sensor then. Don’t know if it has any impact besides the irritating values.
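One way to see why the driver looks unlikely as the cause: the reported figures barely move between the two driver versions, even though the card was in different performance states (P5 vs P8). A small sketch of that arithmetic, using only the Min/Avg/Max values from the two power listings posted above (the variable and function names are just for illustration):

```python
# Power sample summaries copied from the two NVSMI logs in this thread (watts).
readings = {
    "410.73": {"min": 392.17, "avg": 393.88, "max": 394.46},
    "390.87": {"min": 392.25, "avg": 393.73, "max": 394.37},
}


def spread(samples):
    """Width of the sampled range (max minus min) in watts."""
    return samples["max"] - samples["min"]


avg_diff = abs(readings["410.73"]["avg"] - readings["390.87"]["avg"])

for version, samples in readings.items():
    print(f"driver {version}: spread {spread(samples):.2f} W")
print(f"difference between averages: {avg_diff:.2f} W")
```

A value that stays pinned this tightly across drivers and load states behaves more like a stuck or miscalibrated reading than like actual consumption, though this alone does not prove which component is at fault.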

I was going to use this card for crypto, however no success there. On a coin that I was mining, the 750 Ti pulled 7 sol at a time and the 1050 Ti 16 sol. However, this 1060 is doing only 1 sol at a time. Something is wrong, I am just not sure where. I’ll probably take it out, put it in my Windows 10 machine, and run some 3DMark benchmarks to fully test the card, to see if I can isolate it to a defective card or some driver issue within Linux.

Thanks for the help. Will post results after a while.

I probably would have saved myself a whole lot of headache had I just tested this out in Windows first instead of Linux :D The card would not allow Windows to boot. It kept locking up. On the 2nd reboot with the card (power isn’t an issue, EVGA 750 G+ PSU) it even disabled my CPU fan and RGB lighting (although the RGB lighting on the RAM still worked). So I took the card out, put my 1070 back in, rebooted, and thankfully the fan, lights and all were well again. So I will be returning this card to the seller. I believe, as you suggested, there is possibly a sensor or chip that is defective.

Thanks for the help