nvidia-smi power limit on GTX 1060

We have tested multiple cards by running CUDA-intensive applications such as cuda-memtest, gpu-burn-test, cuda-z, and Ethereum. We observed that power spikes 1W or 2W above the limit and then comes back down to it. The issue looks specific to particular vendors. Can you get in touch with the GPU vendors?

From this response it seems like NVIDIA support does not understand the problem: only one card in a multi-GPU setup is not obeying the power limit.

NVIDIA should have a Linux mining division where each card is tested in the most common mining workflows, including overclocking memory, downclocking the core, and setting a power limit.

So that customers don’t have to start threads like.

prezimi: I think sandipt is just trolling :P

Sandipt, previous posters made it quite clear that the problem is specific to Linux. On Windows, the power limit does work for the GPUs in question.

It is not a card vendor problem, it is a driver problem.
I have tested 2x Gigabyte 1060 6GB and 8x MSI 1070 Armor, and they all have this problem.
The problem is not present (in my case) if you connect a monitor to GPU0.
I also noticed that sometimes it works normally, but after a reboot it happens again. Try multiple reboots.

The permanent solution is to connect a monitor to GPU0 (for distros without Xorg) or to use

sudo nvidia-xconfig --enable-all-gpus --cool-bits=28 --allow-empty-initial-configuration

for distros with Xorg. The coolbits are not important; what matters is --allow-empty-initial-configuration.

But I still think it is a bug.
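For anyone retesting after the workaround above, this is roughly how the limit gets applied. It is only a sketch: `apply_power_limit` and the `NVIDIA_SMI` override are names made up here (the override just lets you dry-run the function), while `-pm` and `-pl` are the regular nvidia-smi flags and need root on a real system.

```shell
#!/bin/sh
# Sketch only. NVIDIA_SMI is a hypothetical override for dry runs;
# it defaults to the real nvidia-smi binary.
NVIDIA_SMI="${NVIDIA_SMI:-nvidia-smi}"

# Enable persistence mode, then request the same power limit (in watts)
# on all GPUs. Without -i, nvidia-smi applies -pl to every GPU.
apply_power_limit() {
    limit_watts="$1"
    "$NVIDIA_SMI" -pm 1 || return 1
    "$NVIDIA_SMI" -pl "$limit_watts"
}
```

On a rig you would call `sudo sh -c '. ./limit.sh; apply_power_limit 75'` (the file name is just an example) and then compare the draw against the cap in nvidia-smi.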

Hi All, can you please test with the latest 384.59 and 375.82? We are still struggling to reproduce this issue by running some mining apps.

Hi Sandip,

Just tried 384.59, and same behaviour. Limit set to 75W but draw ~90W.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.59                 Driver Version: 384.59                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...  On   | 00000000:01:00.0  On |                  N/A |
| 55%   62C    P2    90W /  75W |   2254MiB /  3010MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 106...  On   | 00000000:02:00.0 Off |                  N/A |
| 40%   47C    P2    76W /  75W |   2216MiB /  3013MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 106...  On   | 00000000:05:00.0 Off |                  N/A |
| 55%   55C    P2    76W /  75W |   2216MiB /  3013MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 106...  On   | 00000000:08:00.0 Off |                  N/A |
| 55%   63C    P2    92W /  75W |   2216MiB /  3013MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   4  GeForce GTX 1070    On   | 00000000:09:00.0 Off |                  N/A |
| 50%   53C    P2    95W /  95W |   2246MiB /  8114MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
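To pick out the misbehaving cards without reading the table by eye, something like the following might help. It is a rough sketch: `find_over_limit` and the 2W tolerance are my own choices, and it assumes the CSV layout produced by the `index`, `power.draw` and `power.limit` query fields.

```shell
#!/bin/sh
# Reads lines of "index, power.draw, power.limit" (watts, no units) on stdin
# and prints the GPUs whose draw exceeds the limit by more than the tolerance.
find_over_limit() {
    tolerance_watts="${1:-2}"
    awk -F', *' -v tol="$tolerance_watts" '
        $2 - $3 > tol { printf "GPU %s: %.0fW drawn, %.0fW limit\n", $1, $2, $3 }
    '
}

# On a real rig:
#   nvidia-smi --query-gpu=index,power.draw,power.limit \
#              --format=csv,noheader,nounits | find_over_limit 2
```

Fed with the numbers from the table above, it reports GPU 0 and GPU 3 and skips the cards that sit 1W over the cap.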

Of course it does not get fixed magically by updating to the latest drivers … someone actually has to fix it in the drivers/vbios - no point trying newer drivers until nvidia actually announces it has been fixed.

I’m facing the same issues. I have 3 rigs with 36 GTX 1060 GPUs, 12 cards per rig. The rig with Palit GPUs simply doesn’t allow me to change power settings. I’m using ethOS, and no matter what ‘pwr’ value I put in my global config file, it doesn’t work. When I check the ethos-overclock log, I keep getting a message saying ‘Failed to set power management limit for GPU 00000000:01:00.0: Insufficient Permissions’. My remaining 24 MSI GTX 1060s are all down to 70-80W per card, while the Palit cards are chewing up around 100W per card at even lower overclock settings. Here is a screenshot of the logs: https://pasteboard.co/GEq8fd4.png and a screenshot that shows the power consumption: https://us.v-cdn.net/5021640/uploads/editor/tx/bnjt3kcfh3n9.png. I’m basically wasting almost 500W on that specific rig. What a shame. Are there any fixes yet?
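Before blaming the card, it may be worth ruling out an out-of-range value. The sketch below is only that, a sketch: `check_power_limit` is a made-up helper, and the 60/140 numbers in the usage comment are placeholders, not values for any Palit card. It does not explain the "Insufficient Permissions" message by itself.

```shell
#!/bin/sh
# Returns 0 if the requested limit (watts) lies within [min, max];
# otherwise prints the reason and returns 1.
check_power_limit() {
    requested="$1"; min_limit="$2"; max_limit="$3"
    awk -v r="$requested" -v lo="$min_limit" -v hi="$max_limit" 'BEGIN {
        if (r + 0 < lo + 0) { printf "%s W is below the minimum of %s W\n", r, lo; exit 1 }
        if (r + 0 > hi + 0) { printf "%s W is above the maximum of %s W\n", r, hi; exit 1 }
        exit 0
    }'
}

# The allowed range for a card can be read with:
#   nvidia-smi -i 0 --query-gpu=power.min_limit,power.max_limit \
#              --format=csv,noheader,nounits
# and then checked, e.g.: check_power_limit 75 60 140
```

Setting the limit itself also requires root, so the same pre-flight could include a check that `id -u` prints 0.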

Hello all,

I have a rig with 10 GTX 1060s, and I can lower the power limit to around 75W with the real draw going only slightly above that value (e.g. 78W instead of 75W). I’m using the latest drivers. I have three different types of cards in my rig (1x single fan MSI, 6x dual fan MSI, 3x single fan EVGA), and I had to set different power limits between 70 and 80W depending on which cards I was configuring.

The whole rig (motherboard, SSD, GPUs, …) takes something like 890W out of the wall, so if you subtract the power drawn by everything except the GPUs, I probably have an average of 75W per GPU.
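As a quick sanity check of that estimate, here is the arithmetic as a small sketch. The 140W non-GPU overhead is an assumption (it is simply what is left after 10 cards at 75W), and `avg_gpu_watts` is a name invented for the example.

```shell
#!/bin/sh
# Average per-GPU draw from wall power, given an assumed non-GPU overhead.
# Wall power includes PSU losses, so this overestimates what the cards draw.
avg_gpu_watts() {
    wall_watts="$1"; overhead_watts="$2"; gpu_count="$3"
    echo $(( (wall_watts - overhead_watts) / gpu_count ))
}

# 890W at the wall, 140W assumed for motherboard/CPU/SSD/fans, 10 GPUs:
avg_gpu_watts 890 140 10
```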

If you want, I can post links to my cards on Amazon, where I bought them.

Thanks
Laurent

Same issue here with PNY GTX 1060.

Setting power limit to 60 and it still draws almost 100W.

It works fine with Afterburner. I can’t believe a 3rd party tool like Afterburner managed to get it right when NVIDIA’s own tool, nvidia-smi, is missing a call to commit the new power limit.

Please fix it ASAP, as power management is a must in the mining community.

Just a wild guess, but it looks like the limit behaves as a percentage on Linux, while on Windows it works as expected. Does setting PL to 59 give you 70W?

Not really…
The thing is, the problem is only related to the GTX 1060. GTX 970/980 cards obey the limit as intended.

Yes, I know. It just looks like, maybe, for some 1060 cards the limit behaves as a percentage, while for others it is correctly in W.

Also possibly interesting is the behavior on Windows: nvidia-smi reports the correct usage when the PL is set, but other software does not. Something is fishy here, since it is the same card where the PL misbehaves (90W on a 70W limit) in Linux too.

So more people have issues with their GTX 1060 and nvidia-smi, yet nvidia does… nothing? Any news on this? @sandipt

Yes, it’s annoying … I’m seriously considering building new rigs on AMD+Windows instead of NVIDIA+Linux.

Yes same. Honestly I’m losing money because of this

Got an update for you guys!

Today I bought a new Palit 1060 StormX card and added it to a rig that was running 4 GPUs of the same type, purchased earlier, last summer. The newer card is exactly the same, with Samsung memory, but apparently has slightly different letters (HC25 vs HC28) that can be seen through the radiator. I don’t know if that’s meaningful.

When I first ran the rig, I noticed the older cards were giving me 23.5 MH/s on Ethash (as usual), but the new card produced only 19.5 MH/s. I ran nvidia-smi and noticed that the new card was actually obeying the power limit, which I had set to 60 W (which translates to 90 W for the older cards). So I raised the PL for the new card and it gave me 23.5 MH/s. I then started experimenting with the PL and found out that 70 W is enough for this card to produce 23.5 MH/s!

Which basically means that we are being robbed of ~100 W per rig, because we cannot force the older cards to run below 90 W! And it’s pretty much the same for the Equihash algo too.

I really would like Nvidia to comment on this. Is it newer firmware? How can I upgrade firmware on my older cards? Or is it different memory type? The newer card looks exactly the same as the old one. Type is Palit GTX 1060 3GB, NE51060015F9-1061F.

What about setting core and memory clocks with nvidia-smi, without needing the X Window System installed?
It really shouldn’t be necessary to install all of X on a rig just for tuning, e.g. on a plain Debian server.

NVIDIA, please respond… just let us tune without a graphical environment (nvidia-settings).

p.s.
Additionally, about nvidia-smi speed (13x 1060 GPU rig, full load, ETH mining):

# time nvidia-smi -L
real 0m27,081s
user 0m0,000s
sys 0m0,080s

# time nvidia-smi -q -d TEMPERATURE
real 0m50,362s
user 0m0,004s
sys 0m0,156s

Crazy slow.
Why?

Yes, the system load is high:

# top -b -n 1 | head -3
top - 02:46:32 up 1 day, 23:30, 4 users, load average: 15,92, 14,56, 13,65
Tasks: 210 total, 4 running, 206 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1,0 us, 54,7 sy, 0,0 ni, 43,5 id, 0,0 wa, 0,0 hi, 0,7 si, 0,0 st

but the system is still very responsive. It’s not user CPU usage, it’s system (kernel) time.
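To make the user vs. system split easier to track over time, here is a small sketch that pulls both numbers out of top's summary line. `cpu_user_vs_system` is a hypothetical helper, and it assumes the "%Cpu(s):" line format shown above (with either "," or "." as the decimal separator).

```shell
#!/bin/sh
# Extracts the user (us) and system (sy) CPU percentages from top's
# "%Cpu(s)" summary line and prints them with "." as the decimal separator.
cpu_user_vs_system() {
    sed -n 's/^%Cpu(s): *\([0-9]*\)[,.]\([0-9]*\) us, *\([0-9]*\)[,.]\([0-9]*\) sy.*/us=\1.\2 sy=\3.\4/p'
}

# On a rig:
#   top -b -n 1 | cpu_user_vs_system
```

For the line pasted above it prints `us=1.0 sy=54.7`, which shows the time going to the kernel rather than to user processes.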