I looked into this a bit to track down the history of what happened. Apparently the power measurement circuitry on this particular GPU board isn’t very accurate, so reporting was disabled in nvidia-smi. However, there was a bug in the way it was disabled that causes it to report “ERR!” instead of “N/A”. A future driver should change the reporting to “N/A”.
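For anyone scripting around this in the meantime, a defensive approach is to query the power draw in CSV mode and treat the error/unsupported sentinels as missing values rather than numbers. A minimal sketch, assuming output lines like those from `nvidia-smi --query-gpu=power.draw --format=csv,noheader` (the sample values below are made up to mirror the readings in this thread):

```python
def parse_power_draw(line):
    """Parse one line of `nvidia-smi --query-gpu=power.draw --format=csv,noheader`.

    Returns the draw in watts, or None when the driver reports an
    error/unsupported sentinel instead of a number.
    """
    value = line.strip()
    # Sentinels a driver may emit when it cannot read the power sensor.
    if value in ("ERR!", "N/A", "[N/A]", "[Unknown Error]"):
        return None
    # Normal form is e.g. "100.52 W".
    return float(value.removesuffix("W").strip())

# Made-up readings mirroring the tables further down in this thread:
samples = ["100.52 W", "ERR!", "[N/A]"]
print([parse_power_draw(s) for s in samples])  # [100.52, None, None]
```

This way a monitoring script keeps working whether the driver reports “ERR!” today or “N/A” after the fix.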
So the 35W and 36W readings I am seeing on 384.90 are incorrect? The value fluctuates between those two depending on GPU utilization.
And yes, the “ERR!” was what scared me.
Apart from that, is this something safe to just ignore or are there any side effects?
Thank you for the reply.
Is the temperature sensor support guaranteed to stay in future driver versions for this card?
It would be disastrous to lose that one as it affects power management.
Sorry for the odd question, but I did pay 203 US dollars (including taxes) for a correctly imported NVIDIA card, so I would like to know what will keep working in the long run.
Thank you.
To me it looks like power limit handling has been broken too. I recently upgraded from 384.90, and since then my miner can’t stress my GTX 1070 to more than ~102W, no matter how high I set the power limit :-( This was definitely possible with the previous driver version. (Of course, I did not change anything in the miner config.)
BTW, this only affects the GTX 1070; the GTX 1050 Ti still draws all the power it should (I can confirm this from the temperature readings, since the actual power reading shows ERR).
EDIT: should I start a new thread with that topic ?
I don’t know. But if I understand from aplattner’s post that the readings are incorrect, maybe NVIDIA is capping power usage to safe values so the cards don’t blow up?
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 387.12 Driver Version: 387.12 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1070 On | 00000000:01:00.0 Off | N/A |
| 41% 60C P2 100W / 105W | 605MiB / 8113MiB | 95% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 105... On | 00000000:06:00.0 Off | N/A |
| 54% 69C P0 ERR! / 52W | 2295MiB / 4038MiB | 100% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 12906 C ./miner 595MiB |
| 1 12852 C ./ethdcrminer64 2285MiB |
+-----------------------------------------------------------------------------+
The 1070 draws only 100-102W, no matter which value I enter for the cap. I also have a power meter on the outlet of this machine, and its readings correlate. The card’s temperature also does not rise when I set the power limit to e.g. 150W, which it definitely did with previous driver versions.
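A quick way to see whether a requested cap actually took effect is to compare the “Power Limit” against the “Enforced Power Limit” in the output of `nvidia-smi -q -d POWER`. A small sketch that parses that report (the sample text below is made up; the field names follow the usual nvidia-smi layout, but check your own driver’s output):

```python
import re

def power_limits(report):
    """Extract 'Power Limit' and 'Enforced Power Limit' in watts
    from `nvidia-smi -q -d POWER` output."""
    limits = {}
    for key in ("Power Limit", "Enforced Power Limit"):
        # Each field sits on its own line, e.g. "Power Limit : 150.00 W".
        m = re.search(rf"^\s*{key}\s*:\s*([\d.]+) W", report, re.MULTILINE)
        limits[key] = float(m.group(1)) if m else None
    return limits

# Made-up excerpt mirroring the situation described above:
sample = """
    Power Readings
        Power Limit                 : 150.00 W
        Enforced Power Limit        : 105.00 W
"""
print(power_limits(sample))
# {'Power Limit': 150.0, 'Enforced Power Limit': 105.0}
```

On an affected machine you would set the cap with e.g. `sudo nvidia-smi -i 0 -pl 150` and then check whether the enforced limit actually follows the requested one.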
nvidia-smi says P0 on my 1050 ti while nvidia-settings says P2.
Now it says P5 and is stuck there regardless of clock speed according to nvidia-smi -q.
Edit: And now it went back to P0. There seems to be some delay.
Just downgraded to 384.90 and all is good again. Both the power usage reading on the GTX 1050 Ti and the actual power usage/power limit handling are working again.
People who are using their cards for mining should definitely stick to 384.90! zecminer: ~370 Sol/s @ 387.12 at ~102W (no more was possible) vs. ~405 Sol/s @ 384.90 at 110W.
BTW, the power state of the 1070 is still P2, but raising the power limit also raises the power usage. NVIDIA must have broken something in the latest driver version :-(
Wed Oct 18 23:16:54 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.90 Driver Version: 384.90 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1070 On | 00000000:01:00.0 Off | N/A |
| 42% 61C P2 108W / 110W | 605MiB / 8114MiB | 99% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 105... On | 00000000:06:00.0 Off | N/A |
| 53% 69C P0 52W / 52W | 2301MiB / 4038MiB | 99% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 32344 C ./miner 595MiB |
| 1 32375 C ./ethdcrminer64 2291MiB |
+-----------------------------------------------------------------------------+
I don’t do mining and my load is relatively low so I don’t care for the power cap.
In any case, I can’t downgrade to 384.90 because 387.12 works much better with the Gnome/mutter monitor-manager changes in GNOME 3.26.
I would be happy to mark this thread as ‘Fixed’ if they simply changed the “Err” and “unknown error” messages to “N/A”, assuming there are no other side effects of doing so.