nvidia-smi power limit on GTX 1060

Just a tip for the mods:

There are numerous threads with last activity “48 years ago” that nobody looks at. Default sorting is by activity, so lots of threads never see the light of day… Please change something …

Regards,
Luka

Hi luv,
Can you share a URL showing the specs of the cards you are using? Does the issue only occur when two or more cards are installed in the system? Can I get an nvidia bug report as soon as the issue hits? What is the difference in behavior when a monitor is connected versus disconnected from the GPU?

>>it’s always the first card listed in nvidia-smi (lowest pci id?) that is affected by this.
Can you explain more about it?

Hi flair666, the xrandr --verbose output is missing from your log. Can you share it? Also, do monitors need to be connected to or disconnected from the GPUs to reproduce this issue?

Sure … do you want me to reverse engineer and fix the code as well?

It’s a Pascal card (GTX 1060). I don’t remember the result with only one card installed. Please read my and qxsnap’s comments again and try to reproduce it - if you get stuck, feel free to ask for help.

Thanks!

Hi luv, Can I get an nvidia bug report as soon as the issue hits? Also, the reproduction steps are not so clear. I think you tried stopping and starting X multiple times. How did you run ethminer? Any specific command line?

Hi

ethminer parameters are irrelevant; you can use any sufficiently intensive CUDA process.

To reproduce: install two GTX 1060 cards, make sure there are no other cards installed/enabled (not even an integrated GPU), and do the following:

  • run any CUDA process able to utilize the card to the max and keep it running
  • startx, and wait a bit
  • stopx
  • note that the CUDA process is now very inefficient, but that’s a different bug

  • startx
  • note the power consumption of the first card
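The steps above can be sketched as a script (a rough sketch, not the reporter’s exact commands: the CUDA workload name `cuda_burn`, the sleep durations, and the way X is stopped are all placeholders for whatever your system uses):

```shell
# Hypothetical repro sketch for a headless two-card GTX 1060 box.
# "cuda_burn" stands in for any CUDA process that maxes out the GPUs (e.g. ethminer).
cuda_burn &

startx &                         # start X and let it settle
sleep 30
nvidia-smi --query-gpu=index,power.draw,power.limit --format=csv

pkill -f xinit                   # "stopx": kill the X session
sleep 10
nvidia-smi --query-gpu=index,power.draw,power.limit --format=csv  # workload now throttled (separate bug)

startx &                         # start X again
sleep 30
nvidia-smi --query-gpu=index,power.draw,power.limit --format=csv  # watch GPU 0 exceed its cap
```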

I have exactly the same problem with a Palit GTX 1060 StormX / Dual card. Ubuntu 16.04 LTS, driver version 381.22. The MSI GTX 1060 card, by contrast, obeys the power limit perfectly.

Has there been any fix for this?

Hi jbase,

I have noticed NVIDIA actually has two (un?)related problems here:

  1. The one described in my and luka1002’s comments; it affects GTX 1060 cards from all manufacturers (and always only the first card). Try qxsnap’s workaround - it works for me.

  2. Some GTX 1060 cards do not respect power limits AT ALL (for example, cards from Gainward). I’d suggest returning the card.

Hope it helps.

Actually, Gainward is owned by Palit, so it’s the same corporation. ))

The power problem on GPU 0 in Linux happens because no display is attached to GPU 0. If you have no plans to attach one, you need to tell Xorg to use the card anyway and fake a display being attached:

nvidia-xconfig --allow-empty-initial-configuration
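Applied end to end, the workaround looks something like this (a sketch: the display manager name and the wattage are examples, adjust for your setup):

```shell
# Write an xorg.conf that lets X start on GPUs with no monitor attached
sudo nvidia-xconfig --allow-empty-initial-configuration
# Restart X so the new config takes effect (lightdm is just an example)
sudo systemctl restart lightdm
# The power limit on GPU 0 should now stick
sudo nvidia-smi -i 0 -pl 70
```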

It also affects the 1070 (I think any NVIDIA 10xx card).

luka1002, in fact, nvidia-smi runs on the 16.04 server distro, which has no graphics shell at all.

nvidia-smi does not support setting the fan speed or clock offsets

Yes, that’s right. So, you can’t run nvidia-xconfig because you don’t have Xorg? Well, I didn’t try a distro without Xorg because I need overclocking, and nvidia-smi can only overclock Tesla & Quadro cards. The rest need Coolbits in xorg.conf, and for that you need Xorg. Really stupid, but… it’s the only way.
In another topic I already wrote about the weird dependencies of the NVIDIA driver on Linux.
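For reference, the Coolbits route looks roughly like this (the specific values are illustrative; Coolbits is a bitmask, and 28 = 4+8+16 enables fan control, clock offsets, and overvoltage on recent drivers):

```shell
# Enable Coolbits in xorg.conf, then restart X for it to take effect
sudo nvidia-xconfig --cool-bits=28
# Fan speed and clock offsets then go through nvidia-settings, e.g.:
nvidia-settings -a "[gpu:0]/GPUFanControlState=1" \
                -a "[fan:0]/GPUTargetFanSpeed=60"
nvidia-settings -a "[gpu:0]/GPUGraphicsClockOffset[3]=100"  # +100 MHz at perf level 3
```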

The only thing you could try is connecting a monitor to GPU 0 and rebooting. Then try the power limit.

If you start Xorg (and the power limit works) and stop it (and then it does not), the driver can no longer use xorg.conf and therefore cannot detect the monitor anymore. If that is what’s happening, it confirms the crazy behaviour.

If it works with a monitor attached, you could use a DVI-VGA adapter with no monitor (CRT).

Can you please check whether, under full load, your card stays in P2 mode or goes to P3? (nvidia-smi)
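One way to watch this is nvidia-smi’s CSV query mode (a sketch; `-l 1` just re-polls every second, stop it with Ctrl-C):

```shell
# Poll P-state, power draw, and power cap once per second for each GPU
nvidia-smi --query-gpu=index,pstate,power.draw,power.limit \
           --format=csv,noheader -l 1
```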

luka1002, I just mean that nvidia-smi can set power limit w/o any interaction with Xorg or any graphics shell at all.

My GTX 1060s are ALWAYS in the P2 state and never leave it. Which is another weird thing I don’t understand. The GTX 970/980 I need to put into the P0 state, which boosts them a bit, but it looks like states work differently on the GTX 1060.

What I want to say is: I know you can set the power limit without Xorg, but I was not able to solve the power limit problem without Xorg. (And with this “solution” it works in a strange way, but it works.)

Seems that the NVIDIA driver uses xorg.conf to set up the monitor configuration.

Thank you for testing the P2 state. It confirms another bug that is the subject of another thread.

Now, the funny thing is that my cards don’t exactly disobey the limit. )) If I set it to 90 watts, they hover around 100-105 watts. But if I set it to 60 watts (the minimum), they don’t go over 90 watts.

Like, they honor my request, but they also use intelligence of their own, lol.

I believe I have a similar problem.
I have two Asus 1060 GPUs and two PNY 1060s (all 3GB), and while the two Asus ones obey the limit, the PNYs act similarly to jbase’s problem. Except that it doesn’t matter at all whether I set the limit to 60 or 140; they’re still around 90W. I upgraded to 384.47, but that didn’t change anything.
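For completeness, this is the usual way the limit gets set (a sketch; the GPU index and wattage are examples, and persistence mode keeps the setting alive between CUDA clients):

```shell
sudo nvidia-smi -pm 1        # enable persistence mode
sudo nvidia-smi -i 0 -pl 60  # cap GPU 0 at 60 W
nvidia-smi -i 0 --query-gpu=power.limit --format=csv  # verify the cap took
```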

nvidia-smi output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.47                 Driver Version: 384.47                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...  On   | 00000000:02:00.0  On |                  N/A |
| 50%   68C    P2    90W /  60W |   2199MiB /  3010MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 106...  On   | 00000000:04:00.0 Off |                  N/A |
| 32%   57C    P2    74W /  75W |   2207MiB /  3013MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 106...  On   | 00000000:05:00.0 Off |                  N/A |
| 50%   69C    P2    92W /  60W |   2207MiB /  3013MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 106...  On   | 00000000:07:00.0 Off |                  N/A |
| 37%   61C    P2    74W /  75W |   2207MiB /  3013MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
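A quick way to flag the over-cap cards is the CSV query form of nvidia-smi piped through awk (a sketch; the sample values below are copied from the table above, while on a live system you would pipe nvidia-smi in directly):

```shell
# Input format matches:
#   nvidia-smi --query-gpu=index,power.draw,power.limit --format=csv,noheader,nounits
printf '%s\n' \
  '0, 90.00, 60.00' \
  '1, 74.00, 75.00' \
  '2, 92.00, 60.00' \
  '3, 74.00, 75.00' |
awk -F', ' '$2+0 > $3+0 { printf "GPU %s over cap: %sW / %sW\n", $1, $2, $3 }'
# → GPU 0 over cap: 90.00W / 60.00W
# → GPU 2 over cap: 92.00W / 60.00W
```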

My guess is that since both PNY and Palit are lower-end cards, they may have no core voltage step-down controller at all, so it’s physically impossible to control power consumption over a wide range.

Wrong, as it’s possible on Windows and it actually works there.

flair666, you’re probably right here… Have you tried observing core/mem clocks on both your MSI and PNY cards when you limit power usage? Do they respond differently?