Painfully long driver initialization with many GPUs -- affects ALL drivers (Nvidia, please do someth...

Also, I have access to one PC that has this problem, so if possible I can demonstrate it over an SSH console.

OK, I have 2 computers on which this works badly.
I have prepared bug reports.

http://paste.ubuntu.com/p/dw4dN8kt2Q/
nvidia-smi takes about 12 seconds
Adjusting all fans takes 4+ minutes :)

root@simpleminer:/home/miner# time nvidia-smi                                                                                                 
Mon Sep 10 15:44:33 2018                                                                                                                      
+-----------------------------------------------------------------------------+                                                               
| NVIDIA-SMI 396.54                 Driver Version: 396.54                    |                                                               
|-------------------------------+----------------------+----------------------+                                                               
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |                                                               
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |                                                               
|===============================+======================+======================|                                                               
|   0  GeForce GTX 107...  On   | 00000000:01:00.0 Off |                  N/A |                                                               
| 84%   72C    P2   171W / 225W |    197MiB /  8119MiB |     68%      Default |                                                               
+-------------------------------+----------------------+----------------------+                                                               
|   1  GeForce GTX 107...  On   | 00000000:02:00.0 Off |                  N/A |                                                               
| 86%   70C    P2   151W / 170W |    197MiB /  8119MiB |     97%      Default |                                                               
+-------------------------------+----------------------+----------------------+                                                               
|   2  GeForce GTX 107...  On   | 00000000:03:00.0 Off |                  N/A |                                                               
| 84%   70C    P2   198W / 180W |    197MiB /  8119MiB |     95%      Default |                                                               
+-------------------------------+----------------------+----------------------+                                                               
|   3  GeForce GTX 1080    On   | 00000000:04:00.0 Off |                  N/A |                                                               
| 95%   71C    P2   165W / 210W |    265MiB /  8119MiB |     99%      Default |                                                               
+-------------------------------+----------------------+----------------------+                                                               
|   4  GeForce GTX 1080    On   | 00000000:05:00.0 Off |                  N/A |                                                               
| 99%   70C    P2   164W / 210W |    265MiB /  8119MiB |     98%      Default |                                                               
+-------------------------------+----------------------+----------------------+                                                               
|   5  GeForce GTX 1080    On   | 00000000:06:00.0 Off |                  N/A |                                                               
| 80%   70C    P2   145W / 190W |    265MiB /  8119MiB |     99%      Default |                                                               
+-------------------------------+----------------------+----------------------+                                                               
|   6  GeForce GTX 1080    On   | 00000000:09:00.0 Off |                  N/A |                                                               
| 97%   70C    P2   164W / 210W |    265MiB /  8119MiB |    100%      Default |                                                               
+-------------------------------+----------------------+----------------------+                                                               
|   7  GeForce GTX 1080    On   | 00000000:0A:00.0 Off |                  N/A |                                                               
| 80%   64C    P2   160W / 190W |    265MiB /  8119MiB |     90%      Default |                                                               
+-------------------------------+----------------------+----------------------+                                                               

+-----------------------------------------------------------------------------+                                                               
| Processes:                                                       GPU Memory |                                                               
|  GPU       PID   Type   Process name                             Usage      |                                                               
|=============================================================================|                                                               
|    0      1392      G   /usr/lib/xorg/Xorg                             5MiB |                                                               
|    0      3406      C   /root/miner/z-enemy-v1.18-cuda9.2/z-enemy    179MiB |                                                               
|    1      1392      G   /usr/lib/xorg/Xorg                             5MiB |                                                               
|    1      3406      C   /root/miner/z-enemy-v1.18-cuda9.2/z-enemy    179MiB |                                                               
|    2      1392      G   /usr/lib/xorg/Xorg                             5MiB |                                                               
|    2      3406      C   /root/miner/z-enemy-v1.18-cuda9.2/z-enemy    179MiB |                                                               
|    3      1392      G   /usr/lib/xorg/Xorg                             5MiB |                                                               
|    3      3406      C   /root/miner/z-enemy-v1.18-cuda9.2/z-enemy    247MiB |                                                               
|    4      1392      G   /usr/lib/xorg/Xorg                             5MiB |                                                               
|    4      3406      C   /root/miner/z-enemy-v1.18-cuda9.2/z-enemy    247MiB |                                                               
|    5      1392      G   /usr/lib/xorg/Xorg                             5MiB |                                                               
|    5      3406      C   /root/miner/z-enemy-v1.18-cuda9.2/z-enemy    247MiB |                                                               
|    6      1392      G   /usr/lib/xorg/Xorg                             5MiB |                                                               
|    6      3406      C   /root/miner/z-enemy-v1.18-cuda9.2/z-enemy    247MiB |                                                               
|    7      1392      G   /usr/lib/xorg/Xorg                             5MiB |                                                               
|    7      3406      C   /root/miner/z-enemy-v1.18-cuda9.2/z-enemy    247MiB |                                                               
+-----------------------------------------------------------------------------+                                                               
                                                                                                                                              
real    0m12.736s                                                                                                                             
user    0m0.000s                                                                                                                              
sys     0m0.791s

On the second computer nvidia-smi takes only about 0.5 seconds,
but changing the fan speed still takes: real 1m27.233s
http://paste.ubuntu.com/p/rDT9dBRN8K/

root@simpleminer:/home/miner# time nvidia-smi                                                                                                 
Mon Sep 10 15:51:31 2018                                                                                                                      
+-----------------------------------------------------------------------------+                                                               
| NVIDIA-SMI 396.54                 Driver Version: 396.54                    |                                                               
|-------------------------------+----------------------+----------------------+                                                               
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |                                                               
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |                                                               
|===============================+======================+======================|                                                               
|   0  GeForce GTX 106...  On   | 00000000:01:00.0 Off |                  N/A |                                                               
| 66%   65C    P2    77W /  83W |    153MiB /  6078MiB |    100%      Default |                                                               
+-------------------------------+----------------------+----------------------+                                                               
|   1  GeForce GTX 106...  On   | 00000000:02:00.0 Off |                  N/A |                                                               
| 58%   63C    P2    62W /  83W |    153MiB /  6078MiB |    100%      Default |                                                               
+-------------------------------+----------------------+----------------------+                                                               
|   2  GeForce GTX 106...  On   | 00000000:03:00.0 Off |                  N/A |                                                               
| 57%   65C    P2    91W /  90W |    153MiB /  6078MiB |    100%      Default |                                                               
+-------------------------------+----------------------+----------------------+                                                               
|   3  GeForce GTX 106...  On   | 00000000:04:00.0 Off |                  N/A |                                                               
| 62%   66C    P2    84W /  90W |    153MiB /  6078MiB |    100%      Default |                                                               
+-------------------------------+----------------------+----------------------+                                                               
|   4  GeForce GTX 106...  On   | 00000000:06:00.0 Off |                  N/A |                                                               
| 69%   65C    P2    80W /  83W |    153MiB /  6078MiB |    100%      Default |                                                               
+-------------------------------+----------------------+----------------------+                                                               
|   5  GeForce GTX 106...  On   | 00000000:07:00.0 Off |                  N/A |                                                               
| 67%   65C    P2    80W /  83W |    153MiB /  6078MiB |     97%      Default |                                                               
+-------------------------------+----------------------+----------------------+                                                               
|   6  GeForce GTX 106...  On   | 00000000:08:00.0 Off |                  N/A |                                                               
| 64%   65C    P2    70W /  83W |    153MiB /  6078MiB |     97%      Default |                                                               
+-------------------------------+----------------------+----------------------+                                                               
|   7  GeForce GTX 106...  On   | 00000000:09:00.0 Off |                  N/A |                                                               
| 53%   65C    P2    76W /  83W |    153MiB /  6078MiB |     97%      Default |                                                               
+-------------------------------+----------------------+----------------------+                                                               

+-----------------------------------------------------------------------------+                                                               
| Processes:                                                       GPU Memory |                                                               
|  GPU       PID   Type   Process name                             Usage      |                                                               
|=============================================================================|                                                               
|    0      1525      G   /usr/lib/xorg/Xorg                             5MiB |                                                               
|    0     31397      C   /root/miner/z-enemy-v1.18-cuda9.1/z-enemy    135MiB |                                                               
|    1      1525      G   /usr/lib/xorg/Xorg                             5MiB |                                                               
|    1     31397      C   /root/miner/z-enemy-v1.18-cuda9.1/z-enemy    135MiB |                                                               
|    2      1525      G   /usr/lib/xorg/Xorg                             5MiB |                                                               
|    2     31397      C   /root/miner/z-enemy-v1.18-cuda9.1/z-enemy    135MiB |                                                               
|    3      1525      G   /usr/lib/xorg/Xorg                             5MiB |                                                               
|    3     31397      C   /root/miner/z-enemy-v1.18-cuda9.1/z-enemy    135MiB |                                                               
|    4      1525      G   /usr/lib/xorg/Xorg                             5MiB |                                                               
|    4     31397      C   /root/miner/z-enemy-v1.18-cuda9.1/z-enemy    135MiB |                                                               
|    5      1525      G   /usr/lib/xorg/Xorg                             5MiB |                                                               
|    5     31397      C   /root/miner/z-enemy-v1.18-cuda9.1/z-enemy    135MiB |                                                               
|    6      1525      G   /usr/lib/xorg/Xorg                             5MiB |                                                               
|    6     31397      C   /root/miner/z-enemy-v1.18-cuda9.1/z-enemy    135MiB |                                                               
|    7      1525      G   /usr/lib/xorg/Xorg                             5MiB |                                                               
|    7     31397      C   /root/miner/z-enemy-v1.18-cuda9.1/z-enemy    135MiB |                                                               
+-----------------------------------------------------------------------------+                                                               
                                                                                                                                              
real    0m0.457s                                                                                                                              
user    0m0.004s                                                                                                                              
sys     0m0.149s

OK, I made some improvements to my fan speed script.
It no longer executes a separate command for each GPU; instead it executes a single nvidia-settings call with multiple -a parameters.
Now it works much faster, but from my tests it still seems that newer NVIDIA drivers are significantly slower at nvidia-settings/nvidia-smi commands.
Is there maybe something we could adjust in the kernel to communicate with NVIDIA GPUs faster?
Maybe lowering the priority of CUDA compute relative to nvidia-smi requests?
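
The batching idea (one nvidia-settings call carrying many -a assignments instead of one call per GPU) could be sketched roughly like this. It is only an illustration, not the actual fan speed script: the names `build_fan_command` and `apply_fan_speeds` are hypothetical, and the sketch assumes one fan per GPU with the fan index equal to the GPU index.

```python
import os
import subprocess


def build_fan_command(speeds):
    """Build one nvidia-settings argv that sets every fan in a single call.

    `speeds` maps a GPU index to a target fan speed in percent.
    Assumption: one fan per GPU, and fan index == GPU index.
    """
    argv = ["nvidia-settings"]
    for index in sorted(speeds):
        # Enable manual fan control on the GPU, then set the target speed.
        argv += ["-a", f"[gpu:{index}]/GPUFanControlState=1"]
        argv += ["-a", f"[fan:{index}]/GPUTargetFanSpeed={speeds[index]}"]
    return argv


def apply_fan_speeds(speeds, display=":0"):
    """Run the batched command against the given X display."""
    env = dict(os.environ, DISPLAY=display)
    subprocess.run(build_fan_command(speeds), env=env, check=True)
```

Passing the arguments as a list (no shell) also avoids the shell treating `[gpu:0]` as a glob pattern.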

I even have some PCs on which nvidia-smi takes:
real 1m46.630s

This also occurs with the newest NVIDIA driver 410.57 and the 4.15.2 kernel :(

And executing the following command on some computers takes 10+ minutes, which is totally unacceptable :)

DISPLAY=:0 nvidia-settings \
  -a [gpu:0]/GPUFanControlState=1 -a [fan:0]/GPUTargetFanSpeed=70 \
  -a [gpu:1]/GPUFanControlState=1 -a [fan:1]/GPUTargetFanSpeed=70 \
  -a [gpu:2]/GPUFanControlState=1 -a [fan:2]/GPUTargetFanSpeed=70 \
  -a [gpu:3]/GPUFanControlState=1 -a [fan:3]/GPUTargetFanSpeed=70 \
  -a [gpu:4]/GPUFanControlState=1 -a [fan:4]/GPUTargetFanSpeed=70 \
  -a [gpu:5]/GPUFanControlState=1 -a [fan:5]/GPUTargetFanSpeed=70 \
  -a [gpu:6]/GPUFanControlState=1 -a [fan:6]/GPUTargetFanSpeed=70 \
  -a [gpu:7]/GPUFanControlState=1 -a [fan:7]/GPUTargetFanSpeed=70 \
  -a [gpu:8]/GPUFanControlState=1 -a [fan:8]/GPUTargetFanSpeed=70 \
  -a [gpu:9]/GPUFanControlState=1 -a [fan:9]/GPUTargetFanSpeed=78 \
  -a [gpu:10]/GPUFanControlState=1 -a [fan:10]/GPUTargetFanSpeed=70 \
  -a [gpu:11]/GPUFanControlState=1 -a [fan:11]/GPUTargetFanSpeed=70

This is what it looks like inside:


OK guys.
Thanks to user (filemissing) I was able to confirm what is causing this problem.
The problem gets worse and worse the lower we set the power limit.
Here are my benchmarks on a 4xx driver (it does not matter which driver it is):
The setup is 12x P104 GPUs.

powerlimit set to 180 watts (acceptable):
nvidia-smi: about 1 second
nvidia-settings command that changes fan speed on all GPUs: about 8 seconds

powerlimit set to 150 watts (problem starts to show):
nvidia-smi: about 4 seconds
nvidia-settings command that changes fan speed on all GPUs: about 90 seconds

powerlimit set to 200 watts (acceptable):
nvidia-smi: about 1 second
nvidia-settings command that changes fan speed on all GPUs: 7-16 seconds

powerlimit set to 120 watts (the problem is so big that it is dangerous!):
nvidia-smi: well over 120 seconds
nvidia-settings command that changes fan speed on all GPUs: half an hour or more :)
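
A measurement like the one above could be automated roughly as follows. This is a sketch under stated assumptions, not the procedure actually used for these numbers: the helper names `time_command` and `benchmark_power_limits` are hypothetical, `nvidia-smi -pl <watts>` is assumed to be run as root so it applies the limit to all GPUs, and a real run would probably need a pause after each change so the mining load settles.

```python
import subprocess
import time


def time_command(argv, run=subprocess.run, clock=time.perf_counter):
    """Return the wall-clock seconds taken by one command."""
    start = clock()
    run(argv, check=True, stdout=subprocess.DEVNULL)
    return clock() - start


def benchmark_power_limits(limits_watts, probe=("nvidia-smi",),
                           run=subprocess.run, clock=time.perf_counter):
    """Set each power limit in turn and time the probe command under it.

    `run` and `clock` are injectable so the logic can be tried without GPUs.
    """
    results = {}
    for watts in limits_watts:
        # Assumption: run as root; without -i the limit applies to all GPUs.
        run(["nvidia-smi", "-pl", str(watts)],
            check=True, stdout=subprocess.DEVNULL)
        results[watts] = time_command(list(probe), run=run, clock=clock)
    return results
```

For example, `benchmark_power_limits([200, 180, 150, 120])` would return a dict of seconds per power limit, which makes the slowdown at low limits easy to compare across driver versions.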

Of course, NOT using a power limit is not a solution, as we want to lower the power limit on the cards to get the same performance at lower wattage. This is a problem on a global scale that most mining users are running into.
I reported this to NVIDIA, as no one is answering here :(
Bug reported via email: Bug id 2415717 - Performance issue [Incident: 181005-000056]

We have already lost about 30 hours on this subject.

Are there any updates on this subject?

I have contacted nvidia dev team two different ways but they do not give a …
Well, since then a few new driver versions have been released; I am not sure whether this fixed itself.
You can try our image from simplemining.net
This one seems to be working, and the load average is about 2.5 while it is mining (computing):
SM-5.0.21-3e-a19.30-n430.64-v1255.img.xz

Hi tytanick,

Apologies for the late reply.
I will try to replicate the issue locally so that the dev team can investigate it further.
For that I will need repro steps and an NVIDIA bug report.
Please also provide the executable code/application that triggers the problem.
It would be good to know the minimum number of GPUs needed to reproduce the issue.
Also share the kernel config file if you made any changes during kernel compilation.

We are tracking this issue in bug 2415717 [internal]. Please provide the information needed to reproduce this issue.

We have run fresh tests on the same setup as before, and here are the results:

Kernel 5.0.21-4 driver nv430.40

  • 120W powerlimit - super fast responses
  • 100W powerlimit - fast response
  • 90W, 80W powerlimit - responses are a little slower, but it is still all OK

Kernel 4.17.19-14 driver n418.43

  • very slow responses, totally unusable, and a system load of 25+

Kernel 4.17.19-17 driver nv430.40

  • responses work nicely
  • at 80W powerlimit it is a little slower, but still all fine

It seems that the 430.40 drivers solved this problem!

How many GPUs?
Which GPU models?
Which CPU?
How much RAM?

I tested the 435 (not 430) drivers (almost all 435 versions), and they are indeed lighter on the CPU, but they gave a measurable and consistent 2-3% performance loss on all compute tasks on GTX1070, 1070Ti, 1080Ti … sadly, that’s not a solution for me. I keep the GTX1070 and 1070Ti at around a 100W limit.

Would you have time to test drivers 415.27 with the 5.0 kernel and compare the actual benchmark you get with 430.40 and 5.0 kernel?

p.s. I don’t think the kernel makes a difference.


That is true, the newest drivers are hashing 2-3% slower.
The kernel did not improve anything; did the driver only “fix this” by slowing mining down?

So I guess the case is still not solved, as the latest drivers are slower.

Our platform:
CPU: G4400
GPUs: 13x Asus P104-100 4GB
Motherboard: H110 Pro BTC
RAM: 8 GB

Thanks for the experiments. Please provide the detailed information requested in comment #31 so that we can replicate the issue locally for debugging.

In my opinion, the best way to do that would be to share access to my 13-GPU P104 computer on which this problem exists.
Basically, we need a 13x P104 computer running, for example, kernel 4.17.19-14 with driver n418.43 while a mining program (compute process) is running.
Can we do it that way? I can send SSH details for a machine on which this is all set up.
You could do anything you want on that PC and test as much as you like.
But we would need to do this over email for security reasons.
Please tell me whether that is a good idea or not.

@tytanick That’s a good idea and a generous offer

@amrits It would be great if you took up @tytanick’s offer.

To be honest, you can very easily see on any 1070 or 1070ti that 435 drivers are consistently 2-3% slower in all GPU compute apps in raw GPU performance, but at the same time lighter on the CPU – this is an undesired compromise. You should be able to reproduce that without any reports from us as it’s immediately visible, but it would be great if you took up @tytanick’s offer.

I can add one more thing.
I am the CEO of simplemining.net and we have multiple setups in our lab and thousands of clients using nvidia (amd too).
So we can easily spot bad things like that.

So there are a few problems with the NVIDIA driver right now:

  1. nvidia-smi is very slow, mainly with the P104, but it happens on some other GPUs too.
  2. The newest drivers are indeed much slower.

Because of those two things, and also because we have many images ready to be tested, I can offer the following:
SSH access to the machine, plus access to our panel so you can see how it looks (how the computing speed is affected).
I can also give you one command that automatically reflashes the OS you are testing on to a different kernel/NVIDIA driver version.
This way you will have a very easy, 100% complete environment in which you can run tests on one kernel+driver combination and then reflash with one command to a different image that has a different kernel and NVIDIA driver version.
This would allow the NVIDIA team to test and see for themselves what is going on.

On top of that, there is one more issue with the NVIDIA driver that I would like to see solved, as it affects many people mining on GPUs.
3. The 2xxx GPU series has a bug/problem with fans.
On previous GPU generations, the software saw ONE fan per GPU.
Now with the 2xxx cards (some series, and I think the 1660 too), nvidia-settings sees TWO fans per GPU on most cards.
The problem is that we use our own fan speed management script, which adjusts the fan speed on every GPU as the user wants, because the default adjustments are just terrible. In a setup like 1060, 1060, 2060, 2060, 2060, NVIDIA tells us, for example, that there are 8 GPU fans in total, and we have no f…ng idea which GPU has how many software fans…
So we see only 8 fans for 5 GPUs without knowing which GPU has how many, and therefore we cannot control the fan speed, because we do not know whether fan0 and fan1 belong to one card or to two, or maybe fan0 belongs to GPU0 and fan1 and fan2 belong to GPU1…
As this bug has existed for many days, we decided to turn off our fan speed management script on systems where we see a strange number of GPU fans… If there are X GPUs and either X or 2X fans, we know how to proceed, but if there is any other number, like 5 GPUs and 9 fans, we skip our script.
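
The "X or 2X fans" rule can be written down as a small check. This is a minimal sketch, not the real management script: the function name `map_fans_to_gpus` is hypothetical, and the 2X case assumes that fans are numbered consecutively per GPU (fan0 and fan1 on GPU0, fan2 and fan3 on GPU1, and so on), which the driver does not guarantee.

```python
def map_fans_to_gpus(num_gpus, num_fans):
    """Return {gpu_index: [fan_indices]} or None if the layout is ambiguous.

    Only the two unambiguous layouts are handled:
      - num_fans == num_gpus:      one fan per GPU
      - num_fans == 2 * num_gpus:  two fans per GPU (assumed consecutive)
    Any other count (e.g. 5 GPUs and 8 or 9 fans) returns None, meaning
    "do not touch the fans, leave control to the driver".
    """
    if num_gpus <= 0:
        return None
    if num_fans == num_gpus:
        return {gpu: [gpu] for gpu in range(num_gpus)}
    if num_fans == 2 * num_gpus:
        return {gpu: [2 * gpu, 2 * gpu + 1] for gpu in range(num_gpus)}
    return None
```

Returning None for mixed rigs is the conservative choice: guessing the mapping could set the wrong fan and leave a card running hot.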
The second part of the problem is that nvidia-smi also does not know which GPU has which fan :)
For example, we see nvidia-smi output like this:
GPU1 80% fan
GPU2 0% fan
GPU3 0% fan
GPU4 0% fan
GPU5 33% fan

This is because, if a card has two fans and one is set to 0% and the other to 50%, nvidia-smi may report either 0% or 50% … at random …
This is a bug for sure, and I would appreciate getting these things fixed.

Last time I contacted NVIDIA support, they ignored me.