Painfully long driver initialization with many GPUs -- affects ALL drivers (Nvidia, please do something)

EDIT 2018-09-01: This bug was fixed in 390.x but then was reintroduced in 396.x Beta :( … if you suffer from this bug, stick to 390.x (I’m using 390.48).

Ubuntu 16.04 here, tried many kernels too, but it happened with other distros. I’m taking the time to post this after many months of frustration.

I have motherboards with as many as 13 GPUs but also have mobos with 7 or 8 GPUs.

The nvidia driver seems to load the GPUs sequentially and it adds an exponentially increasing delay between each card. For 13 GPUs it used to take about 40 seconds on the v384 drivers and older, which is already very long! Now with the release of v387 and 390 drivers it takes a whopping 81 seconds!

Here’s a kernel log:

Jan 21 07:51:52 m8 kernel: [197026.537238] nvidia-modeset: Allocated GPU:0 (GPU-177a47dc-f3f8-b480-0f2a-e223c6874e91) @ PCI:0000:01:00.0
Jan 21 07:51:53 m8 kernel: [197027.475759] nvidia-modeset: Allocated GPU:1 (GPU-943a5e69-78c3-51a5-c0ed-d7f314655bab) @ PCI:0000:02:00.0
Jan 21 07:51:54 m8 kernel: [197028.379958] nvidia-modeset: Allocated GPU:2 (GPU-fb4e4bae-5192-590f-3151-b793f2aaaec6) @ PCI:0000:03:00.0
Jan 21 07:51:55 m8 kernel: [197029.338908] nvidia-modeset: Allocated GPU:3 (GPU-e7a64c57-e302-52d7-908a-3c10b3d33544) @ PCI:0000:04:00.0
Jan 21 07:51:56 m8 kernel: [197030.292160] nvidia-modeset: Allocated GPU:4 (GPU-af125524-a38d-aa30-849a-210db7f73f2f) @ PCI:0000:05:00.0
Jan 21 07:51:57 m8 kernel: [197031.325380] nvidia-modeset: Allocated GPU:5 (GPU-8b7a4614-8f29-9c2f-0f1c-aa15d87bb934) @ PCI:0000:06:00.0
Jan 21 07:51:59 m8 kernel: [197032.623103] nvidia-modeset: Allocated GPU:6 (GPU-5e72c389-98bc-9af3-c875-dda1baa09120) @ PCI:0000:09:00.0
Jan 21 07:52:00 m8 kernel: [197034.333221] nvidia-modeset: Allocated GPU:7 (GPU-095a5f5c-a8a9-f5a3-d55a-0153f8ed9e1f) @ PCI:0000:0a:00.0
Jan 21 07:52:03 m8 kernel: [197036.970865] nvidia-modeset: Allocated GPU:8 (GPU-d2c07689-13c9-fd1f-d3b3-f1c52e419114) @ PCI:0000:0b:00.0
Jan 21 07:52:08 m8 kernel: [197041.947771] nvidia-modeset: Allocated GPU:9 (GPU-b77a08ed-2317-f267-7555-cb2fcce58f81) @ PCI:0000:0c:00.0
Jan 21 07:52:17 m8 kernel: [197051.463534] nvidia-modeset: Allocated GPU:10 (GPU-4fd72b20-d503-e310-827c-3e5b5873e162) @ PCI:0000:0d:00.0
Jan 21 07:52:36 m8 kernel: [197070.371853] nvidia-modeset: Allocated GPU:11 (GPU-3143f61f-0f5d-1fde-17bf-9949bc461857) @ PCI:0000:0e:00.0
Jan 21 07:53:13 m8 kernel: [197107.221568] nvidia-modeset: Allocated GPU:12 (GPU-335ea0b6-06b6-29b7-4a67-da5d47324df7) @ PCI:0000:0f:00.0

Note the increasing delay between successive GPU allocations: roughly 1, 1, 1, 1, 1, 1, 2, 3, 5, 9, 19, and 37 seconds … about 81 seconds between allocating GPU 0 and GPU 12.
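For reference, the per-GPU delays can be extracted directly from the bracketed kernel timestamps (seconds since boot). A small Python sketch using the log lines from this post:

```python
import re

# Kernel timestamps copied from the "Allocated GPU:N" lines above.
log = """
[197026.537238] nvidia-modeset: Allocated GPU:0
[197027.475759] nvidia-modeset: Allocated GPU:1
[197028.379958] nvidia-modeset: Allocated GPU:2
[197029.338908] nvidia-modeset: Allocated GPU:3
[197030.292160] nvidia-modeset: Allocated GPU:4
[197031.325380] nvidia-modeset: Allocated GPU:5
[197032.623103] nvidia-modeset: Allocated GPU:6
[197034.333221] nvidia-modeset: Allocated GPU:7
[197036.970865] nvidia-modeset: Allocated GPU:8
[197041.947771] nvidia-modeset: Allocated GPU:9
[197051.463534] nvidia-modeset: Allocated GPU:10
[197070.371853] nvidia-modeset: Allocated GPU:11
[197107.221568] nvidia-modeset: Allocated GPU:12
"""

# Pull the [seconds] timestamps and compute the gap between consecutive GPUs.
ts = [float(m.group(1)) for m in re.finditer(r"\[(\d+\.\d+)\]", log)]
deltas = [round(b - a, 1) for a, b in zip(ts, ts[1:])]
print("per-GPU delays:", deltas)
print("total:", round(ts[-1] - ts[0], 1), "seconds")
```

Running this on the log above gives delays of roughly 0.9, 0.9, 1.0, 1.0, 1.0, 1.3, 1.7, 2.6, 5.0, 9.5, 18.9, 36.8 and a total of about 80.7 seconds.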

This is very painful because you can’t do anything with any of the GPUs until all of them are loaded. You always have to wait those ~80 seconds: X is delayed, nvidia-smi is delayed, and so on. Worse, if you launch any GPU app while the driver is still loading the GPUs, it takes twice as long: the app doesn’t see the driver as initialized and triggers the whole initialization sequence a second time.

nvidia-persistenced doesn’t help either … it just triggers the same process, which takes the same amount of time (while all other GPU apps are blocked). The machine boots in under 10 seconds, but then I have to wait 1.5 minutes before doing anything GPU related, including starting X.

Can Nvidia devs PLEASE do something about this? I was hoping newer drivers would fix it, but they made it worse!

Also, could the driver allocate the GPUs in parallel, and without delays?



You’re unlikely to receive an answer without providing the nvidia-bug-report.log

I added the report generated by nvidia-bug-report.sh at the end of the 1st post.

Bump? Can a very kind Nvidia dev look at this? I’ll provide beer/chocolate/whatever would keep you going.

I wonder if it’s possible to have multiple copies of the driver running at the same time, each loading one GPU via the NVreg_AssignGpus module option … I’m not an expert though; could someone shed some light on whether this is possible, and how? Would it need inserting multiple kernel modules, different in name but otherwise identical? How? An example I could try would be great. I’ll experiment when I get to the machine.
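For anyone who wants to experiment along these lines, here is a sketch of what I have in mind. The first command just verifies the option exists on your driver; the modprobe.d line is only my guess at the format (a comma-separated list of PCI bus IDs), so treat it as an assumption and check `modinfo nvidia` on your own install first:

```shell
# Confirm the module parameter exists on the installed driver:
modinfo nvidia | grep -i assign

# Hypothetical /etc/modprobe.d/nvidia.conf entry restricting this module
# instance to two GPUs by PCI bus ID (format assumed, verify via modinfo):
# options nvidia NVreg_AssignGpus="0000:01:00.0,0000:02:00.0"
```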

This is being looked into and there are some changes going in to mitigate at least some of the delay. It’s being tracked in bug 2010268.

Hi Aaron, that’s good to hear, thanks! Note that 387 and 390 drivers have doubled the waiting time from 40 seconds to 81. Halving it back to 40 means it’ll be as bad as it used to be, but 40 sec is still way too much.

The driver adds an exponentially growing delay when allocating each new GPU: the delay roughly doubles with every allocation (you can time this); the first post shows about 2, 3, 5, 9, 19, and 37 seconds for the last 6 of the 13 GPUs.
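A quick back-of-the-envelope check of the doubling claim (a rough model of the observed numbers, not of driver internals): assume the gap is about 1 s for the first six allocations and then doubles with each further allocation starting from about 1.25 s. The total lands close to the observed ~81 seconds:

```python
# Rough model of the measured allocation gaps for 13 GPUs (12 gaps):
# roughly constant at first, then doubling with every further allocation.
flat = [1.0] * 6                             # first six gaps: ~1 s each
doubling = [1.25 * 2**k for k in range(6)]   # then 1.25, 2.5, 5, 10, 20, 40
total = sum(flat) + sum(doubling)
print(total)  # -> 84.75, in the ballpark of the observed ~81 s
```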

Could you advise on my question just above: multiple driver instances, with a single GPU assigned per instance? I found documentation about it in the old 331 notes, but not in 384, 387, or 390. However, the NVreg_AssignGpus option is still there in 390. I’d be grateful for some guidance.

I also remember the installer option to generate enumerated driver copies for that kind of use case; it seems to be gone in newer drivers, or left undocumented.

Indeed, but the NVreg_AssignGpus module option is still there (try “modinfo nvidia”). I would very much like the ability to restart a single GPU without affecting the others. Right now nvidia-smi requires stopping all GPU apps on all GPUs … it’s complete overkill to have to stop 8+ GPUs and lose all the work so far just because one of them crapped out for some reason. The NVreg_AssignGpus module option was looking promising. Can anyone from Nvidia please assist?

I’ve taken a look at an older driver version, and I think this was removed because multiple driver instances break UMA, which breaks CUDA, so it’s useless.

Did you mean to say NUMA or CUDA? Any link to the page explaining why it was removed?

I don’t care about NUMA for my use case, and I don’t see why it would break CUDA. CUDA would only see the GPUs that the respective driver instance exposes, which is controlled by the NVreg_AssignGpus module option. I might be missing something.

Nvidia UMA (Unified Memory Access):
I simply downloaded an old 331 driver and started the installer with the -A option; the old multiple-drivers option was there, and its explanation stated that it doesn’t work with UMA. Newer CUDA versions rely on this AFAIK, so the option was removed, I suspect, as it’s useless now.

Thanks. Still sounds a bit speculative though: any links where it’s stated that newer CUDA 8/9 requires UMA? The 331 drivers worked with CUDA 6, when UMA was already available.

The driver option NVreg_AssignGpus is still there, suggesting it’s still possible to assign a subset of GPUs to a driver instance.

Perhaps someone from Nvidia can confirm?

NVreg_AssignGpus is also there for other use cases like PCI passthrough.

Hi @aplattner - any news on this? You mentioned it’s tracked in bug 2010268. Is that something we can see?

@aplattner OK, this WAS fixed in 390.x, but then you guys reintroduced the bug in 396.x and it’s now even worse … up to 3 full minutes! 396.x is still beta; hopefully you’ll fix it again.

I can confirm that this is probably down to the driver version :)
Please fix it

Could you confirm exactly which 396.x you are referring to and also attach nvidia-bug-report.log.gz?

We tested both drivers and shipped them in a flashable image.

Both show very slow responses from nvidia-settings and nvidia-smi while computation is taking place.
We tested this with a GTX 1070 and a GTX 1060, with the same result.
So the 4.17.19 kernel with either 396.54 or 390.87 has the same problem.

On kernel 4.13.10 + nvidia 387.22 it works fine (the same PC with the same GPUs).

If you want, you can download our operating system image, based on Ubuntu, with our precompiled kernel and the Nvidia and AMD drivers.
Thanks for answering :)