kernel boot lockup with nvidia.ko 195.36.15


We have run into problems on one of our CUDA machines after upgrading the driver from 190.x to 195.36.15. The kernel locks up while loading the nvidia.ko kernel module during boot. It looks like it might be related to vgaarb, the VGA arbitration code; please see the attached screenshot. The machine still responds to ping but never gets past the point shown in the image. It is a Core i7 based system with two GTX 295 cards. I’ve attached the lspci and dmesg output.

Any ideas what is going wrong?

dmesg.txt.gz (12.6 KB)
lspci.txt (3.92 KB)

If we avoid loading nvidia.ko during boot and insert it manually from the command line, the module loads OK and the system stays alive. However, if we then run, for example, nvidia-smi, it blocks forever and we get “soft lockup - CPU#2 stuck” errors, as seen in the attached kernel logs.

soft_lockup.txt.gz (3.25 KB)
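
In case it helps anyone comparing logs, here is a minimal Python sketch for counting the soft lockup reports per CPU in a saved kernel log. It only assumes that each report contains the text quoted above (“soft lockup - CPU#N stuck”); the function names are hypothetical and the rest of each log line is ignored.

```python
import gzip
import re

# Matches only the message text quoted in this post; nothing else about
# the kernel log line format is assumed.
SOFT_LOCKUP = re.compile(r"soft lockup - CPU#(\d+) stuck")


def count_soft_lockups(log_text):
    """Return a dict mapping CPU number to the number of soft lockup reports."""
    counts = {}
    for line in log_text.splitlines():
        match = SOFT_LOCKUP.search(line)
        if match:
            cpu = int(match.group(1))
            counts[cpu] = counts.get(cpu, 0) + 1
    return counts


def count_soft_lockups_in_file(path):
    """Read a plain or gzip-compressed log file and count reports per CPU."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", errors="replace") as handle:
        return count_soft_lockups(handle.read())
```

Calling `count_soft_lockups_in_file("soft_lockup.txt.gz")` on the attachment would show whether the reports are confined to one CPU or spread across several.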