I have an Ubuntu 14.04 system containing several Tesla S2050 GPUs on which I had been running NVIDIA driver 340.29 and CUDA 6.5-14. To create the /dev/nvidia* files, I have /etc/rc.local set up to run nvidia-smi during bootup. While this worked fine in the past, after upgrading to 346.46 and CUDA 6.5-19 I’ve observed that nvidia-smi hangs indefinitely (and never creates any /dev/nvidia* files). If I don’t run nvidia-smi during bootup and instead start it manually a little while after the system is up, it runs without any problems.

Adding a 30 to 60 second sleep before starting nvidia-smi in /etc/rc.local appears to help; however, is there some way to explicitly check whether the GPUs are “ready” before running nvidia-smi? More generally, has anyone observed a similar change in behavior when upgrading to the 346 drivers? This problem doesn’t appear to affect Fermi GeForce GPUs (i.e., running nvidia-smi in /etc/rc.local continued to work fine after upgrading to the newer drivers).
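In case it helps others with the same question: instead of a fixed sleep, one option is to poll until nvidia-smi succeeds. This is only a sketch; the retry count, poll interval, and timeout values below are arbitrary assumptions, and the timeout wrapper is there because nvidia-smi can hang indefinitely in this situation.

```shell
# wait_for <retries> <delay-seconds> <command...>
# Retry a command until it succeeds or the retry budget is exhausted.
wait_for() {
    retries=$1; delay=$2; shift 2
    i=0
    while [ "$i" -lt "$retries" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0        # command succeeded; devices are presumably ready
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1                # gave up
}

# In /etc/rc.local one might then use (values are guesses, not tested ones):
# wait_for 24 5 timeout 20 nvidia-smi && logger "GPUs ready"
```

The `timeout 20` wrapper (coreutils) kills any single hung nvidia-smi attempt so the loop can try again rather than blocking bootup forever.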
That sounds strange. When it’s hung, can you please try attaching to it with gdb and running the ‘backtrace’ command to see where it’s stuck?
Does the problem go away if you use nvidia-modprobe rather than nvidia-smi to create the device nodes? Use the -c option to create the device node for a given minor number.
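For anyone trying this, a hypothetical rc.local fragment along these lines could replace the nvidia-smi call. The minor count of 4 is an assumption; adjust it to the number of GPUs in the system.

```shell
# Create /dev/nvidia0 .. /dev/nvidia3 via nvidia-modprobe -c.
for minor in 0 1 2 3; do
    command -v nvidia-modprobe >/dev/null 2>&1 \
        && nvidia-modprobe -c "$minor" \
        || true   # skip quietly on machines without the tool
done
```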
It appears that the problem was caused by a module build failure when the machine’s kernel was updated. Reinstalling the nvidia-346 Ubuntu package caused dkms to build and install the module properly, and the devices initialized successfully on the next reboot.
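For reference, the checks involved looked roughly like the following. This is a sketch: the package name nvidia-346 is the one from this thread, and the make.log path assumes the usual dkms layout.

```shell
# Check whether dkms built the module for the running kernel:
dkms status nvidia-346      # empty output => module was never built
uname -r                    # the kernel version the module must match

# Force a rebuild by reinstalling the packaged driver:
# sudo apt-get install --reinstall nvidia-346

# On a build failure, the log lives under the dkms tree, e.g.:
# /var/lib/dkms/nvidia-346/<version>/build/make.log
```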
If you would like me to post the dkms make.log file somewhere, let me know.