Multiple GPUs: Listing devices very slow

Hi forum,

I’ve recently started working on a multi-GPU system and noticed that my programs were running slowly. I tracked it down to the step where the devices are queried.

The system is an Ubuntu Linux x86_64 2.6.32-25 with NVIDIA driver 260.19.26. It has 4 GeForce GTX 295 cards (nvidia-smi reports 8 devices, which I assume is because of the 2 GPUs on each board, but I’m not sure since I haven’t seen the actual hardware). X is not running, nor is anything else that I might suspect of interfering.

Listing the devices is very slow. ‘nvidia-smi -a -q’ takes about 5 seconds, as does my own enum-gpu program, and my real program spends the same amount of time. In an strace, one can see that the time is spent between the call to

open("/dev/nvidiactl", O_RDWR) = 3

and the last open of the first pass through the devices

open("/dev/nvidia7", O_RDWR)   = 11

After that, the opens of the /dev/nvidiaX nodes are repeated, but they don’t take an unusual amount of time.
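In case it helps anyone reproduce this without strace, here is a minimal Python sketch that times the open() call on each device node. The helper names (time_open, device_paths) are made up for illustration, and the count of 8 nodes is just what nvidia-smi reports on this machine. Note that it only times open() itself; if the delay is actually in the ioctl calls between the opens, this will not show it.

```python
import os
import time


def time_open(path):
    """Return the seconds taken by open() on path, or None if it cannot be opened."""
    start = time.perf_counter()
    try:
        fd = os.open(path, os.O_RDWR)
    except OSError:
        # Node missing or not accessible for this user.
        return None
    elapsed = time.perf_counter() - start
    os.close(fd)
    return elapsed


def device_paths(count=8):
    """Control node first, then /dev/nvidia0 .. /dev/nvidia<count-1> (count is an assumption)."""
    return ["/dev/nvidiactl"] + [f"/dev/nvidia{i}" for i in range(count)]


if __name__ == "__main__":
    for path in device_paths():
        elapsed = time_open(path)
        if elapsed is None:
            print(f"{path}: not available")
        else:
            print(f"{path}: {elapsed:.3f} s")
```

Running it twice in a row should show whether only the first pass over the nodes is slow.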

Believe it or not, it is probably the fact that X11 is not running that is causing the problem. NVIDIA’s driver unloads a lot of stuff and releases state when there is no client (X11 or an application). The extra time you are seeing is likely the time required for the driver to reload everything and initialize state for the 8 discrete GPUs in your system.

Try running nvidia-smi -l -i 10 as a background process and then repeat your tests. Keeping nvidia-smi alive and polling the hardware should stop the driver from unloading and releasing all that state, and you might find that applications launch and initialize considerably faster.
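If you want numbers for the before/after comparison, something like the following Python sketch could do it. The function names and the 15-second settle time are assumptions for illustration; the keep-alive command is the one given above, and the query command is whatever you normally time (for example the ‘nvidia-smi -a -q’ call from the first post).

```python
import subprocess
import time

# Keep-alive command exactly as suggested in this thread.
KEEPALIVE_CMD = ["nvidia-smi", "-l", "-i", "10"]


def start_keepalive(cmd=KEEPALIVE_CMD):
    """Start the polling process in the background and return its Popen handle."""
    return subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)


def time_command(cmd):
    """Run cmd to completion and return the wall-clock seconds it took."""
    start = time.perf_counter()
    subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=False)
    return time.perf_counter() - start


def compare(query_cmd, settle_seconds=15):
    """Time query_cmd once without and once with the keep-alive process running."""
    cold = time_command(query_cmd)
    keeper = start_keepalive()
    try:
        # Give the poller time to bring the driver up (the duration is a guess).
        time.sleep(settle_seconds)
        warm = time_command(query_cmd)
    finally:
        keeper.terminate()
        keeper.wait()
    return cold, warm
```

Calling compare(["nvidia-smi", "-a", "-q"]) returns the two timings as a (cold, warm) pair. The cold figure is only meaningful if nothing else was holding the driver open beforehand.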

Look at dmesg with debug output enabled for the nvidia module. To do that, set the module parameter NVreg_ResmanDebugLevel=0 when loading the module:

modprobe nvidia NVreg_ResmanDebugLevel=0
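With the debug level turned up, dmesg gets noisy, so it can help to filter for the driver's own messages. A small Python sketch, assuming the NVIDIA kernel module tags its lines with "NVRM" (the function names are made up for illustration, and reading dmesg may require root on some systems):

```python
import subprocess


def nvrm_lines(text):
    """Return only the lines of a kernel log that mention NVRM."""
    return [line for line in text.splitlines() if "NVRM" in line]


def read_dmesg():
    """Return the current kernel ring buffer as text (empty if dmesg fails)."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=False)
    return result.stdout
```

Printing "\n".join(nvrm_lines(read_dmesg())) right after a slow device listing shows what the driver logged during that window.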

This seems to work well, thanks!

That works well for me as a test. If I run “nvidia-smi -l -i 10” in one window, then nvidia-smi responds quickly in the other, so I seem to have the same problem. However, I have not been able to figure out how to get X11 going without a monitor connected to my server. Is there a way to tell nvidia-smi not to check whether X11 is running? If not, any suggestions on how to get X11 going without a monitor? I can send the details if needed.