Hi,
I have installed the NVIDIA OpenGL drivers on an Amazon GPU cluster instance with NVIDIA Tesla M2050 GPUs, and I see odd behavior when I run any two apps in quick succession: XOpenDisplay takes a very long time to return. The driver versions I have tried are NVIDIA-Linux-x86_64-319.23.run and NVIDIA-Linux-x86_64-319.32.run.
I am running the server in headless mode with an xorg.conf generated using the following options:
sudo nvidia-xconfig -a --no-xinerama --no-dynamic-twinview --use-display-device=None --virtual=1280x1024
which sets up two screens, one mapped to each of the two devices.
Finally I run the X server using:
sudo /usr/bin/X :0
and then I run an application on either GPU using:
DISPLAY=:0.0 glxgears
or
DISPLAY=:0.1 glxgears
To test the behavior, I wrote a simple application that does nothing but open and close the X display in quick succession. After the first XOpenDisplay, the next call to XOpenDisplay takes a very long time to return.
The code is something like this:
int i;
// Run the following multiple times and measure the rate per second.
for (i = 0; i < count; i++) {
    gDpy = XOpenDisplay(NULL);
    if (!gDpy) {
        printf("Error: couldn't open default X display.\n");
        return;
    }
    XCloseDisplay(gDpy);
}
...
...
Additional code invokes the above loop and measures its rate per second.
I am attaching the nvidia-bug-report with this for reference.
Is it a problem with my setup or the drivers for Tesla? Any help is appreciated.
I also notice that when an application exits, output similar to the following appears on the terminal where I started X. XOpenDisplay succeeds only after this has been dumped, that is, after the previous application has exited or closed the display.
12 XSELINUXs still allocated at reset
SCREEN: 0 objects of 344 bytes = 0 total bytes 0 private allocs
COLORMAP: 0 objects of 8 bytes = 0 total bytes 0 private allocs
DEVICE: 0 objects of 104 bytes = 0 total bytes 0 private allocs
CLIENT: 0 objects of 152 bytes = 0 total bytes 0 private allocs
WINDOW: 0 objects of 72 bytes = 0 total bytes 0 private allocs
PIXMAP: 4 objects of 112 bytes = 448 total bytes 0 private allocs
GC: 8 objects of 40 bytes = 320 total bytes 0 private allocs
CURSOR: 0 objects of 16 bytes = 0 total bytes 0 private allocs
DBE_WINDOW: 0 objects of 24 bytes = 0 total bytes 0 private allocs
GLYPH: 0 objects of 48 bytes = 0 total bytes 0 private allocs
PICTURE: 0 objects of 8 bytes = 0 total bytes 0 private allocs
SYNC_FENCE: 0 objects of 8 bytes = 0 total bytes 0 private allocs
TOTAL: 12 objects, 768 bytes, 0 allocs
4 PIXMAPs still allocated at reset
PIXMAP: 4 objects of 112 bytes = 448 total bytes 0 private allocs
GC: 8 objects of 40 bytes = 320 total bytes 0 private allocs
CURSOR: 0 objects of 16 bytes = 0 total bytes 0 private allocs
DBE_WINDOW: 0 objects of 24 bytes = 0 total bytes 0 private allocs
GLYPH: 0 objects of 48 bytes = 0 total bytes 0 private allocs
PICTURE: 0 objects of 8 bytes = 0 total bytes 0 private allocs
SYNC_FENCE: 0 objects of 8 bytes = 0 total bytes 0 private allocs
TOTAL: 12 objects, 768 bytes, 0 allocs
8 GCs still allocated at reset
GC: 8 objects of 40 bytes = 320 total bytes 0 private allocs
CURSOR: 0 objects of 16 bytes = 0 total bytes 0 private allocs
DBE_WINDOW: 0 objects of 24 bytes = 0 total bytes 0 private allocs
GLYPH: 0 objects of 48 bytes = 0 total bytes 0 private allocs
PICTURE: 0 objects of 8 bytes = 0 total bytes 0 private allocs
SYNC_FENCE: 0 objects of 8 bytes = 0 total bytes 0 private allocs
TOTAL: 8 objects, 320 bytes, 0 allocs
With the -noreset option to the X server, the issue goes away. I should say that this is a workaround rather than a resolution. Could someone from NVIDIA please confirm whether this is an issue with the NVIDIA drivers or with the Xorg X server?
The X server’s default behavior is to reset when the last client disconnects. That requires re-initializing the GPU, which can take a moment. If you don’t need the server to reset, then passing -noreset is the right option.