SLI performance in low resolution (windowed) vs high resolution (fullscreen)

Hello guys,

I am currently measuring FPS and frame time in a scene of about 80k vertices that uses soft shadow mapping.
The GPU frame time is measured with the GL_ARB_timer_query extension, while FPS is measured with a CPU timer.
These measurements produced some interesting results that I do not fully understand yet:

Windowed 800x800
Single GPU: ~0.3 ms GPU time, ~2500 FPS
SLI (force AFR 1): ~0.8 ms GPU time, ~1200 FPS

Fullscreen 2560x1600
Single GPU: ~2.2 ms GPU time, ~600 FPS
SLI (force AFR 1): ~1.4 ms GPU time, ~900 FPS

The issue should not be related to render-to-texture (RTT): I am using an FBO, and I requested the WGL_SWAP_EXCHANGE_ARB swap method in the pixel format. WGL_SWAP_COPY_ARB also results in bad performance at the higher resolution in fullscreen.

Unfortunately, NVIDIA Nsight crashed when I tried to capture GPU frames with AFR 1 enabled. I did, however, make two logs with GPUView that give me an idea of what is happening, but I would need some further elaboration on them. The following screenshots compare 800x800 windowed with 2560x1600 fullscreen, both with SLI enabled (force AFR 1).
http://www.paxi.at/random/gpuview0.png
http://www.paxi.at/random/gpuview20.png

I read the following on the web: When SLI is enabled, the NVIDIA driver must coordinate the operations of both GPUs when each new frame is swapped (made visible). For most applications, this GPU synchronization overhead is negligible. However, because xxx renders so many frames per second, the GPU synchronization overhead consumes a significant portion of the total time, and the framerate is reduced.

Is this what is happening here? If so, could someone elaborate a bit on what exactly happens?

HOWEVER, if I render at 2560x1600 in windowed mode, performance with SLI is again slightly WORSE than, or the SAME as, with a single GPU. I am kinda confused :D

Thanks for any suggestions in advance.

Yes. The driver has to be conservative so that it renders correctly for most common applications. However, the driver may not know all of the assumptions made on the application side. If assumptions can be made about the rendering behavior, for example that an FBO is not needed by the other AFR GPU, then the driver can skip some GPU-to-GPU transfers.

One suggestion is to try renaming your application to SLITest.exe, which will pick up the AFR profiles. This may not help the windowed case. In general you should see better SLI scaling with fullscreen applications, because in those cases you are making use of your SLI bridge.