Does the display output on Tesla C2050 reduce performance?

I just ran the SDK examples on a C2050 to see the difference from the C1060, and at first I didn’t see much.

I noticed the device-to-device memory bandwidths were comparable: 73 GB/s for the C1060 and 78 GB/s for the C2050. Then I saw the monitor was attached to the Tesla, which I thought would consume a significant amount of GPU resources. I disabled that monitor in Windows and made a Quadro 290 the main display, but I’m not sure that improved anything: memory bandwidth went to 79 GB/s (which may just be noise, since monitor refresh consumes very little), and compute performance for things like convolutionFFT2D didn’t improve.
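For anyone wanting to reproduce these numbers: they come from the SDK’s bandwidthTest sample. A sketch of the device-to-device measurement (the binary path is an assumption, adjust to wherever your SDK samples were built):

```shell
# Measure device-to-device copy bandwidth on GPU 0 using the
# CUDA SDK bandwidthTest sample (path assumed; build the SDK samples first).
./bandwidthTest --device=0 --dtod
```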

Then I saw that you can disable memory error correction (ECC), and the bandwidth went up to 90 GB/s, along with compute performance for most applications.
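In case it helps anyone else, here is a sketch of toggling ECC from the command line with nvidia-smi (requires admin rights, and the change only takes effect after a reboot; flag syntax may differ between driver versions, so check `nvidia-smi --help` on your system):

```shell
# Disable ECC on GPU 0 (0 = off, 1 = on); takes effect after the next reboot.
nvidia-smi -i 0 -e 0

# Verify the current and pending ECC modes.
nvidia-smi -i 0 -q -d ECC
```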

I still want to know: how significant is the impact of the display output on the Tesla C2050 on CUDA performance?

In theory display scan-out uses a small amount of additional memory bandwidth, but I’ve never seen it make a measurable difference.

Probably the biggest thing I would worry about running a display manager on Tesla is the driver watchdog timer.
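A quick way to check whether the watchdog applies to a given GPU is the SDK’s deviceQuery sample, which reports whether a kernel run-time limit is in effect (binary path and exact output wording are assumptions based on the stock sample):

```shell
# deviceQuery prints one property block per GPU; the watchdog shows up
# as the "Run time limit on kernels" line (Yes when a display is attached).
./deviceQuery | grep -i "run time limit"
```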

3D accelerated window managers have been known to slow down CUDA programs on older cards by adding delays between kernel launches while the GPU handles GUI updates. It’s more noticeable with CUDA programs that launch lots of short kernels. Fermi is supposed to reduce that context switch overhead, though.

OK, that sounds about right. I didn’t expect drawing the screen to take that many resources, assuming Windows/X server only redraws when needed and only the invalidated regions. I suppose if 3D graphics are used (which I don’t use), incremental redraws can’t be done.