Display and performance: is computation performance influenced by the display?

I have a very basic question:

When running a program on a GPU, is the execution speed affected by the fact that the card is also driving the computer's display? If so, is it sensible to install one card dedicated to the display while the computation is performed on one or more other cards?

When the display is “idle”, the effect on CUDA performance is very small. By idle I mean the user isn’t moving or opening windows, and most of the display stays static, with perhaps a few status updates every few seconds.

Sure, you can do that. It is also very sensible just to run a headless box (or many) without any display at all, controlled by remote login or a job scheduler.

Depends on what the display is doing. AFAIK there’s very little performance hit if the GPU only has to display a static desktop. On the other hand, running graphics-heavy applications during CUDA computations will slow both down.

EDIT: MisterAnderson42 beat me to it ;)

All right, thanks a lot for the answers. :thumbup: Let’s try and go a bit further…

The computation is image processing. So the idea is to perform the processing and then display the resulting image. Profiling the application, I observed very long “idle time” between calculations on the GPU, due to the display of different buffers.

I therefore thought of launching a separate thread in charge of the GPU processing. But what will happen if the card is simultaneously asked to display an image and to compute stuff? Can it really boost the application frame rate?
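The separate-thread idea above can be sketched on the host side as a small producer/consumer pipeline. This is only an illustration under stated assumptions, not the application's actual code: `Frame`, `FrameQueue`, `process_frame` and `run_pipeline` are hypothetical names, and `process_frame` is a placeholder for the real upload, kernel launch and download.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical frame type; the real application would use its image buffer.
using Frame = std::vector<int>;

// Minimal thread-safe queue that hands finished frames to the display side.
class FrameQueue {
public:
    void push(Frame f) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(f));
        }
        cv_.notify_one();
    }
    // Blocks until a frame is available; returns false once closed and empty.
    bool pop(Frame& out) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty() || closed_; });
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop();
        return true;
    }
    // Signals that no more frames will be pushed.
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<Frame> q_;
    bool closed_ = false;
};

// Placeholder for the GPU work (assumption: one call produces one frame).
Frame process_frame(int index) {
    return Frame(4, index * 2);
}

// Processes frames on a worker thread while the calling thread "displays"
// them. Returns the number of frames displayed.
int run_pipeline(int frame_count) {
    assert(frame_count >= 0);
    FrameQueue queue;
    std::thread worker([&queue, frame_count] {
        for (int i = 0; i < frame_count; ++i) queue.push(process_frame(i));
        queue.close();
    });
    int displayed = 0;
    Frame frame;
    while (queue.pop(frame)) {
        // Placeholder for the Windows display call.
        ++displayed;
    }
    worker.join();
    return displayed;
}
```

Calling `run_pipeline(8)` hands eight frames from the worker to the display loop. The sketch only shows how the two stages can overlap on the host; whether the frame rate actually improves depends on how much the card's display work and compute work contend with each other, as discussed in the replies above.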

It is a Windows application and the display is performed through Windows. Do you think I could get better results if the display were driven directly by the GPU through CUDA? I would save the memory transfer, but it is not significant.

By the way, another question…
I tried to initialize the GPU once at application start-up, instead of reinitializing it each time the loop is entered, but without success. The initialization calls CUT_DEVICE_INIT() and performs several memory allocations. The CUDA code is launched as a DLL. When compiled in release mode, the application runs but the CUDA code is not executed correctly, and when compiled in debug mode, it fails at a memcopy.
Any ideas?
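For reference, the "initialize once, reuse in the loop" pattern described above can be sketched as follows. This is a host-only illustration with hypothetical names (`ProcessingState`, `run_loop`); the constructor stands in for CUT_DEVICE_INIT() plus the allocations. In CUDA releases of that era a context belonged to the host thread that created it, so the sketch deliberately keeps setup and processing on the same thread. Whether a thread mismatch is the cause of the failure described here is an assumption, not something confirmed in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the state created by the one-time initialization
// (device init and buffer allocations in the real application).
class ProcessingState {
public:
    explicit ProcessingState(std::size_t pixel_count)
        : scratch_(pixel_count, 0) {
        ++init_calls;  // counts how often the expensive setup runs
    }

    // Reuses the buffer allocated in the constructor; no per-call setup.
    int process(int frame_index) {
        assert(!scratch_.empty());
        for (int& p : scratch_) p = frame_index;
        return scratch_.front();
    }

    static int init_calls;

private:
    std::vector<int> scratch_;
};

int ProcessingState::init_calls = 0;

// Sets up the state once, then processes every frame with it on the same
// thread. Returns the result of the last frame, or -1 if there were none.
int run_loop(int frame_count) {
    ProcessingState state(16);  // one-time setup, outside the loop
    int last = -1;
    for (int i = 0; i < frame_count; ++i) last = state.process(i);
    return last;
}
```

Keeping the state in one object that is created and used by the same thread makes it easier to check that the DLL is not being initialized from one thread and called from another.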