I have a sequence of kernels that is executed hundreds to thousands of times; each pass operates on a single set of input data to produce a result. Functionally, everything works and behaves as expected.
I am now focusing on performance, and I have noticed that the time to execute this kernel sequence on each set of input data is always some multiple of 16 ms (±1 ms). That is, of course, very close to the refresh interval of my display devices (16.7 ms at 60 Hz).
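
For reference, this is roughly how I am timing each pass, reduced to a minimal sketch: the `crunch` kernel and the eight-iteration loop are just stand-ins for our real kernel sequence, but the CUDA event bracketing is the same.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Stand-in for our real kernel sequence; the actual kernels differ.
__global__ void crunch(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = data[i] * 1.000001f + 0.5f;
}

int main()
{
    const int n = 1 << 20;
    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Bracket one pass over the input set with device-side events so
    // host-side jitter is excluded from the measurement.
    cudaEventRecord(start);
    for (int k = 0; k < 8; ++k)  // placeholder for the kernel sequence
        crunch<<<(n + 255) / 256, 256>>>(d_data, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("elapsed: %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return 0;
}
```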
I have modified the kernels to change their performance characteristics, and while the absolute times change, they are always still very close to a multiple of 16 ms.
The GPUs I am running on have no display connected to them (a separate GPU drives Windows and my monitors). Our application does not use any graphics on the CUDA device and shouldn't care about the display rate at all. I have gone through the NVIDIA Control Panel and disabled all of the graphics-specific settings for this device. In short, I am using the GPU strictly as a non-rendering coprocessor that computes a result and returns it to the CPU.
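
For what it's worth, here is roughly the check I run to confirm the device is not subject to the display watchdog. The device index `0` is just a placeholder for our compute GPU; `kernelExecTimeoutEnabled` is the runtime API property that reports whether the watchdog applies.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int dev = 0;  // placeholder index for our compute-only GPU
    cudaSetDevice(dev);

    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, dev);

    // kernelExecTimeoutEnabled reports whether the display watchdog
    // applies to this device; on a compute-only GPU I expect 0.
    printf("device %d: %s\n", dev, prop.name);
    printf("kernel exec timeout enabled: %d\n", prop.kernelExecTimeoutEnabled);
    return 0;
}
```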
Our application essentially just crunches numbers and produces a result. There is no need for it to sync to the display whatsoever.
- Is there some setting I have missed to make this device not behave like it needs to sync with the display?
- Is there potentially something else I haven't considered that might be causing this?