GPU Performance Degradation

GPU performance degrades after ~30 seconds.

We have an app processing stereo images at 30 fps.
Its GUI uses OpenGL inside .NET forms.
The original image processing (IP) uses the OpenCV library.
To offload the CPU we are moving some IP algorithms to a GTX 260 device.
For the IP already ported we have a run-time CPU/GPU switch.
Closely monitoring each CUDA step, we record that after more than 30 seconds of normal running with CUDA enabled, performance suddenly degrades.
Every step starts taking longer: copy to device, kernel execution, copy from device, and even the device-to-device copy we do into a CUDA-mapped pixel buffer object.
From the application's perspective nothing changes: it processes the same images over and over again, and all device memory is allocated once at initialization.
Any ideas where to look? What to check?
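For reference, per-step timing like the poster describes is usually done with CUDA events rather than host timers, since events measure elapsed time on the GPU itself. A minimal sketch (the `step` callback is a hypothetical stand-in for one copy or kernel launch, not something from the original app):

```cpp
#include <cuda_runtime.h>

// Time a single GPU step (a cudaMemcpyAsync or a kernel launch wrapped
// in `step`) using CUDA events; returns the elapsed time in milliseconds.
float timeStep(void (*step)(cudaStream_t), cudaStream_t s)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, s);    // mark the point just before the step
    step(s);                      // enqueue the copy or kernel on stream s
    cudaEventRecord(stop, s);     // mark the point just after it
    cudaEventSynchronize(stop);   // wait until the step has actually finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}
```

Logging these per-step times each frame would show directly which stage (H2D copy, kernel, D2H copy) stretches after the 30-second mark.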

Is your card overheating?

Is the performance degrading after 30 seconds of processing multiple images (i.e. after 30 seconds, processing separate images becomes slower), or are you doing >30 seconds of processing on a single image, which slows down 30 seconds into the computation?

It would also help narrow down the problem to know how much time elapses between kernel calls. If tmurray is right, perhaps the card cools down a bit between images and heats up too much while processing.

According to NVIDIA System Monitor the temperature goes up only a few degrees Celsius: from 50 °C to 52 °C.

We are processing 2 images simultaneously at a rate of 30 frames per second.

We copy the 1st image to the device. While we process the 1st, we copy the 2nd image to the device. While we process the 2nd, we copy the 1st one back from the device. Then we copy the 2nd image back from the device. The cycle is triggered every 33 ms by a new frame. It looks like after more than 1000 cycles the copying no longer overlaps with the processing. Is there a way to confirm this? Could the OpenGL GUI stop CUDA from overlapping copy and compute? Any other reasons? Although according to NVIDIA System Monitor the GPU load is only 20%…

OK, here is what is going on. When we first turn on CUDA processing, the card gives us full power at a ~600 MHz clock rate. After 30 seconds of running, the "clever" card decides that we are not loading it enough (see the 20% above) and drops its clock rate to ~380 MHz (reaching ~50% load at that rate). Any idea how to disable this behavior? Programmatically?

Did you ever solve this? I am having the same problem on a Mac Pro running Leopard.

Driver bug that should be fixed soon.

Modern CUDA boards support more than one so-called performance level: performance 3D, standard 2D, and low-power 2D.

The clever cards decide by themselves when to switch between them. Everything is oriented toward saving energy these days.

As a workaround we installed the RivaTuner utility: it allows forcing a constant performance level.

Obviously the same thing should be doable programmatically. I posted a question about this API but have not gotten a reply so far.
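For readers hitting this on later drivers: clock locking eventually became an official feature of nvidia-smi (and the underlying NVML library). None of this existed when the thread was written, so it only helps on newer setups, and the 600 MHz value below is just this thread's example clock:

```shell
# Query the current graphics and memory clocks.
nvidia-smi --query-gpu=clocks.gr,clocks.mem --format=csv

# Lock the graphics clock to a fixed range so the driver cannot
# down-shift to a lower performance level (requires admin/root).
nvidia-smi --lock-gpu-clocks=600,600

# Restore automatic clock management afterwards.
nvidia-smi --reset-gpu-clocks
```

The same operations are exposed programmatically through NVML (nvmlDeviceSetGpuLockedClocks / nvmlDeviceResetGpuLockedClocks) for applications that want to pin clocks themselves.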