Viability of CUDA for improving millisecond-scale calculations? Is there too much overhead?

Hi all,

At the moment, I’m not trying to solve Fermat’s Last Theorem nor decode the Zodiac’s uncracked messages (those come later) – in the meantime, I’m just trying to speed up some math calculations that currently take between 1 and 100 ms in C++ on an Intel Core Duo.

I’d like to get some feedback on how effective CUDA is at improving these “short” calculations.

For example, I have a 64K array on which I need to perform two FFTs, and some array manipulations (such as element swapping and a moving average filter). The total execution time is 20ms in C++, including the two FFTs.
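For reference, the GPU side of that pipeline would typically go through cuFFT. Here's a minimal sketch, assuming an in-place complex-to-complex transform on a 64K-element device array (the transform type, batch size, and the in-between manipulation are all assumptions, not the original poster's actual code):

```cuda
#include <cufft.h>
#include <cuda_runtime.h>

// Sketch: two in-place 64K-point complex FFTs on the device with cuFFT.
// d_data is assumed to already live in device memory (65536 elements).
int run_ffts(cufftComplex* d_data)
{
    const int N = 65536;
    cufftHandle plan;
    if (cufftPlan1d(&plan, N, CUFFT_C2C, 1) != CUFFT_SUCCESS)
        return -1;

    cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD);
    // ... element swaps / moving-average kernel would go here ...
    cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD);

    cufftDestroy(plan);
    return 0;
}
```

Note that plan creation itself is expensive, so the plan should be built once at startup and reused across calls, not rebuilt every 20 ms.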

I have not found encouraging results in my early CUDA experiments – i.e.:

There is a ~200 ms overhead just to initialize the CUDA engine. This is far longer than any calculation I would seek to optimize; however, with some creativity this cost could possibly be hidden at the start of my application.
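The usual way to hide that startup cost is to force context creation once, early, before any timing-critical work. A dummy runtime call such as `cudaFree(0)` is the common idiom; a sketch:

```cuda
#include <cuda_runtime.h>

// Sketch: pay the ~200 ms CUDA initialization cost once at program start.
// cudaFree(0) (or any other runtime call) forces the lazily created
// context to be built here instead of at the first real kernel launch.
void warm_up_cuda()
{
    cudaFree(0);               // triggers context initialization
    cudaDeviceSynchronize();   // make sure it has completed
}
```

Call this from a background thread during application startup and the first real kernel launch should no longer see the initialization penalty.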

The kernel-launch and memory-transfer overhead seems prohibitive for optimizing math functions that already take only milliseconds or less. For example, it takes only 106 µs to swap and invert one dimension of a 32K 2D float array serially on a 3 GHz Core 2 Duo. Once memory transfers and kernel launch overhead are counted, how can CUDA do this faster?
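It's worth actually measuring where that overhead goes before concluding it's prohibitive. A sketch using CUDA events to time the copy-in / kernel / copy-out round trip for the 32K-float case (the kernel body here is a hypothetical stand-in for the real swap/invert):

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-in for the swap-and-invert operation.
__global__ void swap_invert(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n / 2) {
        float t = data[i];
        data[i]         = -data[n - 1 - i];
        data[n - 1 - i] = -t;
    }
}

int main()
{
    const int n = 32 * 1024;
    float* h = (float*)calloc(n, sizeof(float));
    float* d;
    cudaMalloc(&d, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Time the whole round trip: upload, kernel, download.
    cudaEventRecord(start);
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);
    swap_invert<<<(n / 2 + 255) / 256, 256>>>(d, n);
    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("copy + kernel + copy: %.1f us\n", ms * 1000.0f);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    free(h);
    return 0;
}
```

If most of the measured time turns out to be the two `cudaMemcpy` calls rather than the kernel, that points toward keeping the data resident on the GPU rather than abandoning CUDA.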

So, I’m wondering, is anyone using CUDA to improve single to tens-millisecond calculations? Or, is the inherent overhead just too much?

100 ms is a normal execution time for a CUDA kernel, given the 5 s watchdog timer and the fact that the display freezes while a kernel runs.

Hi, I’m doing real-time physics simulation using CUDA. I have kernels that run in 0.02 and 0.04 milliseconds. As for the data transfer, that can be a killer. One of the simpler answers is to do all your math on the GPU so you have less to push back and forth. Of course, a lot of the time that doesn’t make sense.
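The “do all your math on the GPU” advice amounts to chaining kernels over device-resident data and copying only at the ends of the pipeline. A sketch of the pattern (`step1`/`step2`/`step3` are hypothetical kernels standing in for the real math, and `grid`/`block`/`bytes` are assumed to be set up elsewhere):

```cuda
// Sketch: one upload, several kernels operating in place on device
// memory, one download. Intermediate results never cross the PCIe bus.
cudaMemcpy(d_data, h_in, bytes, cudaMemcpyHostToDevice);   // one upload

step1<<<grid, block>>>(d_data);
step2<<<grid, block>>>(d_data);   // consumes step1's output in place
step3<<<grid, block>>>(d_data);

cudaMemcpy(h_out, d_data, bytes, cudaMemcpyDeviceToHost);  // one download
```

With this structure the per-step transfer cost disappears; only the two end-point copies remain, amortized over the whole pipeline.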


How does the 5-second watchdog relate to this 100 ms?