At the moment I’m not trying to solve Fermat’s Last Theorem or decode the Zodiac’s uncracked messages (those come later) – in the meantime, I’m just trying to speed up some math calculations that currently take between 1 and 100 ms in C++ on an Intel Core Duo.
I’d like to get some feedback on how effective CUDA is at speeding up these “short” calculations.
For example, I have a 64K-element array on which I need to perform two FFTs plus some array manipulations (such as element swapping and a moving-average filter). The total execution time is 20 ms in C++, including both FFTs.
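For reference, the non-FFT part of that pipeline looks roughly like this (a sketch only – the boxcar window width and edge handling are assumptions on my part, not my exact filter):

```cpp
#include <cstddef>
#include <vector>

// Simple boxcar moving-average filter over a float array, window = 2*half+1,
// clamped at the edges. Illustrative of the kind of per-element work involved;
// the real filter's details may differ.
std::vector<float> movingAverage(const std::vector<float>& in, std::size_t half) {
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::size_t lo = (i >= half) ? i - half : 0;
        std::size_t hi = (i + half < in.size()) ? i + half : in.size() - 1;
        float sum = 0.0f;
        for (std::size_t j = lo; j <= hi; ++j) sum += in[j];
        out[i] = sum / static_cast<float>(hi - lo + 1);
    }
    return out;
}
```

On a 64K array this is trivially data-parallel, which is exactly why CUDA looked attractive in the first place.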
My early CUDA experiments have not been encouraging:
– There is roughly 200 ms of overhead just to initialize the CUDA engine. That is far longer than any calculation I’m trying to optimize, although with some creativity this cost could probably be hidden at application startup.
– The kernel-launch and host–device memory-transfer overhead seems prohibitive for math functions that already take only milliseconds or less. For example, it takes only 106 µs to swap and invert one dimension of a 32K-element 2D float array serially on a 3 GHz Core 2 Duo. Once memory allocation, transfers, and kernel launch are accounted for, how can CUDA possibly beat that?
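For concreteness, the serial operation I benchmarked is roughly the following (a sketch – the 128×256 dimensions below are placeholders I chose so the example is self-contained; the real array is 32K floats):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Reverse ("swap and invert") one dimension of a row-major 2D float array:
// within each row, swap element j with element cols-1-j. On a 32K-element
// array a loop like this finishes in the ~100 us range on a modern CPU,
// i.e. well under a single host<->device transfer.
void invertRows(std::vector<float>& a, std::size_t rows, std::size_t cols) {
    for (std::size_t r = 0; r < rows; ++r) {
        float* row = a.data() + r * cols;
        for (std::size_t j = 0; j < cols / 2; ++j)
            std::swap(row[j], row[cols - 1 - j]);
    }
}
```

The kernel itself would be trivial in CUDA; my doubt is purely about the fixed costs surrounding it.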
So I’m wondering: is anyone using CUDA to speed up calculations in the single- to tens-of-milliseconds range? Or is the inherent overhead just too high?