I’ll tell you about my experience with CUDA so far; maybe it’ll help.
First of all, IT IS FAST. You just have to get it right.
Second, no matter how many people have written similar code before, it’s somehow not that easy to reuse it, because you still need to optimize for your specific application … so you end up using existing code only as a reference.
The documentation is fine but still missing a lot. That’s where the forum comes in handy, since NVIDIA employees can answer you directly (which I guess is a major thumbs-up for CUDA).
You also have to understand a lot about the GPU architecture before you start writing, or else your code will be slow.
The CUDA profiler is extremely helpful, the best tool I’ve seen, unlike the debugger, which is still not that good.
Code size expands a lot, of course: the more performance you want, the more code you’ll have to write. It doesn’t always work like that, but on average a single loop on the CPU can turn into a page or two of CUDA code.
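To give a feel for that expansion, here is a rough sketch of a one-line CPU loop (SAXPY) next to a straightforward, unoptimized CUDA port. The names `saxpy_cpu`, `saxpy_kernel`, `saxpy_gpu`, `saxpy_max_abs_diff` and the `CUDA_CHECK` macro are my own, made up for illustration; only the `cuda*` runtime calls are real API. A tuned version would be longer still.

```cuda
#include <cassert>
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with a message if a runtime call fails.
#define CUDA_CHECK(call)                                             \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",             \
                    cudaGetErrorString(err_), __FILE__, __LINE__);   \
            exit(EXIT_FAILURE);                                      \
        }                                                            \
    } while (0)

// CPU version: the whole computation is one loop.
void saxpy_cpu(int n, float a, const float *x, float *y) {
    for (int i = 0; i < n; ++i) y[i] = a * x[i] + y[i];
}

// GPU version: the loop body becomes a kernel, one element per thread.
__global__ void saxpy_kernel(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];  // guard: the last block may be partial
}

// Host-side plumbing that the CPU loop never needed.
void saxpy_gpu(int n, float a, const float *x, float *y) {
    float *d_x = NULL, *d_y = NULL;
    size_t bytes = (size_t)n * sizeof(float);
    CUDA_CHECK(cudaMalloc((void **)&d_x, bytes));
    CUDA_CHECK(cudaMalloc((void **)&d_y, bytes));
    CUDA_CHECK(cudaMemcpy(d_x, x, bytes, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(d_y, y, bytes, cudaMemcpyHostToDevice));

    int threads = 256;                         // arbitrary choice, not tuned
    int blocks = (n + threads - 1) / threads;  // round up to cover all n
    saxpy_kernel<<<blocks, threads>>>(n, a, d_x, d_y);
    CUDA_CHECK(cudaGetLastError());            // catches launch errors

    // This copy also waits for the kernel to finish.
    CUDA_CHECK(cudaMemcpy(y, d_y, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_x));
    CUDA_CHECK(cudaFree(d_y));
}

// Runs both versions on the same input and returns the largest difference.
float saxpy_max_abs_diff(int n) {
    std::vector<float> x(n), y_cpu(n, 1.0f), y_gpu(n, 1.0f);
    for (int i = 0; i < n; ++i) x[i] = (float)i;
    saxpy_cpu(n, 2.0f, x.data(), y_cpu.data());
    saxpy_gpu(n, 2.0f, x.data(), y_gpu.data());
    float worst = 0.0f;
    for (int i = 0; i < n; ++i) worst = fmaxf(worst, fabsf(y_cpu[i] - y_gpu[i]));
    return worst;
}

int main() {
    float diff = saxpy_max_abs_diff(1000);
    printf("max |cpu - gpu| = %g\n", diff);
    assert(diff < 1e-5f);
    return 0;
}
```

Even in this naive form, one loop has grown into a kernel plus allocation, copies, launch configuration and error checks.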
Mapping threads is not that easy, and designing a function to take arbitrary grid/block sizes is not easy either. Sometimes these sizes are fixed so that certain functions work or to get around certain constraints.
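One common way to make a simple element-wise kernel independent of the launch configuration is a grid-stride loop. This is only a sketch of that pattern, not a cure for every mapping problem; `scale_kernel` and `scale_is_correct` are hypothetical names I picked for the example.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Grid-stride loop: each thread starts at its global index and then jumps
// ahead by the total number of threads in the grid, so any grid/block size
// covers all n elements.
__global__ void scale_kernel(int n, float s, float *data) {
    int stride = gridDim.x * blockDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        data[i] *= s;
}

// Hypothetical helper: runs the kernel with the given launch configuration
// and checks every element on the host.
bool scale_is_correct(int n, int blocks, int threads) {
    std::vector<float> h(n);
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    float *d = NULL;
    size_t bytes = (size_t)n * sizeof(float);
    if (cudaMalloc((void **)&d, bytes) != cudaSuccess) return false;
    bool ok = cudaMemcpy(d, h.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    scale_kernel<<<blocks, threads>>>(n, 2.0f, d);
    ok = ok && cudaGetLastError() == cudaSuccess;
    ok = ok && cudaMemcpy(h.data(), d, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d);

    for (int i = 0; ok && i < n; ++i) ok = (h[i] == 2.0f * (float)i);
    return ok;
}

int main() {
    const int n = 1000;
    // Deliberately odd configurations: too few threads, a non-multiple, too many.
    assert(scale_is_correct(n, 1, 1));
    assert(scale_is_correct(n, 3, 64));
    assert(scale_is_correct(n, 128, 256));
    printf("all launch configurations produced the same result\n");
    return 0;
}
```

The catch is that this only helps when the work per element is independent; kernels that rely on a fixed tile size in shared memory still end up with hard-coded block dimensions.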
SHARED MEMORY (a very useful memory interface) is way too small, which is a big challenge for a developer. I wish it were larger; it would’ve made a great difference. I guess it’s now 48K in the new Fermi architecture.
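If you want to see how much shared memory your own card gives each block, and what using it looks like, here is a minimal sketch. It queries the limit with `cudaGetDeviceProperties` and runs a per-block sum in a statically sized shared array. `BLOCK`, `block_sum` and `sum_of_ones` are names I made up, and the block size of 256 is an arbitrary choice.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define BLOCK 256  // threads per block; must be a power of two for this reduction

// Each block loads BLOCK elements into shared memory and reduces them to one
// partial sum. Must be launched with exactly BLOCK threads per block.
__global__ void block_sum(const float *in, float *out, int n) {
    __shared__ float tile[BLOCK];  // BLOCK * 4 bytes of the per-block budget
    int tid = threadIdx.x;
    int i = blockIdx.x * BLOCK + tid;
    tile[tid] = (i < n) ? in[i] : 0.0f;  // pad the last block with zeros
    __syncthreads();

    // Tree reduction: halve the number of active threads each step.
    for (int s = BLOCK / 2; s > 0; s >>= 1) {
        if (tid < s) tile[tid] += tile[tid + s];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = tile[0];
}

// Hypothetical helper: sums n ones on the GPU (partials are added on the host).
float sum_of_ones(int n) {
    int blocks = (n + BLOCK - 1) / BLOCK;
    std::vector<float> h_in(n, 1.0f), h_out(blocks, 0.0f);
    float *d_in = NULL, *d_out = NULL;
    cudaMalloc((void **)&d_in, n * sizeof(float));
    cudaMalloc((void **)&d_out, blocks * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    block_sum<<<blocks, BLOCK>>>(d_in, d_out, n);
    cudaMemcpy(h_out.data(), d_out, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    float total = 0.0f;
    for (int b = 0; b < blocks; ++b) total += h_out[b];
    return total;
}

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "no usable CUDA device\n");
        return 1;
    }
    printf("shared memory per block: %zu bytes\n", prop.sharedMemPerBlock);

    float total = sum_of_ones(1000);
    printf("sum = %g\n", total);
    assert(total == 1000.0f);  // small integers are exact in float
    return 0;
}
```

With a budget this small, the tile size is usually the first thing that has to give when the per-thread data grows.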
Texture memory is well documented and can vastly increase performance.
Some important parts of the GPU architecture are barely discussed in the programming guide, such as constant memory and pinned/mapped memory. You have to search the SDK for one or two examples in order to get things straight.
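For what it’s worth, here is a small sketch of how the two fit together, as far as I understand them: coefficients go into `__constant__` memory via `cudaMemcpyToSymbol`, and the input/output buffers are pinned, mapped host memory from `cudaHostAlloc`, which the kernel accesses through device pointers from `cudaHostGetDevicePointer`. The names `c_coeffs`, `poly_kernel` and `poly_matches_host` are made up for the example, and I’m assuming a device that can map host memory.

```cuda
#include <cassert>
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

// Read-only coefficients in constant memory: c0 + c1*x + c2*x^2.
__constant__ float c_coeffs[3];

__global__ void poly_kernel(int n, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = c_coeffs[0] + x[i] * (c_coeffs[1] + x[i] * c_coeffs[2]);
}

// Hypothetical helper: evaluates the polynomial on the GPU using mapped
// (zero-copy) host buffers and compares against a host computation.
bool poly_matches_host(int n) {
    // Needed on older setups before the context exists; the result is
    // ignored here because newer setups map pinned memory by default.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    const float h_coeffs[3] = {1.0f, 2.0f, 3.0f};
    if (cudaMemcpyToSymbol(c_coeffs, h_coeffs, sizeof(h_coeffs)) != cudaSuccess)
        return false;

    size_t bytes = (size_t)n * sizeof(float);
    float *h_x = NULL, *h_y = NULL;  // pinned host pointers
    float *d_x = NULL, *d_y = NULL;  // device views of the same memory
    if (cudaHostAlloc((void **)&h_x, bytes, cudaHostAllocMapped) != cudaSuccess)
        return false;
    if (cudaHostAlloc((void **)&h_y, bytes, cudaHostAllocMapped) != cudaSuccess) {
        cudaFreeHost(h_x);
        return false;
    }
    for (int i = 0; i < n; ++i) { h_x[i] = (float)i; h_y[i] = 0.0f; }

    bool ok = cudaHostGetDevicePointer((void **)&d_x, h_x, 0) == cudaSuccess &&
              cudaHostGetDevicePointer((void **)&d_y, h_y, 0) == cudaSuccess;
    if (ok) {
        int threads = 128;
        int blocks = (n + threads - 1) / threads;
        poly_kernel<<<blocks, threads>>>(n, d_x, d_y);
        // No cudaMemcpy: results land in h_y, but we must wait for the kernel.
        ok = cudaDeviceSynchronize() == cudaSuccess;
    }
    for (int i = 0; ok && i < n; ++i) {
        float x = (float)i;
        float expected = 1.0f + x * (2.0f + x * 3.0f);
        ok = fabsf(h_y[i] - expected) < 1e-3f;
    }
    cudaFreeHost(h_x);
    cudaFreeHost(h_y);
    return ok;
}

int main() {
    bool ok = poly_matches_host(64);
    printf("constant + mapped memory example: %s\n", ok ? "ok" : "FAILED");
    assert(ok);
    return 0;
}
```

Mapped memory saves the explicit copies, but every access goes over the bus, so it only pays off when each element is touched once or so. For plain pinned memory with explicit `cudaMemcpy`, pass `cudaHostAllocDefault` instead of `cudaHostAllocMapped`.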
DEBUGGING IS AN ISSUE, AT LEAST FOR ME. That’s what makes it so painful, especially when the emulator doesn’t serve your needs for specific tasks.
On Windows you can only use Visual C++; there’s no support for MinGW.
The Nexus debugger is not in shape yet, and there is no standalone version; it needs Visual Studio 2008/2010, which is way uncool.
Overall I guess the learning curve is nice, although writing code for the GPU is not that easy. It can get more serious in the future, knowing that the new GTX 480 card has 480 cores … I wonder how much performance I could get on that thing :-D