Hey all. I finally upgraded my computer to get a video card that can use CUDA, so I have been playing around with it a lot for the last couple of days and trying to read everything I can get my hands on. While the documentation is great for the nitty-gritty details, I have yet to find a good source of information on how to tackle and restructure problems so that they can be optimally solved on the GPU. I have done previous work in parallel and distributed environments, but most of the problems were of the ‘embarrassingly’ easy sort, and I never had to worry about such things as memory coalescing. Most of the time, the work I was doing was SIMD, but with lots and lots of branching, which works on clusters but, from what I have read, not so well on GPUs (though I am sure it is still better than nothing).
I was hoping that someone could point me towards a good resource on how I can rethink and restructure my problems to better suit the GPU paradigm. A discussion of which problems the GPU really does NOT work for would be great too!