Problem Structure for GPU Programming

Hey all. I finally upgraded my computer and got a video card that supports CUDA, so I have been playing around with it a lot over the last couple of days and reading everything I can get my hands on. While the documentation is great for the nitty-gritty details, I have yet to find a good source of information on how to tackle and structure problems so that they can be optimally solved on the GPU. I have done previous work in parallel and distributed environments, but most of the problems were of the ‘embarrassingly’ easy sort, and I never had to worry about such things as data coalescing. Most of the time, the work I was doing was SIMD, but with lots and lots of branching, which works on clusters but, from what I have read, not so well on GPUs (though I am sure it is still better than nothing).

I was hoping that someone could point me towards a good resource on how to rethink and restructure my problems to better suit the GPU paradigm. Even a discussion of which problems the GPU really does NOT work for would be great!


You’re in luck. NVIDIA just published a “best practices” guide which, even if you only skim it, provides a great overview of how to use CUDA effectively. It pretty well summarizes the important points from years of forum discussions here, without all the noise. :) …esGuide_2.3.pdf

Read this and put it under your pillow, next to the Programmer’s Guide.