GPGPU survey

Hello, I’m currently in my fifth academic year, and as part of it I’m writing a professional thesis on General-Purpose computation on Graphics Processing Units. More precisely, I’ll cover these questions:
- What are GPGPU’s constraints and possibilities?
- How can existing code be used or adapted? (delegation of heavy computations)
- What future does this technology have?

In that context, I’m asking for your help in assessing the current state of GPGPU adoption with the following survey: GPGPU survey

I can send you a copy on request (but it will be in French).

Thank you.

(Sorry if I posted in the wrong section; just let me know.)

OK, I did it. In my opinion, the talk of “democratization” of computing is getting a bit carried away.

It would be really kind and useful to me to have a few more responses.

Sorry for my English, I’m French…

Thank you.

I did not take the survey (it’s blocked by a firewall)… but I do think GPUs have democratized computation.

Look at Intel: you cannot write optimized code without their commercial tools in place. Intel gives nothing away for free…

However, NVIDIA is bringing a lot of GFLOPS to end users at low cost… I think that makes a lot of sense…

Anything that gives you more computing power for less money could arguably be called a “democratization” of computing, although I agree the term is kind of melodramatic.

I’d rather call it a performance-computing oligarchy, because when you want to go cheap, the only options you can choose from are:

- NVIDIA (CUDA, OpenCL)

- ATI (Brook, CTM, OpenCL)

- Sony (Cell SPU programming, but only on older PlayStation firmware)

Only two major players remain in the consumer space (Intel’s Larrabee has dropped out for now, and Sony has more or less closed its system).

Christian

I’ll tell you my experience so far with CUDA; maybe it’ll help.

First of all, IT IS FAST; you just have to get it right.

Second of all, no matter how many people have written similar code before, it’s somehow not that easy to reuse, because you still need to optimize for your specific application… so you mostly use existing code as a reference.

The documentation is fine but still missing a lot; that’s where the forum comes in handy, since NVIDIA employees can answer you directly (which I guess is a major thumbs-up for CUDA).

You also have to understand a lot about the GPU architecture before you start writing, or else your code will be slow.
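
Memory coalescing is the classic example of what I mean; here is a quick sketch (the kernel names are just illustrative):

    __global__ void copyCoalesced(float* out, const float* in, int n)
    {
        // Consecutive threads read consecutive addresses, so the
        // half-warp's loads merge into a single memory transaction.
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = in[i];
    }

    __global__ void copyStrided(float* out, const float* in, int n, int stride)
    {
        // Same arithmetic, but each thread reads elements 'stride' apart,
        // so the hardware issues many separate transactions per half-warp.
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = in[i * stride];
    }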

The CUDA profiler is extremely helpful, the best tool I’ve seen, unlike the debugger, which is still not that good.

Code size expands a lot, of course; the more performance you want to gain, the more code you’ll have to write. It doesn’t always work like that, but on average a single loop on the CPU can turn into a page or two of CUDA code.
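
To give an idea of that expansion, here is a minimal sketch of an element-wise addition using the standard runtime API (all names are illustrative):

    #include <cuda_runtime.h>

    // On the CPU this is one loop:
    //   for (int i = 0; i < n; ++i) c[i] = a[i] + b[i];

    __global__ void vecAdd(const float* a, const float* b, float* c, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];   // guard against the last partial block
    }

    void vecAddHost(const float* h_a, const float* h_b, float* h_c, int n)
    {
        size_t bytes = n * sizeof(float);
        float *d_a, *d_b, *d_c;
        cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes); cudaMalloc(&d_c, bytes);
        cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);
        int threads = 256;
        int blocks = (n + threads - 1) / threads;   // ceil(n / threads)
        vecAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);
        cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
        cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    }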

Mapping threads is not that easy, and designing a function to accept arbitrary grid/block sizes isn’t easy either. Sometimes these sizes are fixed to make certain functions work or to get around certain constraints.
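
For example, a 2D mapping over an image of arbitrary dimensions needs both a rounded-up grid and a bounds check in the kernel (a sketch, with illustrative names):

    __global__ void scale2d(float* img, int width, int height, float s)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x < width && y < height)      // guard: width/height need not be
            img[y * width + x] *= s;      // multiples of the block size
    }

    // Host side: round the grid up so every pixel is covered.
    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x,
              (height + block.y - 1) / block.y);
    scale2d<<<grid, block>>>(d_img, width, height, 2.0f);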

SHARED MEMORY (a very useful memory space) is way too small, which is a big challenge for a developer. I wish it were larger; it would have made a great difference. I believe it’s now 48 KB in the new Fermi architecture.
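
For scale: a per-block partial sum like the sketch below only needs 1 KB of shared memory per block, but tiling algorithms hit the 16 KB pre-Fermi limit quickly (this assumes a launch with exactly 256 threads per block; names are illustrative):

    __global__ void partialSum(const float* in, float* out, int n)
    {
        __shared__ float s[256];                 // 256 floats = 1 KB
        int tid = threadIdx.x;
        int i = blockIdx.x * blockDim.x + tid;
        s[tid] = (i < n) ? in[i] : 0.0f;
        __syncthreads();
        // Tree reduction within the block (requires blockDim.x == 256).
        for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
            if (tid < stride) s[tid] += s[tid + stride];
            __syncthreads();
        }
        if (tid == 0) out[blockIdx.x] = s[0];    // one partial sum per block
    }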

Texture memory is well documented and can vastly increase performance.
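
A minimal read-through-texture sketch, using the texture reference API that is current as of CUDA 3.x (all names here are illustrative):

    texture<float, 1, cudaReadModeElementType> texIn;   // file-scope reference

    __global__ void copyViaTexture(float* out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = tex1Dfetch(texIn, i);       // cached read
    }

    // Host side: bind a plain device buffer to the reference.
    cudaBindTexture(NULL, texIn, d_in, n * sizeof(float));
    copyViaTexture<<<(n + 255) / 256, 256>>>(d_out, n);
    cudaUnbindTexture(texIn);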

Some important aspects of the GPU architecture are barely discussed in the programming guide, such as constant memory and pinned/mapped memory; you have to search the SDK for one or two examples in order to get things straight.
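
For reference, the basic usage of both, as far as I understand it (a sketch; the buffer names are illustrative):

    __constant__ float d_coeffs[16];    // small, read-only, cached on-chip

    float h_coeffs[16] = {0};
    // Host side: fill constant memory by symbol, not by device pointer.
    cudaMemcpyToSymbol(d_coeffs, h_coeffs, sizeof(h_coeffs));

    // Pinned + mapped ("zero-copy") host memory: a kernel can read the
    // host buffer directly through d_buf, with no explicit cudaMemcpy.
    cudaSetDeviceFlags(cudaDeviceMapHost);   // before any other CUDA call
    float* h_buf;
    cudaHostAlloc((void**)&h_buf, bytes, cudaHostAllocMapped);
    float* d_buf;
    cudaHostGetDevicePointer((void**)&d_buf, h_buf, 0);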

DEBUGGING IS AN ISSUE, AT LEAST FOR ME; that’s what makes it such a pain, especially when the emulator doesn’t serve your needs for specific tasks.

CONS

On Windows you can only use Visual C++; there is no support for MinGW.

The Nexus debugger is not in shape yet, and there is no standalone version; it needs Visual Studio 2008/2010, which is way uncool.

Overall I guess the learning curve is fine, although writing code for the GPU is not that easy. Things could get more serious in the future, knowing that the new GTX 480 card has 480 cores… I wonder how much performance I could get on that thing :-D

One of the major factors boosting CUDA is badly written CPU code, which is surprisingly very common.
People get pumped up by 100x or 200x speedups and get carried away by CUDA.
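
A typical illustration of how the baseline skews the comparison (host-side C++; a sketch, not taken from any real benchmark):

    // Naive baseline: traverses a row-major matrix column by column,
    // so nearly every access misses the cache.
    void sumColsNaive(const float* m, float* out, int rows, int cols)
    {
        for (int c = 0; c < cols; ++c)
            for (int r = 0; r < rows; ++r)
                out[c] += m[r * cols + c];
    }

    // Same arithmetic in cache-friendly order; on large matrices this
    // alone can be several times faster, which shrinks the headline
    // "GPU vs. CPU" speedup accordingly.
    void sumColsFast(const float* m, float* out, int rows, int cols)
    {
        for (int r = 0; r < rows; ++r)
            for (int c = 0; c < cols; ++c)
                out[c] += m[r * cols + c];
    }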

First, I want to thank you for your responses and comments. I got approximately 40 answers (+10 on a French forum).
I’ll close the survey in one week, and after that I’ll publish some results drawn from it.

But I’m curious: has anyone used Fermi yet? Does it work well?

From my experience, the first problems I ran into were in designing functions to take arbitrary grid/block sizes.
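
One common pattern that decouples a kernel from its launch configuration is the grid-stride loop (a minimal sketch with illustrative names, not something from the survey):

    __global__ void saxpy(float a, const float* x, float* y, int n)
    {
        // Each thread walks the array in steps of the total thread count,
        // so any grid/block combination covers every element exactly once.
        for (int i = blockIdx.x * blockDim.x + threadIdx.x;
             i < n;
             i += blockDim.x * gridDim.x)
            y[i] = a * x[i] + y[i];
    }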