I’ve been playing around with the idea of writing a beginners+ article on CUDA.
I’ve been thinking of writing about certain things that many programmers here have problems
with, mainly: debugging, understanding why a kernel fails, all sorts of things. Some of them are in the
programming guide, but I’d take a more hands-on emphasis. Also about multi-GPU code.
I’d be happy to hear what you think, and what you’d be interested in…
The first two days I ever played with CUDA, I was in a bad mood because there was no clear, step-by-step dumb-newbie tutorial for getting “hello world” to run. There are several great newbie projects in the CUDA SDK, the simpleXX series especially, but there wasn’t any real guidance saying that’s what I should try first, or a step-by-step dummies guide to get that first project compiled.
Sure, after a few days I was doing OK, and now a year or two later I laugh at the memory, but I do know that it was discouraging that there was no handholding intro, even a one-page “try this first” guide. Even a sticky forum post would have helped.
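For what it’s worth, the one-page “try this first” I wished for would have been something like this (a minimal sketch of my own, not from any SDK sample; the kernel and names are just illustrative):

```cuda
#include <cstdio>

// Trivial kernel: each thread writes its global index into out[].
__global__ void writeIds(int *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = i;
}

int main()
{
    const int N = 8;
    int h_out[N];
    int *d_out;

    cudaMalloc((void**)&d_out, N * sizeof(int));  // allocate device memory
    writeIds<<<2, 4>>>(d_out);                    // 2 blocks of 4 threads = 8 threads
    cudaMemcpy(h_out, d_out, N * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    for (int i = 0; i < N; ++i)
        printf("%d ", h_out[i]);                  // expect: 0 1 2 3 4 5 6 7
    printf("\n");
    return 0;
}
```

Save it as hello.cu, compile with `nvcc hello.cu -o hello`, run it — that’s the whole handholding a newbie needs on day one.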
Attached you can find a word document containing what seems to be my first CUDA article.
The paper attempts to address a few things I stumbled upon when starting to program CUDA,
plus some answers/code samples addressing what I thought were common misunderstandings
people have when starting to write CUDA code, as can be seen in these newsgroups.
I’d appreciate any feedback/ideas/comments/remarks/… :)
He asked for comments and/or suggestions. I gave him one, which I believe is important, and it is exactly the same advice I give my colleagues and students when they write proposals/papers/reports/theses for circulation or distribution. Is there a problem with that?
All formatting got lost when I opened the thing in OpenOffice, but so what :)
From the look and feel of it, this is a nice article. Publish it.
Point out more distinctly that newbies are not your target audience.
DO NOT USE SDK FUNCTIONS, make it more self-contained
Emphasise again and again and again that all speedups are completely irrelevant if not measured against optimised CPU code.
Sorry, I don’t have time currently to review this further. But: We are looking for stuff like that on gpgpu.org. If you consider gpgpu.org a place to publish your article, let me know and we’ll work something out…
If a beginner doesn’t want to read the Programming Guide, don’t force it upon him. After all, he picked up your shorter article precisely so as not to read the long Programming Guide (at least at first). I believe reading some simple examples is actually better than jumping into the PG immediately.
I wouldn’t use some unknown myKernel function. Put something there that the reader can copy-paste, compile and run. I think a kernel which computes the sum of all elements may be a good choice. The first version would simply do it in sequential form with <<<1,1>>>, while the more advanced version would perform a reduction with <<<1,X>>>. This code will still be easier to understand than the matrix multiplication from the PG appendix. Besides, prefix sum is used so often, and beginners usually think of atomic instructions at this point.
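Concretely, the two versions could look something like this (a sketch of my own, assuming X is a power of two and the array length equals the block size; kernel names are mine):

```cuda
// Version 1: a single thread sums the array sequentially.
// Launch as: sumSequential<<<1, 1>>>(d_in, d_out, n);
__global__ void sumSequential(const float *in, float *out, int n)
{
    float s = 0.0f;
    for (int i = 0; i < n; ++i)
        s += in[i];
    *out = s;
}

// Version 2: one block of X threads performs a tree reduction in
// shared memory (assumes X is a power of two and n == X).
// Launch as: sumReduction<<<1, X, X * sizeof(float)>>>(d_in, d_out);
__global__ void sumReduction(const float *in, float *out)
{
    extern __shared__ float buf[];
    int tid = threadIdx.x;
    buf[tid] = in[tid];
    __syncthreads();

    // Halve the number of active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            buf[tid] += buf[tid + stride];
        __syncthreads();
    }
    if (tid == 0)
        *out = buf[0];
}
```

The jump from version 1 to version 2 is exactly where the interesting teaching happens: shared memory, __syncthreads(), and why no atomics are needed.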
Maybe add some sections so that the reader knows where he can make a pause.
Understanding cudaMemcpy is crucial; synchronization, however, can be discussed somewhere later. Don’t throw all those details at a newbie reader!
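Agreed — the basic cudaMemcpy round trip is the one thing to get right before anything else. A minimal sketch of my own (error checking omitted for brevity):

```cuda
#include <cstdio>

int main()
{
    const int N = 4;
    float h_a[N] = {1, 2, 3, 4};   // host input
    float h_b[N];                  // host result
    float *d_a;

    cudaMalloc((void**)&d_a, N * sizeof(float));

    // Host -> device, then device -> host.
    // The direction is always the last argument.
    cudaMemcpy(d_a, h_a, N * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(h_b, d_a, N * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_a);
    printf("%g %g %g %g\n", h_b[0], h_b[1], h_b[2], h_b[3]);
    return 0;
}
```

Once the reader sees that both copies succeed and the data survives the round trip, every later example is just “put a kernel launch between the two memcpys.”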
Use different fonts for code and its comments (e.g. italics for comments and monospace for code).
I simply suggest to add some sort of short chapters to your guide.
One scenario for reading a guide is: read part of it, play with it / implement it, check how it works, read the next part, play with it, etc., etc…
Having the guide divided into chapters helps the reader to decide when to stop reading and to start coding something.
Also I would suggest trying to limit the number of false examples (code that does not work), at least at the beginning. Give the reader something that works first so that he can get a taste of what can be done.
It could be something stupid like 2+2 computed on GPU :) [sort of ‘hello world’].
Regarding reduction, it seems to be a good order:
working single-threaded code
multithreaded code with race conditions
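To illustrate that middle step, the broken multithreaded version might look like this (deliberately wrong, a sketch of my own — every thread races on the single accumulator):

```cuda
// BROKEN on purpose: all threads read-modify-write *out concurrently,
// so most additions are silently lost. This is the classic race
// condition that the reduction pattern (or atomics) then fixes.
__global__ void sumRacy(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        *out += in[i];   // not atomic: load, add, store can interleave
}
```

Showing the reader a wrong answer printed by this kernel, and then explaining *why* it is wrong, teaches more than any prose description of races.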
Also, once you are done with this text, I would suggest a slightly more advanced course on bank conflicts. Reading just the PG on this makes it look very, very complicated!
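For that follow-up, a pair of shared-memory reads like these would make the point more concretely than the PG’s prose (a sketch of my own, assuming 16 banks, i.e. one bank per thread of a half-warp on current hardware):

```cuda
// Launch as: bankDemo<<<1, dim3(16, 16)>>>(d_out);
__global__ void bankDemo(float *out)
{
    __shared__ float tile[16][16];
    tile[threadIdx.y][threadIdx.x] = (float)threadIdx.x;  // fill the tile
    __syncthreads();

    // Conflict-free: consecutive threads of a half-warp read
    // consecutive addresses, which map to 16 different banks.
    float fast = tile[threadIdx.y][threadIdx.x];

    // 16-way conflict: consecutive threads read the same column --
    // addresses 16 floats apart all land in the same bank, so the
    // accesses are serialized.
    float slow = tile[threadIdx.x][threadIdx.y];

    out[threadIdx.y * 16 + threadIdx.x] = fast + slow;
}
```

Two one-line access patterns, one fast and one slow, side by side — that demystifies the topic far better than the bank-numbering tables in the PG.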
I believe the code is self-contained and covers all the basics. The “beauty” of it is that you can’t do anything wrong for this kind of “map” operation in CUDA: We did this because this operation is used in the prominent CUDA articles in ACM queue. This matches PDan’s suggestion to start with a simple working code and to discuss common errors later.
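(To make “map” concrete for readers of this thread — a sketch of the pattern with names of my own, not the article’s actual code: each thread transforms exactly one element and never touches another thread’s output, so no race is possible.)

```cuda
// Map pattern: one element per thread, no inter-thread communication.
// Here the example operation is SAXPY: y[i] = a * x[i] + y[i].
__global__ void saxpyMap(float a, const float *x, float *y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)               // guard against the last partial block
        y[i] = a * x[i] + y[i];
}
```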
Funny to see that reductions are apparently a nice didactic approach. We’ll be doing pretty much exactly the same thing in upcoming CUDA hands-on sessions: Sequential summing within one block for starters and working from there…
Let’s peer-review here. eyal, please email updates (dominik.goeddeke AT math.tu-dortmund.de). Once the document is settled, I’ll see to posting and eventually integrating it at gpgpu.org. Can you develop the tutorial in HTML? That’d be cool… Also, please do not try to reinvent the wheel over Wen-Mei’s CUDA book.