CUDA book by Kirk & Hwu available

Programming Massively Parallel Processors: A Hands-on Approach

Can you teach CUDA in 256 pages? ;)

Well - it seems like most of the material is course notes that have been available for a long time; indeed I was hoping for more too, but I still think having some kind of “real” textbook available is going to make life much easier for CUDA novices…

Well, they could have used 512, but that would have caused a lot of register pressure and shared memory problems :D

Finally, hopefully all the basics and the nuggets in one place!


LOL… we HPC people…

Yo YDD! Good one! lol… I hope they will use FERMI to print their next book…

Yep… just a rewrite of the programming guide. At least the programming guide is free (and comes in colors too :nuke: ).

And ‘available’ is also a bit of an overstatement. At least in the Netherlands it will only be available around the end of March.

You can order the book immediately from the Elsevier site, where it is listed as available next week (February 5). I don’t know exactly what you mean by “available in the Netherlands” (the Elsevier Europe site does indeed list the book as becoming available during March), but being able to order the book immediately counts as “available” in my book (moreover, with the euro/dollar exchange rate of the past year or so, I find it much more convenient to order everything directly from the US).

I have a member of our R&D team who doesn’t understand CPU threading, and who neither read the Programming Guide nor looked at the SDK.

He went ahead and wrote CUDA code anyway and got a x50 boost - without shared memory or __syncthreads, btw (obviously this was a somewhat simple algorithm, but still…)

I think this can also be seen in these newsgroups… NVIDIA just did too damn good a job…

I think he missed all the fun but time to market is more important to certain people… :">


If he understood CPU threading and program optimization in general, I guess the speedup would have been less impressive.

A lot of the hype around CUDA is based on comparisons to unoptimized, single-threaded CPU code.

I’m not sure - the CPU code was OK. No SSE or fancy stuff like that, but it was OK - and if you compare a single CPU core (no threads and no multi-core issues) to a single GPU and you get a x50 boost - it’s still a x50 boost :)

I think that if you take a simple loop, just copy-paste it into a kernel and run it, you might be lucky enough not to run into coalescing issues (not to mention that the coalescing rules are now much more relaxed than on older cards), and the performance will be good. This is like, maybe, Visual Basic versus C++. You get good results and faster development times with VB, and you don’t need to understand what’s going on… at least not that much… :)

OK, maybe I shouldn’t comment, as not understanding things just isn’t my world. But in all fairness, I think CUDA should at least be compared with OpenMP. OpenMP is even simpler than CUDA (no need to copy data to the device and back), and any problems with parallelism are the same as in CUDA.

I regularly use OpenMP and SSE, and all of a sudden a 50x speedup is little more than 3x, even less on a Core i7.
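
For anyone who hasn’t used OpenMP, here is a minimal sketch of what I mean. The SAXPY-style loop and the name saxpy_openmp are made up for illustration; this is not the code anyone in this thread benchmarked.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example loop (not from this thread): y[i] = a * x[i] + y[i].
// Build with OpenMP enabled (e.g. -fopenmp on GCC/Clang). Without that flag
// the pragma is ignored and the loop runs serially with the same result.
void saxpy_openmp(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const long n = static_cast<long>(x.size());
    // The iterations are independent, so they can be split across CPU threads.
    // The data stays in host memory: no device allocation, no copies.
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The whole change compared to the serial loop is one pragma, which is why I think it is the fairer CPU baseline.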

OTOH, I should be grateful to all hardcore gamers and people writing slow code, as they pay for my cheap computational power, both GPU- and CPU-wise. :)

Well, available in the Netherlands as in: we ordered the book from our standard supplier, they called the publisher, and the publisher told them it would be the end of March before they could ship the book.

The way you define available, Fermi is also already available: you can order now (and get a C1060 while you wait for the real thing to arrive) ;)

I’d say we’re both nitpicking here, but just to clarify my claim: the book is listed as in stock at the publisher’s site; indeed, this is only for US sales, but if I really needed to have it immediately, I’d try ordering from there and then using ForwardIt to deliver it to me in Europe - from my previous experience with ForwardIt, I’d expect the package to arrive in a couple of days. Now, it could indeed be that the publisher is just lying on its Web page, and that the book is actually not in stock at all - if you can confirm that your supplier contacted Elsevier US, and not Elsevier Europe, then you’re definitely right (the fact that the book is now even listed as “shipping within 1 to 2 months” supports this alternative too). So - yes, maybe the situation is indeed like with Fermi: order now, and use an early draft of the book in the meantime.

My guess would be they contacted Elsevier Europe. Good to know about the early draft!

I had the same experience with the algorithm I had to implement.

Another point is that CUDA doesn’t benefit as much from more intelligent algorithms.

For example:

You have an array and you need to update only a certain number of cells.

The CUDA implementation achieved only a speedup of factor 2 compared to the brute force approach (updating all cells). On the CPU (with OpenMP) I gained a speedup of 6 by the same optimization.

In the end the speedup of CUDA was 2.5 compared to the Core i7. Data transfers are included in the speedup measurements.

I assume this is due to the lack of cache in the current GT200 architecture, and I hope Fermi will achieve better results in such cases, when the data accesses are unpredictable and sparse.

GPU computing works well for algorithms where you have to update all or most cells. In such cases the performance is quite good.
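
To make the two variants concrete, here is a rough serial sketch of the brute-force and the sparse update. All names, the dirty-flag array and the trivial per-cell update are made up for illustration; my real code is different and the real loops were parallelized.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-cell update, used only for illustration.
inline float update_cell(float v) { return v + 1.0f; }

// Brute force: visit every cell in order and update the flagged ones.
// More work overall, but the memory accesses are contiguous.
void update_all(std::vector<float>& cells, const std::vector<char>& dirty) {
    assert(cells.size() == dirty.size());
    for (std::size_t i = 0; i < cells.size(); ++i) {
        if (dirty[i]) cells[i] = update_cell(cells[i]);
    }
}

// Sparse: visit only the cells named in idx (assumed to hold unique indices).
// Less work, but the accesses jump around in memory.
void update_listed(std::vector<float>& cells, const std::vector<std::size_t>& idx) {
    for (std::size_t k = 0; k < idx.size(); ++k) {
        assert(idx[k] < cells.size());
        cells[idx[k]] = update_cell(cells[idx[k]]);
    }
}
```

On the CPU the second variant wins because it simply does less work. On the GPU, neighbouring threads would read cells[idx[k]] at scattered addresses, so the accesses cannot be coalesced, which is my guess for why the gain there was so much smaller.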

There’s a reason why they call it “Mad science…”

Table of contents and first three chapters of the book.

The PDF says “for review” but this is straight from the publisher’s site.