Getting started with parallel programming Suggested reading

One question that I’ve been asked a lot is how to get started with parallel programming. I asked around internally at NVIDIA and got some good suggestions, so I’m posting the responses here, and I want to encourage people to comment and/or add their own suggestions. Does anybody have a favorite textbook they want to share? It doesn’t have to be specific to CUDA.

    [*] From the HPC course at UNC, which also covers CUDA. http://www.cs.unc.edu/~prins/Classes/633/

    [*] From Mark Harris: It’s not a textbook, but I always recommend these course notes on PRAM algorithms by Sid Chatterjee & Jan Prins (the CRCW PRAM model maps very closely to CUDA, especially within a thread block using shared memory). They are concise and provide good examples for reductions, scan, Brent’s Theorem, etc.; a small CUDA sketch of this kind of reduction follows the book list. http://www.cs.unc.edu/~prins/Classes/633/Handouts/pram.pdf

    [*] He also requires reading from Kumar et al., Introduction to Parallel Computing: Design and Analysis of Algorithms.

    [*] Designing and Building Parallel Programs, I. Foster, Addison-Wesley, 1995. http://www-unix.mcs.anl.gov/dbpp/

    [*] IBM’s redbook “RS/6000 SP: Practical MPI Programming” is very well known among MPI users. It has rich content on parallel approaches even though it was published 10 years ago. http://www.redbooks.ibm.com/abstracts/sg245380.html

    [*] Parallel Programming in C with MPI and OpenMP by Michael J. Quinn is good for beginners.

    [*] Parallel Programming with MPI by Peter Pacheco

    [*] Parallel and Distributed Computation: Numerical Methods (Optimization and Neural Computation) by Dimitri P. Bertsekas

    [*] Multiple people suggest: The Art of Multiprocessor Programming by Maurice Herlihy and Nir Shavit. http://www.amazon.com/Art-Multiprocessor-P…y/dp/0123705916

    [*] Using OpenMP: Portable Shared Memory Parallel Programming (Scientific and Engineering Computation) by Barbara Chapman, Gabriele Jost, and Ruud van der Pas, with a foreword by David J. Kuck

    [*] Using MPI, 2nd Edition: Portable Parallel Programming with the Message Passing Interface (Scientific and Engineering Computation) by William Gropp, Ewing Lusk, and Anthony Skjellum

    [*] From David Kirk: We use Tim Mattson’s book as a companion to the CUDA material.

    [*] David Kirk & Wen-mei Hwu’s CUDA textbook is available on the course website. That version is one rev out of date, but pretty close. http://courses.ece.illinois.edu/ece498/al/

    [*] From Paulius Micikevicius: My personal favorite book on parallel algorithms is “Introduction to Parallel Computing” by Grama et al. It covers basic interconnect topologies, algorithms, analysis, MPI and OpenMP. http://www.amazon.com/Introduction-Paralle…a/dp/0201648652

    [*] If one is leaning slightly more towards the theoretical side of parallel algorithms, then “Introduction to Parallel Algorithms” by Joseph JaJa is a good source. It contains a more thorough treatment of algorithms based on prefix sums (things like various tree and graph algorithms).

    [*] If one wants to go completely to the theoretical side (P-completeness, etc.), then “Limits to Parallel Computation: P-Completeness Theory” by Ray Greenlaw is an excellent book. It’s certainly not applicable to introductory courses, in the same way that NP-completeness isn’t applicable to introductory algorithms courses.
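
Since the PRAM notes above lean heavily on reductions and scan, here is a minimal sketch of the shared-memory tree reduction they describe, written as a CUDA kernel, to make the mapping to a thread block concrete. The kernel name, the 256-thread block size, and the float element type are illustrative choices of mine, not taken from any of the books; each block produces one partial sum, which would be combined in a second pass or on the host.

#include <cuda_runtime.h>

#define BLOCK_SIZE 256

// Tree reduction within a single thread block, using shared memory.
// Each block reduces BLOCK_SIZE consecutive elements of 'in' and writes
// one partial sum to out[blockIdx.x] - the PRAM-style O(log n) pattern.
__global__ void blockReduceSum( const float *in, float *out, int n )
{
    __shared__ float sdata[ BLOCK_SIZE ];

    int tid = threadIdx.x;
    int idx = blockIdx.x * blockDim.x + threadIdx.x;

    // Load one element per thread (0 if past the end of the array).
    sdata[ tid ] = ( idx < n ) ? in[ idx ] : 0.0f;
    __syncthreads();

    // Halve the number of active threads each step.
    for ( int s = blockDim.x / 2; s > 0; s >>= 1 )
    {
        if ( tid < s )
            sdata[ tid ] += sdata[ tid + s ];
        __syncthreads();
    }

    // Thread 0 holds the block's partial sum.
    if ( tid == 0 )
        out[ blockIdx.x ] = sdata[ 0 ];
}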


This may be the same book referred to as “Tim Mattson’s book” above. The title is “Patterns for Parallel Programming” by Mattson, Sanders & Massingill, Addison-Wesley.

Thanks for the list - I do have a question though. What does NVIDIA suggest doing with serial code that needs to be ported to the GPU? I have two such kernels that were ported to the GPU: one successfully, and for the other I only got a ~4x speedup (which is not enough).

Such code would look like this:

for ( int iSample = 0; iSample < 1000; iSample++ )
{
   for ( int i = -val; i < val; i++ )
   {
      pRes[ iSample + i ] += someValue * i;   // (**)
   }
}

To make my life harder, the line marked with (**) might also look like this:

pRes[ ( rand() % 1000 ) + i ] += someValue * i;

This is real production code; the main reason the code is so “nice and user friendly” is that the algorithm tries to do some sort of averaging.

Hey - I didn’t write the algorithm… some mad scientist wrote it… ;)

thanks

eyal

This link is good for beginners… (I myself referred to it when I ventured into parallel programming)
https://computing.llnl.gov/tutorials/parallel_comp/

This list would be useful to sticky.

I agree!

for ( int iSample = 0; iSample < 1000; iSample++ )
{
   for ( int i = -val; i < val; i++ )
   {
      pRes[ iSample + i ] += someValue * i;   // (**)
   }
}

You can always find parallelism within the “2*val” interval… and then do a sliding window…

pRes[ ( rand() % 1000 ) + i ] += someValue * i;

Do the same here… Any result that deviates because of a race condition can be explained as being equivalent to another sequential run of the algorithm with a different random seed…
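
For what it’s worth, here is one possible GPU mapping of the snippet quoted above: parallelize over iSample (one thread per sample) and let atomicAdd serialize the overlapping “+=” writes. This is only a sketch under my own assumptions - float data, a made-up kernel name and launch configuration, and the same in-bounds assumption on iSample + i that the original CPU loop already makes - not the only way to apply the sliding-window idea from the previous post.

#include <cuda_runtime.h>

// One thread per iSample; each thread walks its own [-val, val) window and
// uses atomicAdd so that overlapping windows from neighbouring samples do
// not race on pRes.
__global__ void accumulateWindows( float *pRes, float someValue, int val, int numSamples )
{
    int iSample = blockIdx.x * blockDim.x + threadIdx.x;
    if ( iSample >= numSamples )
        return;

    for ( int i = -val; i < val; i++ )
    {
        // atomicAdd makes the "+=" safe when windows of different samples overlap.
        atomicAdd( &pRes[ iSample + i ], someValue * i );
    }
}

// Example launch for the 1000-sample loop above:
//   int threads = 256;
//   int blocks  = ( 1000 + threads - 1 ) / threads;
//   accumulateWindows<<< blocks, threads >>>( d_pRes, someValue, val, 1000 );

The sliding-window idea in the previous post would instead parallelize across the 2*val inner iterations and combine partial results (e.g. in shared memory), which avoids the atomics at the cost of extra bookkeeping.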

This PDF is not accessible or reachable for me. Could you please attach the PDF here?

Thanks

Probably this link

Hello,
I’m really thankful and grateful. Thank you very much, friend.