What is your ideal parallel programming book?

There are quite a few parallel programming books in existence, both old and new. However, I’m wondering what your ideal parallel programming book would be, either for use in a classroom, or for self-paced learning.

Some examples of things that would be covered:

  • What parallel algorithms would you want covered? Which numerical analysis methods? What languages/methods would you want presented? What techniques? Etc.

Or does your ideal parallel programming book already exist? If so, what is it and why?

Just a few ideas:

  1. The ideal book should start with a “Hello World” program. It is so satisfying to successfully run something and then start adding meaningful code, instead of reading a few chapters of dry explanations of GPU/CPU architecture. That material should come much later.
  2. Reduce, Scan and Sort algorithms should be accompanied by well-known applications of these algorithms.
  3. Some books/courses rush through the optimal use of multicore CPUs/GPUs at the very end, and I have found very few real examples of optimal MPI/CUDA code.
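As a rough sketch of the kind of first program point #1 describes, the example below just launches a few threads and has each one print a greeting. It uses standard C++11 threads (an assumption; a CUDA or MPI book would use its own launch mechanism) so it runs on any machine with a C++ compiler. The greetings may appear in any order, which is itself a useful first lesson.

```cpp
#include <cassert>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

// Launch num_threads threads; each prints a greeting with its own ID.
// Returns how many threads actually ran, so the result can be checked.
int hello_threads(int num_threads) {
    std::mutex print_mutex;            // serialises access to std::cout
    std::vector<std::thread> workers;
    int greeted = 0;                   // protected by print_mutex

    for (int id = 0; id < num_threads; ++id) {
        workers.emplace_back([id, &print_mutex, &greeted] {
            std::lock_guard<std::mutex> lock(print_mutex);
            std::cout << "Hello World from thread " << id << "\n";
            ++greeted;
        });
    }
    for (auto& t : workers) t.join();  // wait for every thread to finish
    return greeted;
}
```

Calling `hello_threads(4)` from `main` prints four greetings and returns 4. The function name and the returned count are hypothetical choices for illustration, not taken from any particular book.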


Definitely agree on point #1! Too often I see books or presenters dive right into the guts of how a GPU works (warps and shared memory and memory coalescing, oh my!), which scares or confuses too many people. You wouldn’t teach someone to program a CPU in CS101 by immediately showing them how the three levels of the cache hierarchy work. You start simple and then move on!

Great idea on point #2. Any suggestions?
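One well-known application of scan is stream compaction: keep only the elements that satisfy a predicate and pack them contiguously. The prefix sum of the 0/1 “keep” flags gives each surviving element its output index. The sketch below writes the scan as Hillis-Steele steps (within one step every inner-loop iteration is independent, which is what a GPU would run in parallel) but executes them serially in plain C++; the predicate “keep positive values” and the function names are hypothetical choices for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Inclusive prefix sum using Hillis-Steele steps: log2(n) steps, each with
// n independent updates. Here the steps run serially for clarity.
std::vector<int> inclusive_scan_hs(std::vector<int> cur) {
    const std::size_t n = cur.size();
    std::vector<int> next(n);
    for (std::size_t offset = 1; offset < n; offset *= 2) {
        for (std::size_t i = 0; i < n; ++i) {
            next[i] = cur[i] + (i >= offset ? cur[i - offset] : 0);
        }
        cur.swap(next);  // double buffering: read the old step, write the new one
    }
    return cur;
}

// Stream compaction: keep values > 0, preserving their original order.
std::vector<int> compact_positive(const std::vector<int>& in) {
    const std::size_t n = in.size();
    std::vector<int> flags(n);
    for (std::size_t i = 0; i < n; ++i) flags[i] = in[i] > 0 ? 1 : 0;

    std::vector<int> incl = inclusive_scan_hs(flags);
    const int kept = (n == 0) ? 0 : incl[n - 1];  // total number of survivors
    std::vector<int> out(kept);
    for (std::size_t i = 0; i < n; ++i) {
        // exclusive index = inclusive index - 1; each write is independent
        if (flags[i]) out[incl[i] - 1] = in[i];
    }
    return out;
}
```

For example, `compact_positive({3, -1, 0, 7, -2, 5})` returns `{3, 7, 5}`. Hillis-Steele does O(n log n) work; a work-efficient (Blelloch-style) scan would be the natural next chapter.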

Agreed that examples are needed for “hybrid” computing, especially once you start getting into multi-CPU/multi-GPU AND multi-node systems. I’ll start collecting these types of examples and make them available for anyone to use.

Thanks for the reply!

Here are some things I’d like to see in a parallel algorithms book. I’m particularly focussed on a book covering fundamental algorithms much more than CUDA implementations.

Fundamental ideas
  • Parallel threads
  • Race conditions
  • Atomic operations
  • Sequential versus parallel complexity

Binary tree / divide-and-conquer algorithms
  • Recurrence equation
  • Cyclic reduction for tridiagonal equations
  • Butterfly network
  • Other applications?

Wavefront parallelisation
  • Gauss-Seidel iteration
  • ILU preconditioning
  • Multi-frontal direct sparse solver

Task parallelism
  • Task DAGs and their parallel execution
  • Relevance to dense linear algebra

N-body problems
  • Construction and updating of oct-tree structures
  • Parallel implementation of multipole methods

Partitioning, renumbering and coloring
  • ParMETIS, PT-Scotch and similar partitioning algorithms
  • Parallel partition refinement and load balancing
  • Relevance to renumbering for improved locality
  • Coloring to avoid race conditions
  • Sequential greedy coloring
  • Parallel randomised coloring

Sorting algorithms
  • Radix sort
  • Merge sort
  • SkipList insert sort
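As a small sketch of the first “Fundamental ideas” items (parallel threads, race conditions, atomic operations), the example below has many threads update one shared counter. It uses standard C++ threads and `std::atomic` rather than CUDA (whose `atomicAdd` plays the same role); the function names and the thread/increment counts are arbitrary, hypothetical choices.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Every thread increments one shared counter. With a plain `int` and
// `++counter` this would be a race condition: the read-modify-write steps of
// different threads can interleave and lose updates (formally a data race,
// i.e. undefined behaviour in C++). std::atomic makes each increment indivisible.
long long count_atomic(int num_threads, int increments_per_thread) {
    std::atomic<long long> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) w.join();  // join() makes all updates visible here
    return counter.load();
}

// Same result with much less contention: each thread counts privately and
// performs a single atomic add at the end (a simple two-level reduction).
long long count_local_then_atomic(int num_threads, int increments_per_thread) {
    std::atomic<long long> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            long long local = 0;  // private to this thread: no race possible
            for (int i = 0; i < increments_per_thread; ++i) ++local;
            counter.fetch_add(local, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) w.join();
    return counter.load();
}
```

Both functions return `num_threads * increments_per_thread`; the second version also shows why reductions are usually structured hierarchically rather than funnelled through one atomic variable.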

For a general Intro to Parallel Programming class I use this book:
An Introduction to Parallel Programming, by Peter Pacheco

The problem is that it does not cover GPU computing at all. So for the GPU computing part I use:
The CUDA tutorial
The CUDA best programming practices
Programming Massively Parallel Processors, Second Edition, by David B. Kirk and Wen-mei W. Hwu

So there is no single solution for GPU computing. I think the book by Kirk and Hwu is great, but it is not an introductory text, and its case-study chapters (chapters 10-13) are not very useful in class. So if you could create a book like Programming Massively Parallel Processors, but with four chapters that are truly an introduction to CUDA C/C++ and without the case studies, it would be really helpful.

For a more advanced class, a “Parallel Algorithms using CUDA” book would also be really useful; the message above by mgiles essentially requests a book like this.