The most important parallel algorithms to teach students

What do you consider the most important parallel algorithms a student needs to learn (regardless of language)?

A list for starters. Variations on these:

  • Reduction
  • Scan
  • Merge Sort

What else should be added and why?

Since GPU programming is relatively new, I think what most people want to see is side-by-side GPU and CPU implementations of popular algorithms.

Most published papers fall into one of three categories:

  1. They include no source code at all, and only discuss the implementation theoretically, in vague, high-level terms.

Example (this one is a real waste; no useful content at all beyond 'We implemented a dynamic programming algorithm and it was great!'):

https://docs.google.com/viewer?url=http://www.ijcsmr.org/vol2issue4/paper325.pdf

  2. They include pseudo-code for portions of the algorithm, but not enough to reconstruct the work in a real-world way.

Example:

https://docs.google.com/viewer?url=http://www.danielbit.com/~dali/papers/Graph_GPU_Adaptive_IPDPS13.pdf

  3. They do include the source, but it is spread out over multiple header/source files and abuses templates to the point where not even a ten-year veteran can make sense of it all.

Example:

https://code.google.com/p/back40computing/source/browse/#svn%2Ftrunk%2Fb40c%2Fgraph%2Fbfs

When I post CUDA code to GitHub, I always include an easy-to-follow CPU implementation alongside the GPU implementation.

Less than 20% of the commonly found papers/blogs/articles are informative enough to actually make use of their content.

The bottom line is that GPU programming is tricky, and the best way to learn is to see clear examples of implementations of popular algorithms.

Nvidia spends a lot of energy designing good GPUs, but it also needs to show new users how to use their features. Not everybody has the time to learn via trial and error.

Scans, reductions and merge sort have been discussed over and over.

How about clear examples of algorithms such as A*, optimized BFS, discrete 0/1 knapsack, bipartite matching, convex hull, Hungarian assignment, computing eigenvalues, etc.?

I will add convolution, which is everywhere in DSP, and matrix multiplication, because it is matrix multiplication.

I agree with CudaaduC: more traditional algorithms too (greedy, divide-and-conquer, etc.). In other words, write a book on analysis of algorithms using real parallel architectures with CUDA.

It is also important to realize that many people who use CUDA are going to be using Thrust, cuBLAS, and cuSPARSE to solve problems.

Thrust's vectors are slow to construct and copy, so there need to be more examples of using its device pointers (wrapping device memory allocated the traditional way) for reductions like max_element, scans, sorts, etc. When you use device pointers to device memory, thrust::sort is fast as hell.
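A minimal sketch of that pattern, assuming a CUDA toolkit with Thrust (the step that fills the buffer is elided): wrap memory from cudaMalloc in a thrust::device_ptr and pass it straight to the algorithms, with no device_vector construction or extra copies:

```cuda
#include <thrust/device_ptr.h>
#include <thrust/extrema.h>
#include <thrust/sort.h>
#include <cuda_runtime.h>

int main() {
    const int n = 1 << 20;
    int* d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(int));
    // ... fill d_data with a kernel or cudaMemcpy ...

    // Wrap the raw device pointer; no device_vector, no host round trip.
    thrust::device_ptr<int> p(d_data);
    thrust::sort(p, p + n);  // sorts in place on the GPU
    thrust::device_ptr<int> max_it = thrust::max_element(p, p + n);
    (void)max_it;

    cudaFree(d_data);
    return 0;
}
```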

There should be clear examples of the tricky issues, like making the adjustment to column-major format (this trips up a lot of programmers), and the different sparse formats that are available and how they are stored (cuSPARSE).

Keep in mind that most of those linear algebra functions take 9-14 parameters, and if you get one wrong, horrible things can happen. Varied, clear examples always help.

Also include matrix multiplication examples where rows != columns, because too often square matrices are used, and that does not help distinguish the row-major to column-major adjustment.