Hello Parallelism!

What’s your favorite “Hello Parallelism” or “Hello World” code sample for introducing someone to GPU Computing?

I do the vector addition because it is very simple to explain.

@mpc, yes, that’s a very common one. Aren’t you afraid it introduces too many topics at once, though (memory spaces, memory movement, threads & blocks, and kernels)? Personally, I find this to be my favorite Hello Parallelism example:

#include <stdio.h>

__global__ void hello()
{
    printf("Hello Parallelism from thread %d in block %d\n", 
            threadIdx.x, blockIdx.x);
}

int main()
{
    hello<<<1,1>>>();
    cudaDeviceSynchronize();

    return 0;
}

I start by showing them the one-thread, one-block version, which yields one line of output. Then I move on to 1 block and 18 threads on my Fermi-based GPU. This will yield some randomization in the output, at which point I reinforce the parallelism that’s going on. Then, if it’s a hands-on lab, I let the attendees play with the numbers.

Yes, it is intense because it needs some memory dancing, but like you I keep the number of threads and the number of blocks at 1,1 to save that fun for later.
If you write a complete C code for vector addition and put a complete CUDA code for vector addition next to it, it is not too bad.
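
For anyone who wants to picture that side-by-side comparison, here is a rough sketch of how it might look with the 1,1 launch. The names add_cpu, add_gpu and N are made up for illustration, and error checking is left out to keep it short:

```cuda
#include <stdio.h>
#include <stdlib.h>

#define N 1024  // illustrative vector length (assumption, not from the thread)

// Plain C version: one loop over all elements.
void add_cpu(const float *a, const float *b, float *c, int n)
{
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

// CUDA version meant for a <<<1,1>>> launch: the single thread runs the same loop.
__global__ void add_gpu(const float *a, const float *b, float *c, int n)
{
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

int main()
{
    size_t bytes = N * sizeof(float);

    // Host memory
    float *a     = (float *)malloc(bytes);
    float *b     = (float *)malloc(bytes);
    float *c_cpu = (float *)malloc(bytes);
    float *c_gpu = (float *)malloc(bytes);

    for (int i = 0; i < N; i++) {
        a[i] = (float)i;
        b[i] = 2.0f * (float)i;
    }

    // CPU reference result
    add_cpu(a, b, c_cpu, N);

    // Device memory: this is the "memory dancing" part
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);

    cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, bytes, cudaMemcpyHostToDevice);

    // One block, one thread: no indexing to explain yet
    add_gpu<<<1,1>>>(d_a, d_b, d_c, N);

    // This blocking copy also waits for the kernel to finish
    cudaMemcpy(c_gpu, d_c, bytes, cudaMemcpyDeviceToHost);

    // Compare the GPU result against the CPU result
    int mismatches = 0;
    for (int i = 0; i < N; i++)
        if (c_cpu[i] != c_gpu[i])
            mismatches++;
    printf("mismatches: %d\n", mismatches);

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    free(a);
    free(b);
    free(c_cpu);
    free(c_gpu);

    return mismatches != 0;
}
```

The two add functions have identical bodies, so the only new things to explain at this stage are __global__, the <<<1,1>>> launch, and the cudaMalloc/cudaMemcpy calls. Replacing the loop with threadIdx.x and blockIdx.x indexing can come later.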

We wouldn’t be called computer scientists if we didn’t do a Hello World, but that takes around 10 minutes, and then we immediately go into adding huge things.