What’s your favorite “Hello Parallelism” or “Hello World” code sample for introducing someone to GPU Computing?
I do vector addition because it is very simple to explain.
@mpc, yes that’s a very common one. Aren’t you afraid it introduces too many topics at once, though (memory spaces, memory movement, threads & blocks, and kernels)? Personally, I find this to be my favorite Hello Parallelism example:
#include <stdio.h>
__global__ void hello()
{
    printf("Hello Parallelism from thread %d in block %d\n",
           threadIdx.x, blockIdx.x);
}
int main()
{
    hello<<<1,1>>>();
    cudaDeviceSynchronize();
    return 0;
}
I start by showing them the one-thread, one-block version, which yields one line of output. Then I move on to 1 block and 18 threads on my Fermi-based GPU. This will yield some randomization in the output, at which point I reinforce the parallelism that’s going on. Then, if it’s a hands-on lab, I let the attendees play with the numbers.
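For the hands-on part, something like the following might work: a minimal sketch of the same kernel with the block and thread counts read from the command line. The helper name `launchHello`, the argument handling, and the default of 1 block and 18 threads are my own assumptions for illustration, not part of the sample above.

```cuda
#include <cstdio>
#include <cstdlib>

__global__ void hello()
{
    printf("Hello Parallelism from thread %d in block %d\n",
           threadIdx.x, blockIdx.x);
}

// Hypothetical helper: launch the kernel with the given configuration
// and return the first CUDA error encountered, if any.
cudaError_t launchHello(int blocks, int threads)
{
    hello<<<blocks, threads>>>();
    cudaError_t err = cudaGetLastError();   // reports an invalid launch configuration
    if (err != cudaSuccess)
        return err;
    return cudaDeviceSynchronize();         // waits for the kernel; device printf output is flushed here
}

int main(int argc, char **argv)
{
    // Assumed usage: ./hello [blocks] [threads]
    int blocks  = (argc > 1) ? atoi(argv[1]) : 1;
    int threads = (argc > 2) ? atoi(argv[2]) : 18;

    cudaError_t err = launchHello(blocks, threads);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Run as, say, `./hello 4 8`, this should print 32 lines, one per thread; the order in which the blocks appear is not guaranteed, which is the point worth dwelling on. Asking for more threads per block than the device supports makes the launch fail, and the error message gives attendees a first look at the hardware limits.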
Yes, it is intense because it needs some memory juggling, but like you I keep the number of threads and the number of blocks at 1,1 to save that fun for later.
If you write a complete C program for vector addition and, next to it, a complete CUDA program for vector addition, it is not too bad.
We wouldn’t be called computer scientists if we didn’t do a Hello World, but that takes around 10 minutes and then we immediately go into adding huge things.
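To make the side-by-side comparison concrete, here is a rough sketch of what the two versions could look like in one file. The function names (`addHost`, `addDevice`, `addOnGpu`) and the tiny array size are assumptions of mine, and the kernel deliberately runs with 1 block and 1 thread so that only the memory handling is new.

```cuda
#include <cstdio>

// Plain C version: one loop on the CPU.
void addHost(const float *a, const float *b, float *c, int n)
{
    for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

// CUDA version: the same loop, executed by a single GPU thread.
__global__ void addDevice(const float *a, const float *b, float *c, int n)
{
    for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

// Hypothetical wrapper showing the memory steps: allocate on the device,
// copy inputs over, launch, copy the result back. Returns 0 on success.
int addOnGpu(const float *a, const float *b, float *c, int n)
{
    size_t bytes = n * sizeof(float);
    float *dA = NULL, *dB = NULL, *dC = NULL;
    int status = 1;

    if (cudaMalloc(&dA, bytes) == cudaSuccess &&
        cudaMalloc(&dB, bytes) == cudaSuccess &&
        cudaMalloc(&dC, bytes) == cudaSuccess &&
        cudaMemcpy(dA, a, bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
        cudaMemcpy(dB, b, bytes, cudaMemcpyHostToDevice) == cudaSuccess) {
        addDevice<<<1, 1>>>(dA, dB, dC, n);
        // This copy waits for the kernel to finish before reading dC.
        if (cudaGetLastError() == cudaSuccess &&
            cudaMemcpy(c, dC, bytes, cudaMemcpyDeviceToHost) == cudaSuccess)
            status = 0;
    }

    // cudaFree accepts NULL, so this is safe even after a failed allocation.
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return status;
}

int main()
{
    const int n = 8;
    float a[n], b[n], cpu[n], gpu[n];
    for (int i = 0; i < n; ++i) {
        a[i] = (float)i;
        b[i] = 2.0f * i;
    }

    addHost(a, b, cpu, n);
    if (addOnGpu(a, b, gpu, n) != 0) {
        fprintf(stderr, "GPU vector addition failed\n");
        return 1;
    }

    for (int i = 0; i < n; ++i)
        printf("%g + %g = %g (CPU) %g (GPU)\n", a[i], b[i], cpu[i], gpu[i]);
    return 0;
}
```

Keeping the kernel body identical to the CPU loop is a deliberate choice here: the only differences left to explain are `__global__`, the `<<<1, 1>>>` launch, and the allocate/copy/free calls.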