Best Starting Example for Understanding CUDA

I have a serial Jacobi method that I would like to parallelize. I have been looking at matrixMul to understand how to structure my source code. I have little to moderate programming experience, and I was wondering which example program is best to pick apart as a starting point for my CUDA-based Jacobi solver.

Off the topic of GPUs: I have a finite difference code that builds a matrix, and a Jacobi code that solves it. I currently run them separately in C. How do I combine the files into a single executable? I want the two codes to run together and then loop back to the finite difference step, increasing the number of steps until convergence.


I think matrix multiplication with shared memory is the best example for a beginner to start with.
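
On the second question (combining the finite difference code and the Jacobi code), one common approach is to turn each program into a function and call both from a single driver loop. The following is only a rough sketch under assumptions, not your actual code: the function names `build_system`, `jacobi`, and `refine_until_converged` are made up, the test problem (-u'' = x*x on (0,1) with zero boundary values) is a placeholder, and "convergence" is assumed to mean that the midpoint value stops changing when the grid is refined.

```c
#include <stdio.h>
#include <stdlib.h>

/* Step 1 (stands in for the finite difference code): build the n x n system
   A u = b for -u'' = f on (0,1), u(0) = u(1) = 0, with n interior points.
   f(x) = x*x is a placeholder right-hand side. */
static void build_system(int n, double *A, double *b)
{
    double h = 1.0 / (n + 1);
    for (int i = 0; i < n; ++i) {
        double x = (i + 1) * h;
        for (int j = 0; j < n; ++j)
            A[i * n + j] = 0.0;
        A[i * n + i] = 2.0 / (h * h);
        if (i > 0)     A[i * n + i - 1] = -1.0 / (h * h);
        if (i < n - 1) A[i * n + i + 1] = -1.0 / (h * h);
        b[i] = x * x;
    }
}

/* Step 2 (stands in for the Jacobi code): sweep until the largest change in
   any unknown is below tol. Returns the number of sweeps performed. */
static int jacobi(int n, const double *A, const double *b, double *u,
                  double tol, int max_sweeps)
{
    double *next = malloc(n * sizeof *next);
    if (!next) { fprintf(stderr, "out of memory\n"); exit(EXIT_FAILURE); }
    int sweep = 0;
    double change = tol + 1.0;
    while (sweep < max_sweeps && change >= tol) {
        change = 0.0;
        /* Each row i reads only the previous iterate u, so the rows are
           independent. This is the loop a GPU version would spread over
           threads. */
        for (int i = 0; i < n; ++i) {
            double sum = b[i];
            for (int j = 0; j < n; ++j)
                if (j != i) sum -= A[i * n + j] * u[j];
            next[i] = sum / A[i * n + i];
            double d = next[i] - u[i];
            if (d < 0.0) d = -d;
            if (d > change) change = d;
        }
        for (int i = 0; i < n; ++i) u[i] = next[i];
        ++sweep;
    }
    free(next);
    return sweep;
}

/* Step 3: build, solve, refine the grid, and repeat until the value at
   x = 0.5 changes by less than grid_tol between two grids. n must be odd so
   that index n/2 sits exactly at x = 0.5; n -> 2n+1 keeps it odd. */
static double refine_until_converged(int n, double grid_tol, int max_n,
                                     int *final_n)
{
    double previous = 0.0, midpoint = 0.0;
    int first = 1;
    for (; n <= max_n; n = 2 * n + 1) {
        double *A = malloc((size_t)n * n * sizeof *A);
        double *b = malloc(n * sizeof *b);
        double *u = calloc(n, sizeof *u);   /* initial guess u = 0 */
        if (!A || !b || !u) { fprintf(stderr, "out of memory\n"); exit(EXIT_FAILURE); }
        build_system(n, A, b);
        jacobi(n, A, b, u, 1e-12, 200000);
        midpoint = u[n / 2];
        *final_n = n;
        free(A); free(b); free(u);
        double d = midpoint - previous;
        if (d < 0.0) d = -d;
        if (!first && d < grid_tol) break;
        previous = midpoint;
        first = 0;
    }
    return midpoint;
}
```

A `main` would then just call something like `refine_until_converged(7, 1e-4, 127, &n)` and print the result. If you would rather keep two source files, the same functions can live in separate `.c` files with a shared header and be compiled and linked into one executable.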