I’ve got a question for the C (and CUDA) experts out there. I want to write two versions of the same program, one to run on the CPU and one on the GPU, so that I can compare them.
For the moment I’m working on the CPU program. Basically, the program runs a loop hundreds of times. The outer loop contains two inner loops, and inside them I calculate:
u_new[i][j]=u_old[i][j]; (obviously this type of program should run faster on a GPU)
Now the problem is that in the next iteration I have to calculate new values based on the ones just written to u_new, so the roles of the two arrays have to be exchanged after every pass. I’ve tried three different versions:
- At the end of the outermost loop I simply copy u_new to u_old.
- I use dynamically allocated arrays and swap the pointers (actually the pointers to the pointers, because it’s a 2D array).
- I use an if condition on the iteration count: in an even iteration the new values are written to u_new and calculated from u_old, and in an odd iteration the new values are written to u_old and calculated from u_new.
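To make the second version concrete, here is a minimal sketch of the pointer swap. The helper names (grid_alloc, grid_free, step, run) are made up for this example, and the "+ 1.0" in step is only a stand-in for the real update so that the result is easy to check. It assumes each grid is an array of row pointers into one contiguous block.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocate a zero-filled rows x cols grid as an array
   of row pointers into one contiguous block. Returns NULL on failure. */
double **grid_alloc(int rows, int cols)
{
    double **g = malloc((size_t)rows * sizeof *g);
    if (!g)
        return NULL;
    g[0] = calloc((size_t)rows * cols, sizeof **g);
    if (!g[0]) {
        free(g);
        return NULL;
    }
    for (int i = 1; i < rows; ++i)
        g[i] = g[0] + (size_t)i * cols;
    return g;
}

/* Hypothetical helper: release a grid created by grid_alloc. */
void grid_free(double **g)
{
    if (g) {
        free(g[0]);
        free(g);
    }
}

/* One pass over the grid: reads only u_old, writes only u_new.
   The "+ 1.0" is a placeholder, not the real update rule. */
void step(double **u_new, double **u_old, int rows, int cols)
{
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j)
            u_new[i][j] = u_old[i][j] + 1.0;
}

/* Run the outer loop, swapping the two top-level pointers after each pass
   instead of copying any data. Returns whichever grid holds the latest
   values (a itself if iterations is 0). */
double **run(double **a, double **b, int rows, int cols, int iterations)
{
    assert(a != b); /* the two buffers must be distinct */
    double **u_old = a;
    double **u_new = b;
    for (int it = 0; it < iterations; ++it) {
        step(u_new, u_old, rows, cols);
        double **tmp = u_old; /* O(1) swap, no element is copied */
        u_old = u_new;
        u_new = tmp;
    }
    return u_old; /* after the swap, u_old points at the newest data */
}
```

The swap costs the same no matter how large the grid is, whereas the copy in the first version touches every element once more per iteration. The caller still owns both grids and has to free both, whichever one run returns.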
It seems that the third version is the fastest, but the differences are small (about 100 ms out of a total run time of 7 seconds).
Does anyone know a better solution to this problem? And which of the versions do you think would work best on a GPU?
Thanks a lot!