CUDA set-up time

Dear all,

Good day,

I am new to GPU programming. I tried a small program that adds two 256 x 256 matrices. The results were correct; however, the CUDA Fortran version took much longer than the serial version. I was wondering how much time is required to set up each kernel? My guess is that the amount of computation is too small to benefit from being parallelized on the GPU.

Also, do you recommend a book or article that explains what goes on behind the scenes on a GPU?

P.S.: I used 256 blocks, each with 256 threads (i.e. <<<256,256>>>). I also tried DIM3 with various configurations, but they all ran slower than the serial version.

Many thanks

How did you initialize the matrices and how do you measure the time?
For reading, I recommend the CUDA Programming Guide and CUDA by Example. The time to launch a kernel is about 1/1000000 s.

Thanks Pasoleatis,

I didn’t initialize them. For the timing, I used CALL SYSTEM_CLOCK([COUNT, COUNT_RATE, COUNT_MAX]), called before and after the loop for the serial version and before and after the kernel call for the CUDA version. So the time to set up a kernel (including threads) is 1/1000000 s?

In CUDA C, a line that calls a kernel, kernel<<<blocks,threads>>>(…), takes about 1 microsecond. I do not know the CUDA Fortran functions for measuring time, but in CUDA C you have to use specific functions. I suggest you post the code here, since it is simple.

Adding two arrays once is not really enough for a speed-up test. It is fine as a programming exercise, but the time to copy the data from the CPU to the GPU would be comparable to the execution time.
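
For reference, here is a minimal CUDA C sketch of how the timing could be done with CUDA events, so that the copies and the kernel are measured separately. It is only an illustration under some assumptions: the kernel name addMatrices, the helper check, and the fill values 1.0 and 2.0 are made up for this example, and the matrices are treated as flat arrays of 256 x 256 floats. A kernel launch returns to the host before the kernel has finished, so a host clock read right after the launch may not capture the full run time unless the device is synchronized first. The first CUDA call also pays a one-time initialization cost, which is why the sketch does an untimed warm-up launch.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define N 256  // matrix dimension from the original post

// Hypothetical kernel: one thread per element of the flattened N x N matrices.
__global__ void addMatrices(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// Abort with a message on any CUDA runtime error.
static void check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

int main(void)
{
    const int count = N * N;
    const size_t bytes = count * sizeof(float);

    // Host buffers, initialized so the result can be verified.
    float *hA = (float *)malloc(bytes);
    float *hB = (float *)malloc(bytes);
    float *hC = (float *)malloc(bytes);
    if (!hA || !hB || !hC) { fprintf(stderr, "host malloc failed\n"); return 1; }
    for (int i = 0; i < count; ++i) { hA[i] = 1.0f; hB[i] = 2.0f; }

    float *dA, *dB, *dC;
    check(cudaMalloc(&dA, bytes), "cudaMalloc dA");
    check(cudaMalloc(&dB, bytes), "cudaMalloc dB");
    check(cudaMalloc(&dC, bytes), "cudaMalloc dC");

    cudaEvent_t t0, t1, t2, t3;
    check(cudaEventCreate(&t0), "event t0");
    check(cudaEventCreate(&t1), "event t1");
    check(cudaEventCreate(&t2), "event t2");
    check(cudaEventCreate(&t3), "event t3");

    // Untimed warm-up launch so one-time start-up costs are not measured.
    addMatrices<<<N, N>>>(dA, dB, dC, count);
    check(cudaGetLastError(), "warm-up launch");
    check(cudaDeviceSynchronize(), "warm-up sync");

    check(cudaEventRecord(t0, 0), "record t0");
    check(cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice), "copy A");
    check(cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice), "copy B");
    check(cudaEventRecord(t1, 0), "record t1");

    addMatrices<<<N, N>>>(dA, dB, dC, count);  // 256 blocks of 256 threads
    check(cudaGetLastError(), "kernel launch");
    check(cudaEventRecord(t2, 0), "record t2");

    check(cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost), "copy C");
    check(cudaEventRecord(t3, 0), "record t3");
    check(cudaEventSynchronize(t3), "sync t3");  // wait until all work is done

    float h2d, kern, d2h;  // elapsed times in milliseconds
    check(cudaEventElapsedTime(&h2d, t0, t1), "elapsed h2d");
    check(cudaEventElapsedTime(&kern, t1, t2), "elapsed kernel");
    check(cudaEventElapsedTime(&d2h, t2, t3), "elapsed d2h");
    printf("host->device: %f ms\nkernel:       %f ms\ndevice->host: %f ms\n",
           h2d, kern, d2h);

    // Verify the sum on the host.
    int ok = 1;
    for (int i = 0; i < count; ++i) if (hC[i] != 3.0f) { ok = 0; break; }
    printf("result %s\n", ok ? "OK" : "WRONG");

    cudaEventDestroy(t0); cudaEventDestroy(t1);
    cudaEventDestroy(t2); cudaEventDestroy(t3);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(hA); free(hB); free(hC);
    return ok ? 0 : 1;
}
```

Comparing the three printed times should show how much of the total is spent in the copies rather than in the addition itself. For a fair comparison with the serial version, it would probably help to repeat the kernel many times and average, since a single addition of this size is very short.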