I am trying to multiply 2D matrices to show that GPU processing is much faster than CPU processing. However, my measurements show the CPU version running much faster than the GPU version. I used the following code to calculate the elapsed time:
double [host or device] = (double)([END] - [START]) / CLOCKS_PER_SEC;
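A timing expression of this kind is sensitive to operator precedence, so it may help to isolate it in a small helper. The sketch below is only an illustration: the function names `elapsed_seconds` and `elapsed_seconds_unparenthesized` are hypothetical and do not appear in my program. It contrasts subtracting before dividing with the form in which the division binds first.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical helper (not part of the original program): converts two
// clock() readings into elapsed seconds.
double elapsed_seconds(std::clock_t start, std::clock_t end) {
    assert(end >= start);
    // Subtract first, then divide by the tick rate.
    return static_cast<double>(end - start) / CLOCKS_PER_SEC;
}

// Kept only for comparison: without parentheses around the subtraction,
// division binds tighter, so this evaluates end - (start / CLOCKS_PER_SEC)
// and returns a value close to the raw tick count rather than seconds.
double elapsed_seconds_unparenthesized(std::clock_t start, std::clock_t end) {
    return static_cast<double>(end) - start / CLOCKS_PER_SEC;
}
```

Note that `clock()` measures processor time used by the host process, so the GPU figure is only meaningful if the host waits for the kernel to finish before taking the second reading.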
I have attached an image of the output from my code.
Please help me understand why this happens and what the cause is.
This is the original GPU matrix multiplication section.
num is N for the N x N square matrices.
    dim3 blocks(num, num);
    dim3 grids((1 + num) / num, (1 + num) / num);

    gpustart1 = clock();
    gpu_original_matrix<<<grids, blocks>>>(dev_matrixA, dev_matrixB, dev_result1, num);
    cudaDeviceSynchronize();
    gpuend1 = clock();
    . . .
    double gpums1 = (double)(gpuend1 - gpustart1) / CLOCKS_PER_SEC;
    cout << "GPU TIME DURATION(Original) = " << gpums1 << endl;
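One thing that may be worth checking in the launch above is the configuration itself. blocks(num, num) asks for num * num threads in a single block, and (1 + num) / num is 1 under integer division for any num of 2 or more, so the whole matrix is mapped onto one block. As far as I know, GPUs of compute capability 2.0 and later allow at most 1024 threads per block, so this could only launch for num up to 32. The host-only sketch below shows the usual ceiling-division way to size a grid of fixed-size tiles. `Dim2`, `kTile`, `ceil_div`, `block_dims`, `grid_dims`, and `original_grid_extent` are hypothetical names used for illustration (`Dim2` stands in for CUDA's `dim3` so the code compiles without the toolkit), and the 16 x 16 tile is an assumption, not a tuned value.

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's dim3 so this sketch needs no CUDA headers.
struct Dim2 {
    unsigned x;
    unsigned y;
};

// Assumed tile width: 16 * 16 = 256 threads per block, below the 1024 limit.
constexpr unsigned kTile = 16;

// Smallest number of chunks of size d needed to cover n elements.
unsigned ceil_div(unsigned n, unsigned d) {
    assert(d > 0);
    return (n + d - 1) / d;
}

// Fixed block shape, independent of the matrix size.
Dim2 block_dims() {
    return {kTile, kTile};
}

// Enough blocks in each direction to cover a num x num matrix.
Dim2 grid_dims(unsigned num) {
    return {ceil_div(num, kTile), ceil_div(num, kTile)};
}

// The grid extent as computed in the launch above, for comparison.
unsigned original_grid_extent(unsigned num) {
    return (1 + num) / num;
}
```

With this sizing, the kernel would have to derive its row and column from blockIdx, blockDim, and threadIdx and skip any thread that falls outside num. Calling cudaGetLastError() right after the launch should show whether the kernel was actually started.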