I am trying to multiply 2D matrices to show that GPU processing is much faster than CPU processing. However, the CPU version turned out to be much faster than the GPU version. I used the following code to calculate the time duration:
double [host or device] = (double)([END] - [START]) / CLOCKS_PER_SEC;
I have attached an image of the results from my code.
Please help me understand why this happens and what the reason is.
This is the original GPU matrix multiplication section.
num is N for the N x N square matrix.
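dim3 blocks(16, 16);  // a block may hold at most 1024 threads in total
dim3 grids((num + blocks.x - 1) / blocks.x, (num + blocks.y - 1) / blocks.y);  // ceiling division; kernel must skip row/col >= num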
gpustart1 = clock();
gpu_original_matrix<<<grids, blocks>>>(dev_matrixA, dev_matrixB, dev_result1, num);
cudaDeviceSynchronize();
gpuend1 = clock();
.
.
.
double gpums1 = (double)(gpuend1 - gpustart1) / CLOCKS_PER_SEC;  // seconds
cout << "GPU TIME DURATION(Original) = " << gpums1 << " s" << endl;