GPU process is slower than CPU process according to nvprof

I am trying to multiply 2D matrices to show that the GPU process is much faster than the CPU process. However, my measurements show the CPU process being much faster than the GPU one. I used the following code to calculate the elapsed time:

double [host or device] = (double)([END] - [START]) / CLOCKS_PER_SEC;
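
A minimal sketch of that calculation as a helper, assuming both readings come from clock(). The name elapsed_seconds is hypothetical and not part of the original code. The subtraction is parenthesized so that the division applies to the difference rather than to the start value alone. Note that clock() reports processor time used by the process, which is not necessarily the same as wall-clock time.

```cpp
#include <ctime>

// Convert two clock() readings into elapsed seconds.
// The difference is taken first, then divided by CLOCKS_PER_SEC.
// (elapsed_seconds is a hypothetical helper name for illustration.)
double elapsed_seconds(std::clock_t start, std::clock_t end) {
    return static_cast<double>(end - start) / CLOCKS_PER_SEC;
}
```

It would be used as elapsed_seconds(gpustart1, gpuend1) once both readings have been taken.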

I have attached an image of the output from my code.
Please help me understand why this happens and what the reason is.

This is the original GPU matrix multiplication section.
num is the N of the N x N square matrix.

	dim3 blocks(num, num);
	dim3 grids((1+num)/num, (1+num)/num);

	gpustart1 = clock();
	gpu_original_matrix<<<grids, blocks>>>(dev_matrixA, dev_matrixB, dev_result1, num);
	cudaDeviceSynchronize();
	gpuend1 = clock();
        .
        .
        .
	// Elapsed time in seconds: subtract first, then divide.
	double gpums1 = (double)(gpuend1 - gpustart1) / CLOCKS_PER_SEC;
	cout << "GPU TIME DURATION(Original) = " << gpums1 << endl;
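
A note on the launch configuration used above: blocks(num, num) requests num x num threads in a single block, and CUDA devices of compute capability 2.0 and later allow at most 1024 threads per block. For num > 32 the launch would therefore be expected to fail instead of running the kernel. A common alternative is a fixed block size with a grid that grows with num. The sketch below computes such dimensions in plain C++. LaunchDim, kTile, ceil_div, block_dim and grid_dim are hypothetical names that do not appear in the original code. Because the kernel body is not shown, this assumes the kernel derives its row and column from blockIdx and threadIdx and checks them against num.

```cpp
// Plain C++ stand-in for CUDA's dim3, used only for this illustration.
struct LaunchDim {
    unsigned x;
    unsigned y;
};

// Assumed tile width: 16 x 16 = 256 threads per block.
constexpr unsigned kTile = 16;

// Round-up integer division: number of tiles needed to cover n elements.
unsigned ceil_div(unsigned n, unsigned tile) {
    return (n + tile - 1) / tile;
}

// Threads per block stay fixed regardless of the matrix size.
LaunchDim block_dim() {
    return LaunchDim{kTile, kTile};
}

// The grid scales with num so that every matrix element is covered.
LaunchDim grid_dim(unsigned num) {
    return LaunchDim{ceil_div(num, kTile), ceil_div(num, kTile)};
}
```

With these values the launch would read dim3 blocks(16, 16); dim3 grids(ceil_div(num, 16), ceil_div(num, 16)); and threads that fall outside the matrix in the last row or column of blocks would need to return early.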