Hi,
I tested the following kind of program in MATLAB:
tic; fft2(img_d); toc % img_d is a gpuArray on GPU
.......
%Set the block and grid sizes of my kernel
........
% invoke my kernel
tic
img_d=feval(myKernelName, .......);
toc
% do fft2 on gpu again
tic; fft2(img_d); toc % img_d is a gpuArray on GPU
The result is strange.
The first fft2 on the GPU takes:
Elapsed time is 0.000557 seconds.
After I invoke my kernel, however, the second fft2 on the GPU takes:
Elapsed time is 0.074028 seconds.
If I add this line right after the kernel invocation:
img = gather(img_d)
and then time the second fft2 on the GPU again,
the result looks right:
Elapsed time is 0.000392 seconds.
What is wrong with my kernel?