Invoking my kernel in Matlab affects fft2 speed on GPU? What is wrong with my kernel?

Hi,

I was testing the following program in Matlab:

tic; fft2(img_d); toc   % img_d is a gpuArray on GPU

.......

%Set the block and grid sizes of my kernel

 ........

% invoke my kernel

tic

img_d=feval(myKernelName, .......);

toc

% do fft2 on gpu again

tic; fft2(img_d); toc   % img_d is a gpuArray on GPU

The result is strange:

The first fft2 on the GPU takes:

Elapsed time is 0.000557 seconds.

But after invoking my kernel, the second fft2 on the GPU takes:

Elapsed time is 0.074028 seconds.

If I add this line after the kernel invocation:

img = gather(img_d)

and then measure the time for the second fft2 on the GPU, the time looks right:

Elapsed time is 0.000392 seconds.

What is wrong with my kernel?

You are using MathWorks’ software, so you should ask them. Many people have reported similar issues before, so you are not alone.
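
For what it's worth, one possible explanation (an assumption on my part, since the kernel and its launch arguments are not shown): GPU calls in MATLAB, including feval on a CUDAKernel and fft2 on a gpuArray, are queued asynchronously, so tic/toc can stop the clock before the work has actually finished. The second fft2 may then be absorbing the kernel's run time, and gather would hide this because it forces a synchronization. Here is a minimal sketch of timing with explicit synchronization. The matrix product is a hypothetical stand-in for your kernel, and the array sizes are made up.

```matlab
% Sketch only: a large matrix product stands in for the custom kernel,
% whose launch arguments are not shown in the post.
gpu   = gpuDevice;                        % currently selected GPU
img_d = gpuArray(rand(1024, 'single'));   % assumed image size
A     = gpuArray(rand(4096, 'single'));   % stand-in workload

B = A * A;      % queued on the GPU; control may return before it finishes
wait(gpu);      % block until all queued GPU work is done

tic;
F = fft2(img_d);
wait(gpu);      % make sure fft2 itself has finished before reading the clock
toc

% Alternative: gputimeit handles the synchronization and repeats the call.
t = gputimeit(@() fft2(img_d));
fprintf('fft2: %.6f s\n', t);
```

If the fft2 time measured this way is the same before and after your kernel runs, the kernel itself is probably fine and the original numbers were a timing artifact.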

Thanks a lot.

No problem. Saw your post here and hope you get answers. If you ever want something better, you know where to find me!