Sequential calls of kernels

Hi,

Another simple question:

When I call different kernels one after another (their grid and block dimensions differ), I observe a serious slowdown.

For example,

Kernel 1 only => 0.001 sec
Kernel 2 only => 0.130 sec

But kernel1 followed by kernel2 takes 0.450 sec.

Can I have some advice on this?

Thanks.

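You are probably making a mistake when timing your kernels. Kernel launches are asynchronous: the call returns to the host immediately, before the kernel has finished (or even started) running. If you stop the timer without synchronizing, you measure only the launch overhead, and the kernel's real run time gets charged to whatever operation waits for the GPU next. Call cudaThreadSynchronize() before starting and before stopping the timer to get accurate results.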
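A minimal sketch of the effect, assuming a CUDA-capable GPU of compute capability 2.0 or later (needed for the device function clock64()). The names busyKernel, msSince and measureLaunch are hypothetical, made up for this illustration; the kernel only spins for a fixed number of GPU clock cycles. It uses cudaDeviceSynchronize(), the current name for the now-deprecated cudaThreadSynchronize(), and std::chrono::steady_clock as the host timer.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

using Clock = std::chrono::steady_clock;

// Hypothetical kernel that just burns GPU time so the effect is visible.
__global__ void busyKernel(long long cycles)
{
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

static double msSince(Clock::time_point t0)
{
    return std::chrono::duration<double, std::milli>(Clock::now() - t0).count();
}

// Measures the same launch twice: once right after the launch call returns,
// and once after waiting for the GPU to finish.
static void measureLaunch(double* noSyncMs, double* withSyncMs)
{
    cudaFree(0);  // force runtime/context creation so it is not timed below

    Clock::time_point t0 = Clock::now();
    busyKernel<<<1, 1>>>(100000000LL);  // returns as soon as the launch is queued
    *noSyncMs = msSince(t0);            // only the launch overhead

    cudaDeviceSynchronize();            // now block until the kernel is done
    *withSyncMs = msSince(t0);          // launch overhead plus actual run time
}

int main()
{
    double noSync = 0.0, withSync = 0.0;
    measureLaunch(&noSync, &withSync);
    printf("without sync: %.3f ms, with sync: %.3f ms\n", noSync, withSync);
    return 0;
}
```

On a typical system the first number should be much smaller than the second, because the launch call returns before the kernel has done its work; the exact values depend on the GPU's clock rate.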

You should do it the following way:

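cudaThreadSynchronize();       // make sure no earlier GPU work is still running
t1 = clock();
kernel1<<<grid1, block1>>>();  // grid1/block1: placeholders for kernel1's launch configuration
cudaThreadSynchronize();       // wait until kernel1 has actually finished
t2 = clock();
kernel2<<<grid2, block2>>>();  // grid2/block2: placeholders for kernel2's launch configuration
cudaThreadSynchronize();       // wait until kernel2 has actually finished
t3 = clock();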
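A self-contained version of the same recipe, as a sketch. Assumptions: kernel1 and kernel2 here are placeholder kernels with deliberately different block sizes (256 and 128 threads), timeBoth and elapsedMs are hypothetical helper names, std::chrono::steady_clock stands in for clock() as a portable wall-clock timer, and cudaDeviceSynchronize() replaces the deprecated cudaThreadSynchronize(). Error checking is omitted for brevity.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

using Clock = std::chrono::steady_clock;

// Placeholder kernels standing in for the poster's kernel1/kernel2.
__global__ void kernel1(float* x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] += 1.0f;
}

__global__ void kernel2(float* x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;
}

static double elapsedMs(Clock::time_point a, Clock::time_point b)
{
    return std::chrono::duration<double, std::milli>(b - a).count();
}

// Times kernel1 and kernel2 separately; results are in milliseconds.
static void timeBoth(float* d_x, int n, double* ms1, double* ms2)
{
    cudaDeviceSynchronize();                      // drain any earlier GPU work
    Clock::time_point t1 = Clock::now();

    kernel1<<<(n + 255) / 256, 256>>>(d_x, n);    // launch configuration 1
    cudaDeviceSynchronize();                      // wait until kernel1 has finished
    Clock::time_point t2 = Clock::now();

    kernel2<<<(n + 127) / 128, 128>>>(d_x, n);    // a different launch configuration
    cudaDeviceSynchronize();                      // wait until kernel2 has finished
    Clock::time_point t3 = Clock::now();

    *ms1 = elapsedMs(t1, t2);
    *ms2 = elapsedMs(t2, t3);
}

int main()
{
    const int n = 1 << 20;
    float* d_x = nullptr;
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMemset(d_x, 0, n * sizeof(float));

    double ms1 = 0.0, ms2 = 0.0;
    timeBoth(d_x, n, &ms1, &ms2);
    printf("kernel1: %.3f ms, kernel2: %.3f ms\n", ms1, ms2);

    cudaFree(d_x);
    return 0;
}
```

The first launch of a kernel can include one-time setup costs (for example context creation or module loading), so it is common practice to run one untimed warm-up launch before measuring.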

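Replace clock() with an appropriate high-resolution wall-clock timer (such as QueryPerformanceCounter() on Windows); on many platforms clock() measures CPU time rather than elapsed time, and its resolution may be coarse. The execution time of kernel1 is then (t2 - t1), and that of kernel2 is (t3 - t2).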
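An alternative worth knowing is CUDA events, which place timestamps in the GPU's stream so you measure GPU time directly instead of host wall-clock time. A minimal sketch, assuming a placeholder kernel named scale (substitute your own kernels), a hypothetical helper timeWithEvents, and the default stream. cudaEventCreate, cudaEventRecord, cudaEventSynchronize, cudaEventElapsedTime and cudaEventDestroy are standard CUDA runtime calls, and the elapsed time is reported in milliseconds.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel; substitute your own kernel1/kernel2.
__global__ void scale(float* x, int n, float s)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= s;
}

// Returns the GPU time of one launch in milliseconds, measured with CUDA events.
static float timeWithEvents(float* d_x, int n, int blockSize)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);                  // enqueue "start" marker on the default stream
    scale<<<(n + blockSize - 1) / blockSize, blockSize>>>(d_x, n, 2.0f);
    cudaEventRecord(stop, 0);                   // enqueue "stop" marker after the kernel
    cudaEventSynchronize(stop);                 // block the host until "stop" has been reached

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);     // GPU-side time between the two markers

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main()
{
    const int n = 1 << 20;
    float* d_x = nullptr;
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMemset(d_x, 0, n * sizeof(float));

    printf("block size 256: %.3f ms\n", timeWithEvents(d_x, n, 256));
    printf("block size 128: %.3f ms\n", timeWithEvents(d_x, n, 128));

    cudaFree(d_x);
    return 0;
}
```

Because the events are recorded in the same stream as the kernel, this approach does not need a cudaThreadSynchronize()/cudaDeviceSynchronize() call before the start marker; cudaEventSynchronize(stop) is enough to make the result valid.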