Queries on CUDA thread synchronization

Dear All,

I am converting C++ code to CUDA, and I have some queries about it.

My C++ implementation is like this:

void func(/* some arguments */)
{
    Func1();  // Func1 is called here
    // some statements
    Func2();  // Func2 is called here
    // some statements
    Func3();  // Func3 is called here
    // some statements
    Func4();  // Func4 is called here
}

Now, to convert this into CUDA, I have made CUDA kernels for Func1, Func2, and Func3… I have the following two queries:

1) Func4 is something that cannot be parallelized with CUDA, so I want it to run sequentially on the GPU. How do I call it on the GPU?

2) The Func2 kernel should be launched only after the Func1 kernel's output has been computed. I am creating multiple blocks, each with multiple threads, for the Func1 kernel, so do I need to synchronize all blocks before launching the kernel for Func2? (I came across __syncthreads(), but it only synchronizes threads within one block.)

Any input is highly appreciated…

Sequential code usually doesn't run very fast on the GPU, but it's possible to run a sequential routine on the GPU by launching it as a kernel with a single thread:

Func4<<<1,1>>>(...);
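As a minimal sketch (the kernel name, body, and arguments here are placeholders, not your actual code), that could look like:

__global__ void Func4Kernel(/* arguments */)
{
    // Only one thread executes this, so the body can stay fully sequential.
    // ... work formerly done by Func4 ...
}

// Launched as one block containing one thread:
Func4Kernel<<<1,1>>>(/* arguments */);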

CUDA operations issued into the same stream will always execute in order. If you don't use any stream syntax at all, then all your kernels are issued into the same default stream. Therefore this arrangement:

Func1<<<blocks1,threads1>>>(…);
Func2<<<blocks2,threads2>>>(…);

will guarantee that Func2 will not begin until Func1 has completed.
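Putting the two ideas together, a rough sketch of the host-side wrapper (kernel names, launch configurations, and argument lists are placeholders) could be:

void func(/* some arguments */)
{
    // Each launch goes into the default stream, so they run one after another on the GPU.
    Func1Kernel<<<blocks1, threads1>>>(/* ... */);  // all blocks of Func1 complete first
    Func2Kernel<<<blocks2, threads2>>>(/* ... */);  // then Func2 starts
    Func3Kernel<<<blocks3, threads3>>>(/* ... */);  // then Func3
    Func4Kernel<<<1, 1>>>(/* ... */);               // sequential part, single thread
    cudaDeviceSynchronize();                        // block the CPU until all kernels finish
}

No explicit cross-block synchronization is needed between Func1 and Func2; the kernel launch boundary itself provides it.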

OK, thanks for your response…

My understanding is that a kernel launch is an asynchronous call, and control comes back to the CPU just after the kernel is launched…

From your comment:
Func1<<<blocks1,threads1>>>(…);
Func2<<<blocks2,threads2>>>(…);

will guarantee that Func2 will not begin until Func1 has completed.

Does this mean that only one kernel can be running on the GPU at a time?

How can I run multiple kernels simultaneously on the GPU to increase the load on the GPU?

No, that is not what it means.

Perhaps you should read the programming guide section on asynchronous concurrent execution:

http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#asynchronous-concurrent-execution

And study the CUDA concurrent kernels sample code:

http://docs.nvidia.com/cuda/cuda-samples/index.html#concurrent-kernels
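As a rough sketch of the pattern those links describe (kernel names and launch configurations here are placeholders), independent kernels can be issued into separate non-default streams so the hardware is free to overlap them, provided the device supports concurrent kernel execution and each kernel leaves enough resources free:

cudaStream_t stream1, stream2;
cudaStreamCreate(&stream1);
cudaStreamCreate(&stream2);

// These two launches may overlap on the GPU because they are in different streams.
KernelA<<<blocksA, threadsA, 0, stream1>>>(/* ... */);
KernelB<<<blocksB, threadsB, 0, stream2>>>(/* ... */);

// Wait for each stream's work to finish before using the results.
cudaStreamSynchronize(stream1);
cudaStreamSynchronize(stream2);

cudaStreamDestroy(stream1);
cudaStreamDestroy(stream2);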