Queries on CUDA thread synchronization

Dear All,

I am converting C++ code to CUDA, and I have some queries about it.

My C++ implementation is like this:

void func(/* some arguments */)
{
    Func1();  // Func1 is called here
    // some statements
    Func2();  // Func2 is called here
    // some statements
    Func3();  // Func3 is called here
    // some statements
    Func4();  // Func4 is called here
}

Now, to convert this into CUDA, I have made CUDA kernels for Func1, Func2, and Func3… I have the following two queries:

1) Func4 is something that cannot be parallelized with CUDA, so I want it to run sequentially on the GPU. How do I call it on the GPU?

2) The Func2 kernel should be launched only after the Func1 kernel's output has been computed. I am creating multiple blocks, each with multiple threads, for the Func1 kernel, so do I need to synchronize all blocks before launching the kernel for Func2? (I came across __syncthreads(), but it only synchronizes threads within one block.)

Any input is highly appreciated…

Sequential code usually doesn't run very fast on the GPU, but it's possible to run a sequential routine on the GPU by launching it as a kernel with a single thread:

Func4<<<1,1>>>(...);
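As a minimal sketch (the kernel name, body, and arguments here are placeholders, not your actual code), that could look like:

__global__ void Func4Kernel(/* arguments */)
{
    // Only one thread executes this, so the body can stay fully sequential.
    // ... work formerly done by Func4 ...
}

// Launched as one block containing one thread:
Func4Kernel<<<1,1>>>(/* arguments */);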

CUDA operations issued into the same stream will always execute in order. If you don't use any stream syntax at all, then all your kernels are issued into the same default stream. Therefore this arrangement:

Func1<<<blocks1,threads1>>>(…);
Func2<<<blocks2,threads2>>>(…);

will guarantee that Func2 will not begin until Func1 has completed.
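Putting the two ideas together, a rough sketch of the host-side wrapper (kernel names, launch configurations, and argument lists are placeholders) could be:

void func(/* some arguments */)
{
    // Each launch goes into the default stream, so they run one after another on the GPU.
    Func1Kernel<<<blocks1, threads1>>>(/* ... */);  // all blocks of Func1 complete first
    Func2Kernel<<<blocks2, threads2>>>(/* ... */);  // then Func2 starts
    Func3Kernel<<<blocks3, threads3>>>(/* ... */);  // then Func3
    Func4Kernel<<<1, 1>>>(/* ... */);               // sequential part, single thread
    cudaDeviceSynchronize();                        // block the CPU until all kernels finish
}

No explicit cross-block synchronization is needed between Func1 and Func2; the kernel launch boundary itself provides it.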

OK, thanks for your response…

My understanding is that a kernel launch is an asynchronous call, and control comes back to the CPU just after the kernel is launched…

From your comment:
Func1<<<blocks1,threads1>>>(…);
Func2<<<blocks2,threads2>>>(…);

will guarantee that Func2 will not begin until Func1 has completed.

Does this mean that only one kernel can be running on the GPU at a time?

How can I run multiple kernels simultaneously on the GPU to increase the load on the GPU?

No, that is not what it means.

Perhaps you should read the programming guide section on asynchronous concurrent execution:

http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#asynchronous-concurrent-execution

And study the CUDA concurrent kernels sample code:

http://docs.nvidia.com/cuda/cuda-samples/index.html#concurrent-kernels
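As a rough sketch of the pattern those links describe (kernel names and launch configurations here are placeholders), independent kernels can be issued into separate non-default streams so the hardware is free to overlap them, provided the device supports concurrent kernel execution and each kernel leaves enough resources free:

cudaStream_t stream1, stream2;
cudaStreamCreate(&stream1);
cudaStreamCreate(&stream2);

// These two launches may overlap on the GPU because they are in different streams.
KernelA<<<blocksA, threadsA, 0, stream1>>>(/* ... */);
KernelB<<<blocksB, threadsB, 0, stream2>>>(/* ... */);

// Wait for each stream's work to finish before using the results.
cudaStreamSynchronize(stream1);
cudaStreamSynchronize(stream2);

cudaStreamDestroy(stream1);
cudaStreamDestroy(stream2);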