As you can see, results from the N-th kernel call are used in the (N+1)-th kernel call. There is no cudaThreadSynchronize() call between the kernel calls, but everything always works correctly.
Why? Are small kernel calls synchronous? Or is something else going on?
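The pattern being asked about is presumably something like the following minimal sketch (the kernel name, launch configuration, and buffer sizes here are hypothetical, not the questioner's actual code): each launch reads what the previous launch wrote, with no explicit synchronization between them.

```
__global__ void step(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] += 1.0f;   // reads the result written by the previous launch
}

int main(void)
{
    const int n = 1 << 20;
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    // No cudaThreadSynchronize() between launches, yet each launch
    // observes the previous launch's results.
    step<<<(n + 255) / 256, 256>>>(d_data, n);
    step<<<(n + 255) / 256, 256>>>(d_data, n);
    step<<<(n + 255) / 256, 256>>>(d_data, n);

    cudaThreadSynchronize();   // only needed before reading back on the host
    cudaFree(d_data);
    return 0;
}
```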
All kernel calls are synchronous with respect to the GPU. What happens in that code snippet is that the first kernel is launched, and the second and third are queued by the driver. The CPU is free to run asynchronously, but the GPU only runs a single kernel at any given time, and kernels issued to the same stream execute in the order they were launched.
It does no such thing. That would imply the host thread owning the context sits in a spinlock until the kernel call finishes, which doesn't happen. The driver maintains a queue: the kernel launch is queued and the host thread is released to run asynchronously. There is evidence that if the driver queue fills, the host thread will be held until a slot on the queue becomes free, but it seems you need to have queued a lot of kernel launches (it might be as many as 64 in CUDA 2.3) before that happens.
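You can convince yourself of this with a simple timing experiment. This is only a sketch (the busy-loop kernel and tick counts are illustrative, and actual numbers depend on your hardware): the launch itself returns to the host almost immediately, while the explicit synchronize call is where the host actually waits.

```
#include <cstdio>
#include <ctime>

// Deliberately slow kernel so the launch/sync timing gap is visible.
__global__ void busy(float *out)
{
    float x = 0.0f;
    for (int i = 0; i < 100000000; ++i)
        x += 1e-8f;
    *out = x;
}

int main(void)
{
    float *d_out;
    cudaMalloc(&d_out, sizeof(float));

    clock_t t0 = clock();
    busy<<<1, 1>>>(d_out);       // queued by the driver; host is released
    clock_t t1 = clock();        // reached almost immediately after launch
    cudaThreadSynchronize();     // host blocks here until the kernel finishes
    clock_t t2 = clock();

    printf("launch took %ld ticks, sync took %ld ticks\n",
           (long)(t1 - t0), (long)(t2 - t1));
    cudaFree(d_out);
    return 0;
}
```

If the launch were synchronous for the host, the first interval would be as large as the second; in practice it is tiny.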