Concurrent Kernel executions on same CPU thread and multiple CPU threads

sesh · August 25, 2011, 12:45pm

Hi,

I am working on a multi-threaded application in which the whole memory pipeline is set up on the GPU, as follows:

Process1 → Process2 → Process3 → Process4 and so on…

There are multiple processes running in parallel, and each CPU thread is associated with one process. All the CPU threads share the same CUDA context because some GPU memory has to be shared between processes: for instance, the output of Process1 is the input to Process2, so I just pass Process1's output memory pointer to Process2. I use CPU semaphores to ensure that Process1 writes to the buffer before Process2 reads from it. Each process also launches several kernels; for example, Process1 has kernel_11, kernel_12, kernel_13, etc. There are also dependencies between the memory used by different kernels: the output of kernel_11 is the input to kernel_12.

Process1
{
kernel_11(writes to memory1);
kernel_12(reads from memory1 and writes to memory2);
kernel_13(reads from memory2 and writes to memory3);
…
cudaThreadSynchronize();
}
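A minimal runnable sketch of Process1 in CUDA C++, assuming made-up kernel bodies (only the names kernel_11 to kernel_13 and memory1 to memory3 come from the post). The three kernels go back to back into the default stream with different launch sizes and a single synchronization at the end; work in one stream executes in issue order, so each kernel sees the complete output of the previous one regardless of how many threads each launch uses. The sketch calls `cudaDeviceSynchronize()`, the CUDA 4.0 replacement for the later-deprecated `cudaThreadSynchronize()`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stage 1: memory1[i] = i.
__global__ void kernel_11(int *memory1, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) memory1[i] = i;
}

// Hypothetical stage 2: memory2[i] = 2 * memory1[i].
// Grid-stride loop, so a much smaller launch still covers all n elements.
__global__ void kernel_12(const int *memory1, int *memory2, int n) {
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += gridDim.x * blockDim.x)
        memory2[i] = 2 * memory1[i];
}

// Hypothetical stage 3: memory3[i] = memory2[i] + 1.
__global__ void kernel_13(const int *memory2, int *memory3, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) memory3[i] = memory2[i] + 1;
}

// Runs the chain on n elements; returns the number of wrong results, or -1 on a CUDA error.
int run_process1(int n) {
    int *memory1 = nullptr, *memory2 = nullptr, *memory3 = nullptr;
    size_t bytes = n * sizeof(int);
    cudaError_t err = cudaMalloc(&memory1, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&memory2, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&memory3, bytes);

    int wrong = -1;
    if (err == cudaSuccess) {
        // Different launch configurations, issued back to back into the default stream.
        kernel_11<<<(n + 255) / 256, 256>>>(memory1, n);
        kernel_12<<<4, 128>>>(memory1, memory2, n);
        kernel_13<<<(n + 511) / 512, 512>>>(memory2, memory3, n);
        err = cudaGetLastError();                               // launch-configuration errors
        if (err == cudaSuccess) err = cudaDeviceSynchronize();  // one wait for the whole chain
    }
    if (err == cudaSuccess) {
        int *host = new int[n];
        if (cudaMemcpy(host, memory3, bytes, cudaMemcpyDeviceToHost) == cudaSuccess) {
            wrong = 0;
            for (int i = 0; i < n; ++i)
                if (host[i] != 2 * i + 1) ++wrong;
        }
        delete[] host;
    }
    cudaFree(memory1);  // cudaFree(nullptr) is a no-op
    cudaFree(memory2);
    cudaFree(memory3);
    return wrong;
}

int main() {
    int wrong = run_process1(1 << 20);
    std::printf("wrong results: %d\n", wrong);
    return wrong == 0 ? 0 : 1;
}
```

If a pipeline like this still produces wrong results, an out-of-bounds index or a launch that failed silently is often a more likely cause than ordering, which is why the sketch checks `cudaGetLastError()` after the launches.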

Similarly for other processes.

I am not using any streams at the moment, so I believe all the kernels are issued to the default stream. My questions are:

1. Do I need a cudaThreadSynchronize between consecutive kernel launches of the same process, i.e. kernel_11, then cudaThreadSynchronize, then kernel_12? Or does the driver itself guarantee that the kernels execute in order? (The manual says that execution in the default stream is in-order, but my doubt is whether I need to ensure "explicitly" that kernel_11 has finished writing memory1 before kernel_12 starts reading it.) The number of GPU threads in kernel_11 may differ from the number in kernel_12.
2. Since I synchronize the CPU threads with semaphores, can I simply launch the kernels of the second process without any issue, given that Process2 waits until Process1's write is complete? (I am only passing pointers; the memory lives only on the GPU, so I don't need to transfer it to the CPU.)
3. Would using streams in the different processes give any speedup?
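For questions 2 and 3, a minimal sketch of the hand-off between two host threads that share one context, assuming hypothetical kernels `produce` and `consume`, with a mutex/condition-variable pair standing in for the CPU semaphores. The point it illustrates: a kernel launch returns to the host before the kernel finishes, so Process1 should post its semaphore only after `cudaStreamSynchronize` (or `cudaThreadSynchronize`) returns; posting right after the launch would let Process2 read memory that is still being written. Giving each thread its own stream is one possible design, not the only one.

```cuda
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <thread>
#include <cuda_runtime.h>

// Hypothetical Process1 kernel: writes the shared buffer.
__global__ void produce(int *buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = 3 * i;
}

// Hypothetical Process2 kernel: reads the shared buffer.
__global__ void consume(const int *in, int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] + 7;
}

// One-shot signal standing in for the CPU semaphore in the post.
struct Signal {
    std::mutex m;
    std::condition_variable cv;
    bool ready = false;
    void post() {
        { std::lock_guard<std::mutex> lk(m); ready = true; }
        cv.notify_one();
    }
    void wait() {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [this] { return ready; });
    }
};

// Returns the number of wrong results, or -1 on a CUDA error.
int run_two_threads(int n) {
    int *shared_buf = nullptr, *out = nullptr;
    size_t bytes = n * sizeof(int);
    if (cudaMalloc(&shared_buf, bytes) != cudaSuccess) return -1;
    if (cudaMalloc(&out, bytes) != cudaSuccess) { cudaFree(shared_buf); return -1; }
    Signal written;
    cudaError_t err1 = cudaSuccess, err2 = cudaSuccess;

    std::thread process1([&] {
        cudaStream_t s;
        err1 = cudaStreamCreate(&s);
        if (err1 == cudaSuccess) {
            produce<<<(n + 255) / 256, 256, 0, s>>>(shared_buf, n);
            err1 = cudaGetLastError();
            // Wait until the kernel has finished writing, not merely been launched.
            if (err1 == cudaSuccess) err1 = cudaStreamSynchronize(s);
            cudaStreamDestroy(s);
        }
        written.post();  // posted even on error so Process2 cannot hang
    });

    std::thread process2([&] {
        written.wait();  // Process1's data is complete once this returns
        cudaStream_t s;
        err2 = cudaStreamCreate(&s);
        if (err2 == cudaSuccess) {
            consume<<<(n + 255) / 256, 256, 0, s>>>(shared_buf, out, n);
            err2 = cudaGetLastError();
            if (err2 == cudaSuccess) err2 = cudaStreamSynchronize(s);
            cudaStreamDestroy(s);
        }
    });

    process1.join();
    process2.join();

    int wrong = -1;
    if (err1 == cudaSuccess && err2 == cudaSuccess) {
        int *host = new int[n];
        if (cudaMemcpy(host, out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess) {
            wrong = 0;
            for (int i = 0; i < n; ++i)
                if (host[i] != 3 * i + 7) ++wrong;
        }
        delete[] host;
    }
    cudaFree(shared_buf);
    cudaFree(out);
    return wrong;
}

int main() {
    int wrong = run_two_threads(1 << 20);
    std::printf("wrong results: %d\n", wrong);
    return wrong == 0 ? 0 : 1;
}
```

Streams matter for question 3 because all host threads sharing a context also share its (legacy) default stream, so their default-stream kernels serialize there; with separate streams, independent kernels from different processes can overlap on a Fermi card such as the GTX 580, although the gain depends on whether a single kernel already fills the GPU. If blocking the host thread is undesirable, recording an event with `cudaEventRecord` in Process1's stream and calling `cudaStreamWaitEvent` in Process2's stream expresses the same dependency on the GPU side, provided the CPU signal is still posted after the record call.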

I am asking the first two questions because I am getting unexpected results, and I want to be sure this is not because CUDA expects me to do something that I am not doing.

Thank You.
Sesh

PS: The graphics card I am using is a GTX 580.

JeremiahPalmer · August 25, 2011, 3:16pm

How do you have multiple CPU threads sharing the same CUDA context? I was under the impression that a context links one host thread to one GPU.

seibert · August 25, 2011, 3:32pm

This restriction was removed in CUDA 4.0. You can now share a single context with multiple host threads, and manage multiple contexts from a single host thread.
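To illustrate the CUDA 4.0 change, a minimal sketch using the driver API: the main thread binds a context and allocates a buffer, and a second host thread binds the same context with `cuCtxSetCurrent` (introduced in 4.0) and writes to that buffer with `cuMemsetD32`. For brevity it obtains the context with `cuDevicePrimaryCtxRetain`, which arrived in a later CUDA release; with the runtime API no explicit step is needed, because every host thread of a process shares the device's primary context. The function name is hypothetical; link against the driver library (`-lcuda`).

```cuda
#include <cstdio>
#include <thread>
#include <cuda.h>

// Returns the number of wrong values, or -1 on a driver API error.
int share_context_across_threads(unsigned int n) {
    CUdevice dev;
    CUcontext ctx;
    if (cuInit(0) != CUDA_SUCCESS || cuDeviceGet(&dev, 0) != CUDA_SUCCESS)
        return -1;
    if (cuDevicePrimaryCtxRetain(&ctx, dev) != CUDA_SUCCESS)
        return -1;

    // Main thread: make the context current and allocate device memory in it.
    CUdeviceptr buf = 0;
    if (cuCtxSetCurrent(ctx) != CUDA_SUCCESS ||
        cuMemAlloc(&buf, n * sizeof(unsigned int)) != CUDA_SUCCESS) {
        cuDevicePrimaryCtxRelease(dev);
        return -1;
    }

    CUresult worker_status = CUDA_SUCCESS;
    std::thread worker([&] {
        // Second host thread: bind the same context, then use main's pointer.
        worker_status = cuCtxSetCurrent(ctx);
        if (worker_status == CUDA_SUCCESS)
            worker_status = cuMemsetD32(buf, 42u, n);
        if (worker_status == CUDA_SUCCESS)
            worker_status = cuCtxSynchronize();  // memset is done before the thread exits
    });
    worker.join();

    int wrong = -1;
    if (worker_status == CUDA_SUCCESS) {
        unsigned int *host = new unsigned int[n];
        if (cuMemcpyDtoH(host, buf, n * sizeof(unsigned int)) == CUDA_SUCCESS) {
            wrong = 0;
            for (unsigned int i = 0; i < n; ++i)
                if (host[i] != 42u) ++wrong;
        }
        delete[] host;
    }
    cuMemFree(buf);
    cuDevicePrimaryCtxRelease(dev);
    return wrong;
}

int main() {
    int wrong = share_context_across_threads(1024);
    std::printf("wrong values: %d\n", wrong);
    return wrong == 0 ? 0 : 1;
}
```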
