Write Global Memory while kernel is running

enemyben88 · April 15, 2009, 11:20pm

Hey, I have a random question. I am pretty sure I know the answer, but I haven’t found a definitive one. My question is this:

Is it possible for me to have the CPU muck around with global memory WHILE the kernel is running? That is, let’s say I have some kernel threads that do some reads and writes to global memory. Is it possible for the CPU to read and write the same memory locations at the same time (well, not exactly the same time, but while the kernel is still executing)? What exactly does the CPU do when it launches a kernel? Is serial execution on the CPU basically blocked until the kernel is done?

Thanks for the help!

seibert · April 16, 2009, 1:09am

Kernel launch is asynchronous, which means that the CUDA runtime library queues up your kernel for execution with the driver, then returns immediately so that your program can keep running while the GPU does its thing. There is a maximum queue depth (which appears to vary between devices), so if you queue up many kernel launches in a row (~24, for example), your program will block until a kernel has finished running.
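A minimal sketch of what asynchronous launch looks like from the host side. The kernel `addOne` and the helper `launchAndPoll` are hypothetical names made up for this illustration; the sketch assumes the kernel goes into the default stream and uses `cudaStreamQuery` to ask whether the queued work has finished. How many polls happen before completion depends on the device and problem size, so no particular count should be expected.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: adds 1 to every element.
__global__ void addOne(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Launches addOne and counts how many times the host manages to poll the
// default stream before the queued work is reported finished.
// Returns -1 if any CUDA call fails.
long launchAndPoll(int n)
{
    int *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(int)) != cudaSuccess) return -1;
    if (cudaMemset(d_data, 0, n * sizeof(int)) != cudaSuccess) {
        cudaFree(d_data);
        return -1;
    }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;

    // The launch only queues the kernel; control returns to the host at once.
    addOne<<<blocks, threads>>>(d_data, n);
    if (cudaGetLastError() != cudaSuccess) {
        cudaFree(d_data);
        return -1;
    }

    // The host is free to do other work here. This loop just polls:
    // cudaStreamQuery returns cudaErrorNotReady while work is still pending.
    long polls = 0;
    cudaError_t status;
    while ((status = cudaStreamQuery(0)) == cudaErrorNotReady) {
        ++polls;
    }

    cudaFree(d_data);
    return (status == cudaSuccess) ? polls : -1;
}
```

Calling something like `printf("%ld\n", launchAndPoll(1 << 24));` from `main` prints the poll count. A non-negative result only means the host kept running while the kernel was queued or executing.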

Even though kernel launch is asynchronous, some operations will force your process to block until the kernel is complete. The most common of these is cudaMemcpy(), which automatically waits until all previous kernel launches are complete before running, to ensure you do not read or write device memory while it is being used. That is usually the behavior you want. cudaMemcpy() is synchronous and only returns after the memory copy has completed.
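To illustrate the blocking behavior described above, here is a small sketch in which the host copies results back right after the launch, with no explicit synchronization call. The names `fillWithIndex` and `copyBackAfterKernel` are hypothetical. It assumes both the kernel and the copy use the default stream, so the copy is ordered after the kernel.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: writes each element's own index into it.
__global__ void fillWithIndex(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = i;
}

// Launches the kernel and immediately copies the buffer back with a plain
// cudaMemcpy. Returns true if every element holds the kernel's result.
bool copyBackAfterKernel(int n)
{
    std::vector<int> host(n, -1);
    int *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(int)) != cudaSuccess) return false;

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    fillWithIndex<<<blocks, threads>>>(d_data, n);   // asynchronous launch

    // No explicit synchronization here: cudaMemcpy waits for the kernel
    // queued before it, performs the copy, and only then returns.
    cudaError_t err = cudaMemcpy(host.data(), d_data, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; ++i) {
        if (host[i] != i) return false;
    }
    return true;
}
```

If the copy did not wait for the kernel, some elements could still hold the initial -1 and the check would fail.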

CUDA supports the concept of “streams”, where a stream is a sequence of GPU operations that execute in order. If you create two streams and run kernel launches and CUDA memory copies (using cudaMemcpyAsync to avoid the blocking behavior) on different streams, then kernels can execute at the same time as memory copies, assuming your device and operating system support that. Modifying the same device memory locations being accessed by a kernel is possible, I believe, but almost certainly a bad idea. Overlapping kernel execution and memory copies is usually done with some kind of double-buffering scheme.
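A rough sketch of the double-buffering idea with two streams. Everything named here (`scale`, `doubleBufferedScale`, the chunk sizes) is hypothetical and only for illustration. It assumes page-locked host memory from `cudaMallocHost`, which asynchronous copies generally need in order to overlap with kernels, and it uses current runtime API names such as `cudaDeviceSynchronize`. Whether any overlap actually occurs depends on the device and operating system, as noted above.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: multiplies every element by a factor.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Processes numChunks chunks of chunkSize floats, alternating between two
// streams that each own one device buffer. Returns true on success.
bool doubleBufferedScale(int numChunks, int chunkSize)
{
    const size_t chunkBytes = chunkSize * sizeof(float);
    const size_t total = (size_t)numChunks * chunkSize;

    float *h_data = nullptr;               // pinned host memory
    float *d_buf[2] = {nullptr, nullptr};  // one device buffer per stream
    cudaStream_t stream[2];

    if (cudaMallocHost(&h_data, total * sizeof(float)) != cudaSuccess) return false;
    for (size_t i = 0; i < total; ++i) h_data[i] = 1.0f;

    bool ok = true;
    for (int b = 0; b < 2; ++b) {
        ok = ok && (cudaMalloc(&d_buf[b], chunkBytes) == cudaSuccess);
        ok = ok && (cudaStreamCreate(&stream[b]) == cudaSuccess);
    }

    if (ok) {
        int threads = 256;
        int blocks = (chunkSize + threads - 1) / threads;
        for (int c = 0; c < numChunks; ++c) {
            int b = c % 2;  // alternate buffers and streams
            float *h_chunk = h_data + (size_t)c * chunkSize;
            // Operations within one stream run in order, so a buffer is not
            // reused until its previous chunk has been copied back. Work in
            // the other stream may overlap with it.
            cudaMemcpyAsync(d_buf[b], h_chunk, chunkBytes,
                            cudaMemcpyHostToDevice, stream[b]);
            scale<<<blocks, threads, 0, stream[b]>>>(d_buf[b], chunkSize, 2.0f);
            cudaMemcpyAsync(h_chunk, d_buf[b], chunkBytes,
                            cudaMemcpyDeviceToHost, stream[b]);
        }
        // Wait for both streams before the host reads the results.
        ok = (cudaDeviceSynchronize() == cudaSuccess);
        for (size_t i = 0; ok && i < total; ++i) ok = (h_data[i] == 2.0f);
    }

    for (int b = 0; b < 2; ++b) {
        if (d_buf[b]) {
            cudaFree(d_buf[b]);
            cudaStreamDestroy(stream[b]);
        }
    }
    cudaFreeHost(h_data);
    return ok;
}
```

The point of the two buffers is that the host and the kernel never touch the same device memory at the same time: while one stream's kernel works on its buffer, the other stream can be copying into or out of the second buffer.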

This is explained in more detail in the programming guide in section 3.2.6, “Asynchronous Concurrent Execution.”
