Someone else asked this. They didn’t report back after trying it, but the answer is “maybe”. In theory it should work if you set up two streams and keep issuing async memcpys on the second one. In practice, the runtime might not be designed for this. What it is designed for is not a “continuous” workflow but a pipelined one: you execute the kernel repeatedly, and using the streams API you simultaneously memcpy the data for the next kernel execution (not the current one).
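A minimal sketch of that pipelined pattern, assuming a hypothetical `process` kernel and chunked input (this is not the OP’s code, just an illustration; overlap also requires pinned host memory and a device that supports it):

```cpp
#include <cuda_runtime.h>

__global__ void process(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;  // stand-in for the real work
}

int main() {
    const int N = 1 << 20, CHUNKS = 4;
    float *h;
    // Pinned host memory is required for cudaMemcpyAsync to actually overlap.
    cudaHostAlloc(&h, (size_t)CHUNKS * N * sizeof(float), cudaHostAllocDefault);
    float *d[2];
    cudaMalloc(&d[0], N * sizeof(float));
    cudaMalloc(&d[1], N * sizeof(float));

    cudaStream_t s[2];
    cudaStreamCreate(&s[0]);
    cudaStreamCreate(&s[1]);

    // Prime the pipeline: transfer chunk 0 before the loop starts.
    cudaMemcpyAsync(d[0], h, N * sizeof(float), cudaMemcpyHostToDevice, s[0]);

    for (int c = 0; c < CHUNKS; ++c) {
        int cur = c % 2, nxt = (c + 1) % 2;
        // Kernel consumes the chunk whose transfer was enqueued earlier
        // on the SAME stream, so ordering is guaranteed...
        process<<<(N + 255) / 256, 256, 0, s[cur]>>>(d[cur], N);
        // ...while the copy engine moves the NEXT chunk on the other stream.
        if (c + 1 < CHUNKS)
            cudaMemcpyAsync(d[nxt], h + (size_t)(c + 1) * N, N * sizeof(float),
                            cudaMemcpyHostToDevice, s[nxt]);
        cudaStreamSynchronize(s[cur]);  // wait before this buffer is reused
    }
    cudaDeviceReset();
    return 0;
}
```

The key point is that each memcpy feeds a *future* kernel launch, never the one currently running.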
What do you mean ‘maybe’ and ‘in theory’? This is a documented feature of CUDA, and it is designed to work on the vast majority of CUDA-capable cards currently released.
kernel writes buffer1, DMA buffer0 back to host – this works just fine
kernel writes buffer1, DMA buffer1 back to host while the kernel is still writing it – please don’t do that; it is a bad idea even if it seems like it might work
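The two cases above can be sketched as a double-buffer ping-pong (the `fill` kernel and buffer names here are assumed for illustration; only the ordering matters):

```cpp
#include <cuda_runtime.h>

__global__ void fill(float *out, int n, float v) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = v;
}

int main() {
    const int N = 1 << 20;
    float *d_buf[2], *h_out;
    cudaMalloc(&d_buf[0], N * sizeof(float));
    cudaMalloc(&d_buf[1], N * sizeof(float));
    cudaHostAlloc(&h_out, N * sizeof(float), cudaHostAllocDefault);  // pinned for async DMA

    cudaStream_t compute, copy;
    cudaStreamCreate(&compute);
    cudaStreamCreate(&copy);

    // Fill buffer0 and wait, so it is definitely complete.
    fill<<<(N + 255) / 256, 256, 0, compute>>>(d_buf[0], N, 0.0f);
    cudaStreamSynchronize(compute);

    // Safe case: kernel writes buffer1 while DMA reads the FINISHED buffer0.
    fill<<<(N + 255) / 256, 256, 0, compute>>>(d_buf[1], N, 1.0f);
    cudaMemcpyAsync(h_out, d_buf[0], N * sizeof(float),
                    cudaMemcpyDeviceToHost, copy);

    // Unsafe case (the second bullet): copying d_buf[1] here, while the
    // kernel on `compute` may still be writing it, would read
    // indeterminate data. Don't do that.
    cudaDeviceReset();
    return 0;
}
```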
The OP said the kernel was operating on ‘previously sent’ (i.e., past tense: the transfer has completed) data.
Of course, the programming guide never guarantees, nor even mentions, anything about the ordering of DMA transfers relative to the execution of concurrent kernels – so the latter is not a documented feature.
I ran some tests on CUDA streams, just as crroush expects. My test used 8 streams, with two operations in each stream: 1) a memcpy from host to device, and 2) a kernel that counts over the data transferred by all the streams. The results show that in some streams the counting ran before the memory transfers had finished; in other words, the count obtained with streams was less than the count obtained without streams.
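A rough reconstruction of that test layout (kernel name, sizes, and data are assumed, not the poster’s actual code). Nothing orders stream i’s kernel after stream j’s copy for j ≠ i, so a kernel that reads all slices can run before other streams’ transfers finish, which would produce exactly the short count reported:

```cpp
#include <cuda_runtime.h>

__global__ void countOnes(const int *data, int total, unsigned *result) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < total && data[i] == 1) atomicAdd(result, 1u);
}

int main() {
    const int NSTREAMS = 8, SLICE = 1 << 16, TOTAL = NSTREAMS * SLICE;
    int *h, *d;
    unsigned *d_count;
    cudaHostAlloc(&h, TOTAL * sizeof(int), cudaHostAllocDefault);
    for (int i = 0; i < TOTAL; ++i) h[i] = 1;
    cudaMalloc(&d, TOTAL * sizeof(int));
    cudaMalloc(&d_count, sizeof(unsigned));
    cudaMemset(d_count, 0, sizeof(unsigned));

    cudaStream_t s[NSTREAMS];
    for (int i = 0; i < NSTREAMS; ++i) cudaStreamCreate(&s[i]);

    for (int i = 0; i < NSTREAMS; ++i) {
        // Each stream copies only its own slice...
        cudaMemcpyAsync(d + i * SLICE, h + i * SLICE, SLICE * sizeof(int),
                        cudaMemcpyHostToDevice, s[i]);
        // ...but the kernel reads ALL of `d`. Only this stream's copy is
        // guaranteed to precede it; the other slices may not have arrived.
        countOnes<<<(TOTAL + 255) / 256, 256, 0, s[i]>>>(d, TOTAL, d_count);
    }
    cudaDeviceSynchronize();
    return 0;
}
```

This matches the point made earlier in the thread: inter-stream ordering between copies and kernels is simply not guaranteed.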