Concurrency of Global Memory Operations

Nikratio · February 17, 2011, 3:21pm

Hello,

When two warps each issue a 128-byte global memory request, one directly after the other, on a compute capability 2.0 GPU, are the requests executed in parallel (so the total time is 400 - 800 clock cycles) or sequentially (so the total time is 800 - 1600 clock cycles)? In other words, can global memory requests from different warps overlap?

Thanks,
Nikolaus

tera · February 17, 2011, 3:42pm

Yes, they overlap (up to a certain limit on the number of outstanding transactions, and barring special cases such as read-after-write hazards).
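To put numbers on the two cases from the question, here is a rough back-of-the-envelope sketch in Python. It is only a toy model, not a description of how the hardware actually schedules requests: it assumes every request has the same fixed latency, that requests are served in batches, and it ignores issue delay and bandwidth limits. The function name `total_cycles` and the parameter `max_outstanding` are made up for illustration; the real limit on outstanding transactions is not stated in this thread.

```python
import math


def total_cycles(num_requests, latency_cycles, max_outstanding):
    """Estimate cycles until the last request completes.

    Toy model (an assumption, not hardware behavior): up to
    `max_outstanding` requests are in flight at once, and each batch
    takes `latency_cycles` to complete.
    """
    if num_requests < 0 or latency_cycles < 0 or max_outstanding < 1:
        raise ValueError("invalid model parameters")
    # Requests beyond the in-flight limit wait for the next batch.
    batches = math.ceil(num_requests / max_outstanding)
    return batches * latency_cycles


# Two warps, one 128-byte request each, at both ends of the
# 400 - 800 cycle latency range quoted in the question.
for latency in (400, 800):
    overlapped = total_cycles(2, latency, max_outstanding=2)
    serialized = total_cycles(2, latency, max_outstanding=1)
    print(latency, overlapped, serialized)
```

With both requests in flight at once, the model gives 400 - 800 cycles in total; with only one request allowed in flight, it gives 800 - 1600 cycles. Under this model, the serialized case only shows up once the number of requests exceeds the in-flight limit.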
