Concurrency of Global Memory Operations


Suppose two warps each issue a 128-byte global memory request, one directly after the other, on a compute capability 2.0 GPU. Will the requests be serviced in parallel (so the total time is about 400 - 800 clock cycles), or sequentially (so the total time is about 800 - 1600 clock cycles)? I.e., can global memory requests from different warps overlap?


Yes, they overlap (up to a certain limit on outstanding transactions, and barring special cases like read-after-write hazards).
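
To make the arithmetic concrete, here is a minimal host-side sketch of a toy timing model. It is not a description of how the hardware actually schedules memory traffic. The function name `total_cycles` and the parameters `issue_gap` (cycles between issuing consecutive requests) and `max_in_flight` (the limit on outstanding requests) are hypothetical and chosen only for illustration; the 400 and 800 cycle latencies are the figures from the question.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model (an assumption, not the real hardware behavior):
//  - every request completes `latency` cycles after it is issued;
//  - consecutive requests are issued at least `issue_gap` cycles apart;
//  - at most `max_in_flight` requests may be outstanding at once.
// Returns the cycle at which the last request completes.
std::uint64_t total_cycles(unsigned requests,
                           std::uint64_t latency,
                           std::uint64_t issue_gap,
                           unsigned max_in_flight) {
    assert(max_in_flight >= 1);
    if (requests == 0) return 0;

    std::vector<std::uint64_t> done(requests);
    std::uint64_t next_issue = 0;
    for (unsigned i = 0; i < requests; ++i) {
        std::uint64_t start = next_issue;
        // If the in-flight limit is reached, wait for the oldest
        // outstanding request to finish before issuing this one.
        if (i >= max_in_flight) {
            start = std::max(start, done[i - max_in_flight]);
        }
        done[i] = start + latency;
        next_issue = start + issue_gap;
    }
    // With a fixed latency, completion times are non-decreasing.
    return done.back();
}
```

Under this model, `total_cycles(2, 400, 1, 2)` gives 401 cycles (the two requests overlap almost completely), while forcing them to run one at a time with `total_cycles(2, 400, 1, 1)` gives 800 cycles. The same calls with a latency of 800 give 801 and 1600 cycles, which matches the two ranges in the question. Raising `requests` above `max_in_flight` shows how the limit on outstanding transactions eventually serializes part of the work.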