If two threads in the same warp issue simultaneous INCOHERENT (uncoalesced) reads of device memory, do the latencies add together? Or does the second read only cost about one extra clock?
For example, if two threads made a COHERENT read, they'd both wait out a long latency of (say) 200 clocks and receive their data at the same time.
If the two threads instead issue two INCOHERENT reads, there's again a 200-clock latency before the data arrives, so the first read is ready after 200 clocks. Is the second read ready after 201 clocks, or after 400?
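One way to check this on a given card is a small microbenchmark. The sketch below is a CUDA example under stated assumptions, not an established benchmark: it times a full warp of 32 lanes rather than just two threads, uses `clock64()` for cycle counts, and the names `timeWarpLoad`, `slowestLane`, and the 1024-int stride are hypothetical choices. It compares the slowest lane's load latency when all lanes read one contiguous segment (the coherent case) against the case where every lane reads its own widely separated address (the incoherent case). Absolute numbers depend on the GPU and on caching, and the timing relies on the compiler keeping the load between the two clock reads, so it is worth inspecting the generated SASS if the results look implausible.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kWarp = 32;

// One warp: each lane loads a single int and records the SM cycles that elapse
// until that value has arrived. `stride` (in ints) selects the access pattern:
// 1 keeps all lanes in one contiguous 128-byte span (coherent); a large stride
// gives every lane its own memory segment (incoherent).
__global__ void timeWarpLoad(const volatile int* data, int stride,
                             long long* cycles, int* sink)
{
    __shared__ int staged[kWarp];
    const int lane = threadIdx.x;

    long long start = clock64();
    int v = data[lane * stride];   // volatile: the compiler may not cache or drop this load
    staged[lane] = v;              // storing v makes the lane wait for the load to return
    __syncthreads();               // keep the second clock read after the store
    long long stop = clock64();

    cycles[lane] = stop - start;
    sink[lane] = staged[lane];     // keep the loaded value observable
}

// Hypothetical helper: launches one warp over a fresh buffer and returns the
// slowest lane's latency in cycles, or -1 if any CUDA call failed.
static long long slowestLane(int stride)
{
    int* data = nullptr;
    int* sink = nullptr;
    long long* cycles = nullptr;
    // The data buffer is deliberately left uninitialised; only timing matters.
    cudaMalloc(&data, sizeof(int) * kWarp * stride);
    cudaMalloc(&sink, sizeof(int) * kWarp);
    cudaMalloc(&cycles, sizeof(long long) * kWarp);

    timeWarpLoad<<<1, kWarp>>>(data, stride, cycles, sink);

    long long host[kWarp] = {};
    cudaError_t err = cudaMemcpy(host, cycles, sizeof(host), cudaMemcpyDeviceToHost);
    cudaFree(data);
    cudaFree(sink);
    cudaFree(cycles);
    if (err != cudaSuccess) return -1;

    long long worst = 0;
    for (long long c : host) worst = c > worst ? c : worst;
    return worst;
}

int main()
{
    slowestLane(1);                           // warm-up launch (context, instruction cache)
    long long coherent = slowestLane(1);      // all 32 lanes in one segment
    long long incoherent = slowestLane(1024); // lanes 4 KB apart
    assert(coherent > 0 && incoherent > 0);   // sanity check that the runs succeeded
    std::printf("coherent: %lld cycles, incoherent: %lld cycles\n", coherent, incoherent);
    return 0;
}
```

Roughly speaking, if the incoherent time stays close to the coherent one, the separate requests are overlapping in flight; if it grows toward a multiple of the coherent time, they are being serviced one after another.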