device memory reads

If two threads in the same warp issue simultaneous incoherent (uncoalesced) reads from device memory, do their latencies add together? Or does the second read complete just one clock after the first?

For example, if two threads issued a coherent (coalesced) read, they’d both wait out one long latency of (say) 200 clocks, and they’d both receive their data at the same time.

If the two threads instead issue two incoherent reads, there’s still a long latency of about 200 clocks before memory responds, so the first read is ready after 200 clocks. Is the second read ready after 201 clocks, or after 400?

My understanding is that two uncoalesced reads in one warp finish only very slightly later than a single read would.
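
One way to check this empirically is a small timing kernel. What follows is only a rough sketch, not a rigorous benchmark: the names `timedRead`, `strideInts`, `activeLanes`, `ticks` and `sink` are made up for illustration, and the 4 KiB stride is an assumption chosen so that the two reads cannot be coalesced. It launches a single warp, lets either one or two lanes load from widely separated addresses, and has lane 0 record the elapsed clock ticks around the load.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical timing kernel: lanes [0, activeLanes) of a single warp each
// issue one global load, strideInts elements apart. Lane 0 records the
// clock ticks that elapse around the load.
__global__ void timedRead(const volatile int *data, int strideInts,
                          int activeLanes, long long *ticks,
                          volatile int *sink)
{
    int lane = threadIdx.x;
    int value = 0;

    long long start = clock64();
    if (lane < activeLanes) {
        value = data[lane * strideInts];   // the read being timed
    }
    // Storing the loaded value keeps the load from being optimized away and
    // makes this store depend on the data having arrived. This is a best
    // effort only: instruction scheduling can still blur the measurement.
    sink[lane] = value;
    long long stop = clock64();

    if (lane == 0) {
        *ticks = stop - start;
    }
}

int main()
{
    const int lanes = 32;          // one warp
    const int strideInts = 1024;   // 4 KiB apart (assumed far enough to prevent coalescing)

    int *data = nullptr;
    int *sink = nullptr;
    long long *ticks = nullptr;
    cudaMalloc(&data, lanes * strideInts * sizeof(int));
    cudaMalloc(&sink, lanes * sizeof(int));
    cudaMalloc(&ticks, sizeof(long long));
    cudaMemset(data, 0, lanes * strideInts * sizeof(int));

    // Time one active lane, then two active lanes with scattered addresses.
    for (int active = 1; active <= 2; ++active) {
        timedRead<<<1, lanes>>>(data, strideInts, active, ticks, sink);
        cudaError_t err = cudaDeviceSynchronize();
        if (err != cudaSuccess) {
            std::printf("CUDA error: %s\n", cudaGetErrorString(err));
            return 1;
        }
        long long hostTicks = 0;
        cudaMemcpy(&hostTicks, ticks, sizeof(hostTicks), cudaMemcpyDeviceToHost);
        std::printf("%d active lane(s): %lld ticks\n", active, hostTicks);
    }

    cudaFree(data);
    cudaFree(sink);
    cudaFree(ticks);
    return 0;
}
```

If the two-lane figure comes out close to the one-lane figure, the reads are overlapping; if it is roughly double, they are being serialized. The absolute numbers should be taken with a grain of salt, since caching on newer hardware, the first-launch overhead, and the cost of the clock reads themselves all affect them. Repeating the launches and using a fresh address range each time would give a fairer comparison.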