Section 22.214.171.124 of the CUDA Programming Manual (1.0) says in relation to device memory read latency:
Does anyone know:
How much is “much”?
How many memory accesses can be in progress while other independent instructions are executing? More than one, I presume!
Does this also apply to memory writes?