Hiding memory read latency

Hi,

Section 5.1.1.3 of the CUDA Programming Guide (1.0) says, about device memory read latency:

Does anyone know:

  • How much is “much”?

  • How many memory accesses can be in progress whilst other independent instructions are executing? More than one, I presume!

  • Does this also apply to memory writes?
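
To make the second question concrete, here is a minimal sketch of the access pattern I have in mind: two global memory reads are issued, some arithmetic that depends on neither of them follows, and only then are the loaded values used. The kernel name `loadThenCompute`, the host helper `runExample` and the test data are all hypothetical, made up for illustration and not taken from the guide. The compiler is also free to reorder these instructions, so the source order is only a hint, and the sketch says nothing about how many reads the hardware can actually keep in flight.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel (illustration only, not from the programming guide).
__global__ void loadThenCompute(const float* a, const float* b, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float x = a[i];  // first global memory read
    float y = b[i];  // second read; its address does not depend on x

    // Arithmetic that uses neither x nor y, so it is independent of both reads.
    float t = (float)i * 0.5f + 1.0f;

    out[i] = x + y + t;  // first use of the loaded values
}

// Hypothetical host helper: runs the kernel on n elements and returns the
// number of mismatching results, or -1 if a CUDA call fails.
int runExample(int n)
{
    std::vector<float> ha(n), hb(n), hout(n);
    for (int i = 0; i < n; ++i) {
        ha[i] = (float)i;
        hb[i] = 2.0f * (float)i;
    }

    size_t bytes = (size_t)n * sizeof(float);
    float *da = nullptr, *db = nullptr, *dout = nullptr;
    if (cudaMalloc(&da, bytes) != cudaSuccess ||
        cudaMalloc(&db, bytes) != cudaSuccess ||
        cudaMalloc(&dout, bytes) != cudaSuccess) {
        cudaFree(da); cudaFree(db); cudaFree(dout);
        return -1;
    }

    cudaMemcpy(da, ha.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    loadThenCompute<<<blocks, threads>>>(da, db, dout, n);

    // This blocking copy waits for the kernel and also reports any launch error.
    cudaError_t err = cudaMemcpy(hout.data(), dout, bytes, cudaMemcpyDeviceToHost);
    cudaFree(da); cudaFree(db); cudaFree(dout);
    if (err != cudaSuccess) return -1;

    // Expected value per element: i + 2i + (0.5i + 1) = 3.5i + 1.
    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (hout[i] != 3.5f * (float)i + 1.0f) ++mismatches;
    }
    return mismatches;
}
```

Calling `runExample(1024)` from a host `main` should return 0 on a working device. What I would like to understand is whether the second read and the arithmetic can proceed while the first read is still outstanding.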

Thanks,

Alex