Performance of global memory reads for different blocks

Hi, I’m new to CUDA. If I have two different kernels, each in its own stream, reading some of the same data, will memory throughput drop significantly or not? Is the drop going to be as bad as the difference between coalesced and uncoalesced memory reads?

Streams are mostly used in conjunction with host-device communication, i.e. pushing data over PCIe between the CPU’s memory and the GPU’s memory. Coalesced and uncoalesced memory accesses refer to device-side accesses, i.e. how the threads of a kernel read and write the GPU’s own RAM. So streams have nothing to do with coalescing; they let you hide host-device transfers behind kernel execution.
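To make the point about hiding transfers concrete, here is a minimal sketch, not a tuned implementation. It assumes a device that can overlap copies with kernel execution. The names `scaleChunk` and `runChunkedPipeline`, the chunk count, and the chunk size are all made up for illustration. The work is split into chunks, one stream per chunk, so the copy for one chunk can overlap with the kernel working on another.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: doubles every element of one chunk.
__global__ void scaleChunk(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Issues copy-in, kernel, copy-out for each chunk in its own stream.
// Returns true if every element came back doubled.
bool runChunkedPipeline()
{
    const int kChunks = 4;            // illustrative values, not tuned
    const int kChunkElems = 1 << 20;
    const int kTotal = kChunks * kChunkElems;
    const size_t chunkBytes = kChunkElems * sizeof(float);

    float *hostBuf = nullptr;
    float *devBuf = nullptr;
    // Pinned (page-locked) host memory is needed for cudaMemcpyAsync
    // to actually run asynchronously with respect to the host.
    if (cudaMallocHost(&hostBuf, kChunks * chunkBytes) != cudaSuccess) return false;
    if (cudaMalloc(&devBuf, kChunks * chunkBytes) != cudaSuccess) {
        cudaFreeHost(hostBuf);
        return false;
    }
    for (int i = 0; i < kTotal; ++i) hostBuf[i] = 1.0f;

    cudaStream_t streams[kChunks];
    for (int c = 0; c < kChunks; ++c) cudaStreamCreate(&streams[c]);

    const int threads = 256;
    const int blocks = (kChunkElems + threads - 1) / threads;
    for (int c = 0; c < kChunks; ++c) {
        float *h = hostBuf + c * kChunkElems;
        float *d = devBuf + c * kChunkElems;
        // Operations inside one stream run in order; different streams may overlap.
        cudaMemcpyAsync(d, h, chunkBytes, cudaMemcpyHostToDevice, streams[c]);
        scaleChunk<<<blocks, threads, 0, streams[c]>>>(d, kChunkElems);
        cudaMemcpyAsync(h, d, chunkBytes, cudaMemcpyDeviceToHost, streams[c]);
    }

    // Wait for all streams, then verify the results on the host.
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);
    for (int i = 0; ok && i < kTotal; ++i) ok = (hostBuf[i] == 2.0f);

    for (int c = 0; c < kChunks; ++c) cudaStreamDestroy(streams[c]);
    cudaFree(devBuf);
    cudaFreeHost(hostBuf);
    return ok;
}
```

Calling `runChunkedPipeline()` from `main` is enough to try it. Note that nothing here changes how the kernel accesses global memory: whether `scaleChunk` reads are coalesced depends only on the indexing inside the kernel, not on which stream it was launched in.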

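On older GPUs (compute capability 1.x) you can’t have two kernels physically running at the same time: they get serialized automatically, even if you use streams and whatnot, so there’s no issue of two kernels trying to access the same spot in the GPU’s memory at the same moment. Devices of compute capability 2.0 and later can run kernels from different streams concurrently when resources allow, but two kernels reading the same data is still not a hazard and does not carry a penalty comparable to uncoalesced access; whether reads are coalesced depends only on the access pattern within each warp.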
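For the case asked about, here is a small sketch of two kernels launched in separate streams that both read the same input buffer. The kernel names `addOne` and `timesThree` and the wrapper `runSharedInputKernels` are hypothetical. Whether the two kernels actually overlap or run one after the other depends on the device and on available resources; either way, both only read the shared input, so the results are well defined.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Both kernels read the same input with a coalesced pattern:
// consecutive threads touch consecutive elements.
__global__ void addOne(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] + 1.0f;
}

__global__ void timesThree(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * 3.0f;
}

// Launches both kernels on one shared read-only input, each in its own stream.
// Returns true if both outputs match the values computed on the host.
bool runSharedInputKernels()
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    std::vector<float> hostIn(n), hostA(n), hostB(n);
    for (int i = 0; i < n; ++i) hostIn[i] = static_cast<float>(i % 100);

    float *devIn = nullptr, *devA = nullptr, *devB = nullptr;
    if (cudaMalloc(&devIn, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&devA, bytes) != cudaSuccess) { cudaFree(devIn); return false; }
    if (cudaMalloc(&devB, bytes) != cudaSuccess) { cudaFree(devIn); cudaFree(devA); return false; }

    // Blocking copy: the input is fully on the device before any launch.
    cudaMemcpy(devIn, hostIn.data(), bytes, cudaMemcpyHostToDevice);

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    addOne<<<blocks, threads, 0, s1>>>(devIn, devA, n);      // reads devIn
    timesThree<<<blocks, threads, 0, s2>>>(devIn, devB, n);  // also reads devIn

    bool ok = (cudaDeviceSynchronize() == cudaSuccess);
    cudaMemcpy(hostA.data(), devA, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(hostB.data(), devB, bytes, cudaMemcpyDeviceToHost);

    for (int i = 0; ok && i < n; ++i)
        ok = (hostA[i] == hostIn[i] + 1.0f) && (hostB[i] == hostIn[i] * 3.0f);

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(devIn);
    cudaFree(devA);
    cudaFree(devB);
    return ok;
}
```

Timing each kernel alone and then both together (for example with `cudaEvent_t` timers) would be the way to measure any slowdown on a particular card; the sketch only checks correctness and makes no claim about speed.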