Are different Streams sharing cache resources?


I’m wondering: if I launch N CUDA streams (literally N kernels launched in a streaming fashion) to overlap computation with host-device memory copies to some extent, do the different streams share hardware resources, like the L2 cache?

I know that at any given time only one kernel is being processed for computation. But while Stream 1’s kernel is computing, can Stream 2’s kernel do its memory transfer through the L2 cache, which is already held by Stream 1?
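For context, here is a minimal sketch of the overlap pattern being asked about: two streams each doing an async host-to-device copy, a kernel launch, and a device-to-host copy, so one stream’s copy can overlap the other stream’s compute. The kernel name and sizes are illustrative, not from the original post; pinned host memory is the one real requirement for the copies to be truly asynchronous.

```cuda
#include <cuda_runtime.h>

// Illustrative kernel; the real workload would go here.
__global__ void dummyKernel(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

int main() {
    const int N = 1 << 20;           // arbitrary example size
    float *h[2], *d[2];
    cudaStream_t s[2];
    for (int k = 0; k < 2; ++k) {
        cudaMallocHost(&h[k], N * sizeof(float)); // pinned memory, required for async copies
        cudaMalloc(&d[k], N * sizeof(float));
        cudaStreamCreate(&s[k]);
    }
    for (int k = 0; k < 2; ++k) {
        // Work in stream k can overlap with work in the other stream:
        // while stream 0's kernel runs, stream 1's copy can proceed.
        cudaMemcpyAsync(d[k], h[k], N * sizeof(float),
                        cudaMemcpyHostToDevice, s[k]);
        dummyKernel<<<(N + 255) / 256, 256, 0, s[k]>>>(d[k], N);
        cudaMemcpyAsync(h[k], d[k], N * sizeof(float),
                        cudaMemcpyDeviceToHost, s[k]);
    }
    cudaDeviceSynchronize();
    for (int k = 0; k < 2; ++k) {
        cudaStreamDestroy(s[k]);
        cudaFree(d[k]);
        cudaFreeHost(h[k]);
    }
    return 0;
}
```

Whether the copy engine’s traffic and the kernel’s traffic contend in L2 during that overlap is exactly the question above.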


For the case of overlapping computation and memory transfer, I’m not sure this matters. I assumed (and if someone knows better, please correct me!) that host<->device transfers don’t affect the L2 cache on the device.