Hello!
There are a couple of posts claiming that when cudaMemcpy is used for host-to-device transfers, the data goes through the GPU L2 cache:
https://forums.developer.nvidia.com/t/cudamemcpy-and-l2-cache/42817
I was wondering whether that is also the case when cudaMemcpy is used in DeviceToDevice mode (cudaMemcpyDeviceToDevice) on a single GPU, i.e., when using cudaMemcpy to copy from array A to array B, with both A and B residing in that GPU's memory.
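To make the scenario concrete, here is a minimal sketch of the kind of copy I mean. The helper name copy_on_device_matches and the array size are just placeholders for illustration, and the code only shows the call pattern; it does not by itself reveal whether the copy goes through L2.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper (name chosen for illustration only):
// fills device array A from the host, copies A to B on the same GPU
// with cudaMemcpyDeviceToDevice, then reads B back and compares.
bool copy_on_device_matches(size_t n) {
    std::vector<int> host_src(n), host_dst(n, 0);
    for (size_t i = 0; i < n; ++i) host_src[i] = static_cast<int>(i);

    const size_t bytes = n * sizeof(int);
    int *A = nullptr;
    int *B = nullptr;
    if (cudaMalloc(&A, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&B, bytes) != cudaSuccess) { cudaFree(A); return false; }

    // Host -> device: populate A.
    bool ok = cudaMemcpy(A, host_src.data(), bytes,
                         cudaMemcpyHostToDevice) == cudaSuccess;
    // Device -> device on the same GPU: this is the copy in question.
    ok = ok && cudaMemcpy(B, A, bytes,
                          cudaMemcpyDeviceToDevice) == cudaSuccess;
    // Device -> host: read B back so the result can be checked.
    ok = ok && cudaMemcpy(host_dst.data(), B, bytes,
                          cudaMemcpyDeviceToHost) == cudaSuccess;

    cudaFree(A);
    cudaFree(B);
    return ok && host_src == host_dst;
}

int main() {
    const bool ok = copy_on_device_matches(1 << 20);  // 1M ints, arbitrary size
    std::printf("device-to-device copy %s\n", ok ? "matched" : "failed");
    return ok ? 0 : 1;
}
```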
Thank you for your help!