cudaMemcpy DeviceToDevice and L2 cache usage

foteini · December 2, 2024, 1:18pm

Hello!

There are a couple of posts claiming that when cudaMemcpy is used for host-to-device transfers, the data goes through the GPU L2 cache:

https://forums.developer.nvidia.com/t/cudamemcpy-and-l2-cache/42817

https://stackoverflow.com/questions/34005844/gpu-l2-cache-hit-is-100-and-dram-load-transactions-sometimes-is-0

I was wondering whether the same is true when cudaMemcpy is used in DeviceToDevice mode on the same GPU, i.e., using cudaMemcpy to copy from array A to array B, with both A and B residing in GPU memory.
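For concreteness, here is a minimal sketch of the same-GPU scenario, assuming a single GPU and the CUDA runtime API. The names `d_A`, `d_B`, `copy_and_verify` and the 4 MiB buffer size are illustrative choices, not from any of the linked posts. The sketch only shows the copy being asked about; it does not by itself tell you whether that copy is serviced through L2.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                             \
  do {                                                               \
    cudaError_t err_ = (call);                                       \
    if (err_ != cudaSuccess) {                                       \
      std::fprintf(stderr, "%s failed: %s\n", #call,                 \
                   cudaGetErrorString(err_));                        \
      std::exit(EXIT_FAILURE);                                       \
    }                                                                \
  } while (0)

// Copies n floats from device array A to device array B on the same GPU
// and returns true if B matches the original host data.
bool copy_and_verify(size_t n) {
  std::vector<float> h_src(n), h_dst(n, 0.0f);
  for (size_t i = 0; i < n; ++i) h_src[i] = static_cast<float>(i);

  const size_t bytes = n * sizeof(float);
  float *d_A = nullptr, *d_B = nullptr;
  CUDA_CHECK(cudaMalloc(&d_A, bytes));
  CUDA_CHECK(cudaMalloc(&d_B, bytes));

  // Host -> device: populate A.
  CUDA_CHECK(cudaMemcpy(d_A, h_src.data(), bytes, cudaMemcpyHostToDevice));

  // Device -> device on the same GPU: A -> B (the copy in question).
  CUDA_CHECK(cudaMemcpy(d_B, d_A, bytes, cudaMemcpyDeviceToDevice));

  // Device -> host: read B back. This copy is issued on the same default
  // stream, so it is ordered after the device-to-device copy.
  CUDA_CHECK(cudaMemcpy(h_dst.data(), d_B, bytes, cudaMemcpyDeviceToHost));

  CUDA_CHECK(cudaFree(d_A));
  CUDA_CHECK(cudaFree(d_B));

  for (size_t i = 0; i < n; ++i)
    if (h_dst[i] != h_src[i]) return false;
  return true;
}

int main() {
  const size_t n = size_t(1) << 20;  // 1M floats = 4 MiB (arbitrary size)
  std::printf("copy %s\n", copy_and_verify(n) ? "ok" : "mismatch");
  return 0;
}
```

Note that a DeviceToDevice cudaMemcpy performs no host-side synchronization, so the check above relies on stream ordering with the following DeviceToHost copy rather than on the device-to-device call having finished when it returns.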

Thank you for your help!
