NVIDIA Developer Forums

NVLink and Cache Levels


user98644 July 19, 2023, 9:31pm 1

On Ampere, suppose two GPUs communicate with each other from inside a long-running kernel (only one thread block is launched), using a buffer shared through CUDA IPC.

Now suppose GPU B reads from a buffer on GPU A that was shared via CUDA IPC:

Is it correct that this traffic goes through NVLink?
Will this read load data from GPU A's L1 cache, L2 cache, or global memory?
With respect to `__threadfence()`, does its ordering guarantee apply to this type of read as well, just as it does when GPU B reads from its own memory?
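
To make the scenario concrete, here is a minimal sketch of the access pattern being asked about. It makes several assumptions that are not in the post: it uses peer access within a single process instead of CUDA IPC across two processes (an IPC-opened pointer would take the place of `box` on the reader side), the `Mailbox` struct and the kernel names are hypothetical, and it uses `__threadfence_system()`, which the CUDA programming guide documents as ordering writes for threads on peer devices, whereas `__threadfence()` is documented at device scope. It does not show whether the traffic uses NVLink or PCIe, or which cache level serves the read.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t e = (call);                                        \
        if (e != cudaSuccess) {                                        \
            fprintf(stderr, "%s failed: %s\n", #call,                  \
                    cudaGetErrorString(e));                            \
            exit(1);                                                   \
        }                                                              \
    } while (0)

// Hypothetical layout: a payload plus a flag that publishes it.
struct Mailbox { int payload; int ready; };

// Runs on GPU A, which owns the buffer.
__global__ void producer(Mailbox* box) {
    box->payload = 42;
    __threadfence_system();            // order payload before flag, system-wide
    *(volatile int*)&box->ready = 1;   // publish
}

// Runs on GPU B; `remote` points into GPU A's memory.
__global__ void consumer(Mailbox* remote, int* out) {
    volatile int* ready = &remote->ready;
    while (*ready == 0) { }            // volatile load, re-read on every iteration
    __threadfence_system();            // order flag read before payload read
    *out = *(volatile int*)&remote->payload;
}

int main() {
    int n = 0, canAccess = 0;
    CHECK(cudaGetDeviceCount(&n));
    if (n < 2) { printf("this sketch needs two GPUs\n"); return 0; }
    CHECK(cudaDeviceCanAccessPeer(&canAccess, 1, 0));
    if (!canAccess) { printf("GPU 1 cannot access GPU 0 memory\n"); return 0; }

    Mailbox* box = nullptr;
    int* out = nullptr;

    // GPU A (device 0) owns the shared buffer.
    CHECK(cudaSetDevice(0));
    CHECK(cudaMalloc(&box, sizeof(Mailbox)));
    CHECK(cudaMemset(box, 0, sizeof(Mailbox)));
    CHECK(cudaDeviceSynchronize());    // flag is zero before anyone polls it

    // GPU B (device 1) maps GPU A's memory and starts polling.
    CHECK(cudaSetDevice(1));
    CHECK(cudaDeviceEnablePeerAccess(0, 0));
    CHECK(cudaMalloc(&out, sizeof(int)));
    consumer<<<1, 1>>>(box, out);
    CHECK(cudaGetLastError());

    // GPU A writes the payload and raises the flag.
    CHECK(cudaSetDevice(0));
    producer<<<1, 1>>>(box);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    CHECK(cudaSetDevice(1));
    CHECK(cudaDeviceSynchronize());
    int result = 0;
    CHECK(cudaMemcpy(&result, out, sizeof(int), cudaMemcpyDeviceToHost));
    printf("consumer read %d\n", result);

    CHECK(cudaFree(out));
    CHECK(cudaSetDevice(0));
    CHECK(cudaFree(box));
    return 0;
}
```

Running `nvidia-smi topo -m` on the same machine shows whether the two GPUs are connected by NVLink or only by PCIe, which bears on the first question.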
