Data transfer between GPUs of a workstation

Monkey.py · April 16, 2024, 3:25pm

Hi, to do some machine learning work, I am interested in building a workstation with 2x RTX 3090. I know that the RTX 3090 is compatible with NVLink.

Can someone explain how data (a tensor) is transferred from one GPU to another using CUDA? Which hardware parts are involved in the transfer?
With the 3090, is it mandatory to use NVLink to do that?
Since the 4090 is not compatible with NVLink, is it possible to transfer a tensor from one GPU to another? Do I need a special hardware connection, or is the motherboard connection sufficient?

Thank you!

Robert_Crovella · April 16, 2024, 3:50pm

In the setting you are describing, the hardware involved is the GPUs and the NVLink “bridge”. Each GPU has a high-speed NVLink port, and the bridge basically acts like a bunch of copper wires connecting the port on one GPU to the port on the other.

As for whether NVLink is mandatory with the 3090: I think it might be. It was mandatory in the RTX 20 generation.

With the 4090, it is certainly possible. The path/dataflow will be different. There is no other hardware to consider; the only other connection the GPU has is PCIe (i.e. the motherboard connection).

Detailed questions about how to move a tensor from one GPU to another in the context of ML/DL should probably be asked in a relevant forum, such as discuss.pytorch.org.

At the CUDA level, the CUDA APIs typically involved are cudaDeviceCanAccessPeer and cudaMemcpyPeerAsync. These APIs are typically unknown to ML/DL workers, who are typically using higher level constructs, e.g. in pytorch.
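For readers coming from the ML side, here is a minimal sketch of the same two steps (ask whether peer access is available, then copy) using PyTorch. PyTorch itself is an assumption here, since the thread only names the CUDA runtime APIs, and the helper names `peer_access_matrix` and `copy_between_gpus` are hypothetical. The sketch does not show which CUDA call PyTorch issues internally, and it degrades to a no-op on machines without PyTorch or with fewer than two GPUs.

```python
try:
    import torch  # assumption: PyTorch is installed; the thread only names CUDA APIs
except ImportError:
    torch = None


def peer_access_matrix():
    """Return {(src, dst): bool} for every ordered pair of distinct visible GPUs."""
    if torch is None or not torch.cuda.is_available():
        return {}
    n = torch.cuda.device_count()
    return {
        (src, dst): torch.cuda.can_device_access_peer(src, dst)
        for src in range(n)
        for dst in range(n)
        if src != dst
    }


def copy_between_gpus(numel=1 << 20):
    """Create a tensor on GPU 0 and copy it to GPU 1; return None if <2 GPUs."""
    if torch is None or torch.cuda.device_count() < 2:
        return None
    x = torch.arange(numel, dtype=torch.float32, device="cuda:0")
    y = x.to("cuda:1")  # device-to-device copy; PyTorch chooses the path
    # Comparing on the host also forces the copy to finish before we return.
    assert torch.equal(x.cpu(), y.cpu())
    return y


if __name__ == "__main__":
    for (src, dst), ok in sorted(peer_access_matrix().items()):
        state = "available" if ok else "not available"
        print(f"GPU {src} -> GPU {dst}: peer access {state}")
    y = copy_between_gpus()
    print("copied tensor lives on:", None if y is None else y.device)
```

The copy is written the same way whether or not peer access is reported, so the peer-access query is purely informational in this sketch.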

In CUDA, even if you attempt to set up a peer connection (which would be necessary, e.g., if you wanted to use NVLink) and for some reason the hardware platform does not support it (multiple possible reasons exist), the requested data transfer (e.g. cudaMemcpyPeerAsync) will still take place; it will just use a “slower” path. (Notwithstanding this, I always recommend use of proper CUDA error checking on every CUDA API call.)
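Because the copy succeeds on either path, one rough way to see which path you are getting is to time it. The sketch below is an assumption-laden illustration in PyTorch (not something from the thread): the helper name `measure_gpu_to_gpu_bandwidth` and the default buffer size and repeat count are hypothetical choices, and wall-clock timing with explicit synchronization is only a coarse measurement. The number by itself does not prove which link was used; it is most useful when compared across configurations, for example with and without the bridge installed.

```python
import time

try:
    import torch  # assumption: PyTorch is installed
except ImportError:
    torch = None


def measure_gpu_to_gpu_bandwidth(size_mib=256, repeats=10):
    """Return the observed GPU 0 -> GPU 1 copy rate in GiB/s, or None if <2 GPUs."""
    if torch is None or torch.cuda.device_count() < 2:
        return None
    nbytes = size_mib * 1024 * 1024
    src = torch.empty(nbytes, dtype=torch.uint8, device="cuda:0")
    dst = torch.empty(nbytes, dtype=torch.uint8, device="cuda:1")

    dst.copy_(src)  # warm-up so one-time setup cost is not timed
    torch.cuda.synchronize(0)
    torch.cuda.synchronize(1)

    start = time.perf_counter()
    for _ in range(repeats):
        dst.copy_(src)
    # Copies are asynchronous with respect to the host; wait on both devices.
    torch.cuda.synchronize(0)
    torch.cuda.synchronize(1)
    elapsed = time.perf_counter() - start

    return (size_mib * repeats / 1024) / elapsed


if __name__ == "__main__":
    rate = measure_gpu_to_gpu_bandwidth()
    if rate is None:
        print("Need PyTorch and at least two CUDA devices to measure.")
    else:
        print(f"GPU 0 -> GPU 1: {rate:.1f} GiB/s")
```

In PyTorch, a failing CUDA call generally surfaces as a Python exception rather than a return code, so the explicit per-call checking recommended above applies mainly to code written directly against the CUDA runtime API.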

My response here is not intended to be an exhaustive tutorial on peer access, but there are numerous forum questions, CUDA sample codes, and relevant API sections.

rs277 · April 16, 2024, 7:13pm

There’s a slightly dated, but 3090 relevant article here that may be of interest.
