Data transfer between GPUs of a workstation

Hi, to do some machine learning work, I am interested in building a workstation with 2x RTX 3090. I know that the RTX 3090 is compatible with NVLink.

  • Can someone explain how data (a tensor) is transferred from one GPU to another using CUDA? Which hardware parts are involved in the transfer?
  • With the 3090, is it mandatory to use NVLink to do that?
  • As the 4090 is not compatible with NVLink, is it possible to transfer a tensor from one GPU to another? Do I need a special hardware connection, or is the motherboard connection sufficient?

Thank you!

In the setting you are describing, it is the GPUs and the NVLink “bridge”. The GPUs have a high-speed NVLink port on them, and the bridge basically acts like a bunch of copper wires connecting the port on one GPU to the port on another GPU.

I think it might be. It was mandatory in the RTX 20 generation.

It is certainly possible, but the path/dataflow will be different. There is no other hardware to consider: the only other connection the GPU has is PCIe (i.e. the motherboard connection).

Detailed questions about how to move a tensor from one GPU to another in the context of ML/DL should probably be asked in a relevant forum, such as discuss.pytorch.org.

At the CUDA level, the APIs typically involved are cudaDeviceCanAccessPeer and cudaMemcpyPeerAsync. These APIs are often unfamiliar to ML/DL practitioners, who usually work with higher-level constructs, e.g. in PyTorch.

In CUDA, even if you attempt to set up a peer connection (which would be necessary, e.g., if you wanted to use NVLink) and for some reason the hardware platform does not support it (multiple possible reasons exist), the requested data transfer (e.g. cudaMemcpyPeerAsync) will still take place; it will just use a “slower” path. (Notwithstanding this, I always recommend proper CUDA error checking on every CUDA API call.)

My response here is not intended to be an exhaustive tutorial on peer access, but there are numerous forum questions, CUDA sample codes, and relevant API sections.

There’s a slightly dated, but 3090-relevant, article here that may be of interest.