I have two setups: one with two V100 SXM blades and one with two 980 Ti blades.
I check with cudaDeviceCanAccessPeer whether peer access is supported, and if it is, I call cudaDeviceEnablePeerAccess to enable it.
I then use cudaMemcpyPeerAsync to copy data between the devices.
There are three types of device to device copies:
- PCI via host to PCI - slowest
- PCI via PCI switch to PCI (AKA RDMA)
- using NVLINK - fastest
My question is: how can I be sure which copy method is used in each setup?
If both RDMA and NVLink are supported, which one will be used?
Naturally, I want the fastest method the hardware supports.