I read through the posts here, and the info here: https://docs.nvidia.com/cuda/gpudirect-rdma/index.html
It is still difficult to figure out which GPUs support GPUDirect RDMA and which don't.
For example, in https://docs.nvidia.com/cuda/gpudirect-rdma/index.html it says:
“GPUDirect RDMA is available on both Tesla and Quadro GPUs.”
Here, does "Tesla" refer to the microarchitecture, or to the product brand (e.g., Tesla P100/V100)?
In nv-p2p.h file:
NVIDIA_P2P_ARCHITECTURE_TESLA = 0,
NVIDIA_P2P_ARCHITECTURE_CURRENT = NVIDIA_P2P_ARCHITECTURE_FERMI
This seems to indicate that Tesla is the microarchitecture. In that case, which Tesla GPUs are supported?
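For what it's worth, on reasonably recent CUDA toolkits you can ask the runtime directly instead of parsing the docs: there is a device attribute `cudaDevAttrGPUDirectRDMASupported` (added somewhere around CUDA 11.3, from memory — it won't exist on older toolkits). A minimal query sketch:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    if (cudaGetDeviceCount(&n) != cudaSuccess || n == 0) {
        printf("No CUDA devices found.\n");
        return 0;
    }
    for (int dev = 0; dev < n; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        int rdma = 0;
        // Attribute requires a recent toolkit; on older ones this call
        // will simply return an error and leave rdma at 0.
        cudaDeviceGetAttribute(&rdma, cudaDevAttrGPUDirectRDMASupported, dev);
        printf("Device %d (%s): GPUDirect RDMA %s\n", dev, prop.name,
               rdma ? "supported" : "not supported");
    }
    return 0;
}
```

This only reports what the driver claims for the GPU itself; the NIC and platform requirements discussed below still apply.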
Robert Crovella may be able to point to better, more meaningful language.
Note that the designated sales channel for Tesla-brand products is system integrators, not sales directly to end users. These GPUs are intended to be sold and supported as part of a system. System vendors on NVIDIA's partner list that sell systems with integrated Tesla GPUs should be able to tell you what features are supported by their systems. You are probably aware that you need an RDMA-capable counterpart to the GPU (such as one of various Mellanox adapters) to take advantage of GPUDirect RDMA. There may also be further platform requirements, but this is not my area of expertise and I don't know the details.
RDMA simply means Remote DMA (Direct Memory Access). It's not an NVIDIA term. In NVIDIA's usage, as you mentioned, it does mean having the ability to write directly to GPU memory.
Industry has taken advantage of GPUDirect RDMA, as with the link mentioned before, and there is also a way to move data to/from GPU memory to an NVMe-capable SSD. I really like this capability, but realize SSDs have limited write endurance. If you're pushing GBs of data every second, that SSD isn't going to last you all that long. But depending on the needs, a few days may be good enough. Expensive, but that's relative as well.
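The NVMe path alluded to here is NVIDIA's GPUDirect Storage, exposed through the cuFile API (`libcufile`, shipped with recent CUDA toolkits). A minimal sketch of reading a file straight into GPU memory — the file path is hypothetical, and you need a GDS-capable filesystem and driver stack for the transfer to actually take the direct DMA path rather than a compatibility bounce through host memory:

```cuda
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>
#include <cuda_runtime.h>
#include <cufile.h>   // GPUDirect Storage (libcufile)

int main() {
    // Hypothetical data file on an NVMe-backed, GDS-capable filesystem.
    int fd = open("/mnt/nvme/data.bin", O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    cuFileDriverOpen();

    CUfileDescr_t descr = {};
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    CUfileHandle_t handle;
    cuFileHandleRegister(&handle, &descr);

    const size_t size = 1 << 20;          // 1 MiB
    void *devPtr = nullptr;
    cudaMalloc(&devPtr, size);
    cuFileBufRegister(devPtr, size, 0);   // optional: pre-register the GPU buffer

    // DMA from NVMe into GPU memory, without staging in host RAM.
    ssize_t got = cuFileRead(handle, devPtr, size,
                             /*file_offset=*/0, /*devPtr_offset=*/0);
    printf("read %zd bytes into GPU memory\n", got);

    cuFileBufDeregister(devPtr);
    cudaFree(devPtr);
    cuFileHandleDeregister(handle);
    cuFileDriverClose();
    close(fd);
    return 0;
}
```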
NVLink is also a great feature! In the GeForce world it's quite limited, but move over to Quadros and you have effectively doubled your memory (via Unified Memory): one GPU for computations and the other GPU for rendering. Both devices have direct access to each other's memory, with no PCIe traffic involved, since peer transfers go over the dedicated NVLink connection.
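Under the hood this is CUDA peer-to-peer access; whether a given peer pair actually rides NVLink or falls back to PCIe depends on the system topology, so a program should check and enable it explicitly. A sketch, assuming two GPUs at device indices 0 and 1:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);
    if (n < 2) { printf("Need two GPUs for peer access.\n"); return 0; }

    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, 0, 1);
    printf("GPU 0 -> GPU 1 peer access: %s\n", canAccess ? "yes" : "no");
    if (!canAccess) return 0;

    // Enable both directions; afterwards, work issued on GPU 0 can touch
    // memory allocated on GPU 1 (over NVLink if present, otherwise PCIe).
    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);
    cudaSetDevice(1);
    cudaDeviceEnablePeerAccess(0, 0);

    float *buf1 = nullptr;
    cudaMalloc(&buf1, 4096);        // allocated on GPU 1 (current device)
    cudaSetDevice(0);
    cudaMemset(buf1, 0, 4096);      // issued from GPU 0, lands in GPU 1's memory
    cudaFree(buf1);
    return 0;
}
```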