Hello folks,
I read through the posts here, and the info here: GPUDirect RDMA :: CUDA Toolkit Documentation
It is still difficult to figure out which GPUs enable GPUDirect RDMA, and which don’t.
For example, in GPUDirect RDMA :: CUDA Toolkit Documentation it says:
“GPUDirect RDMA is available on both Tesla and Quadro GPUs.”
Here, does “Tesla” refer to the microarchitecture, or to the product brand, as in Tesla P100/V100?
In the nv-p2p.h file:
enum {
    NVIDIA_P2P_ARCHITECTURE_TESLA = 0,
    NVIDIA_P2P_ARCHITECTURE_FERMI,
    NVIDIA_P2P_ARCHITECTURE_CURRENT = NVIDIA_P2P_ARCHITECTURE_FERMI
};
This seems to indicate that Tesla is the microarchitecture. In that case, which Tesla GPUs are supported?
Please help!
Robert Crovella may be able to point to better, more meaningful, language.
Note that the designated sales channel for Tesla-brand products is system integrators, not sales directly to end users. These GPUs are intended to be sold and supported as part of a system. System vendors on NVIDIA’s partner list that sell systems with integrated Tesla GPUs should be able to tell you what features are supported by their systems. You are probably aware that you need an RDMA-capable counterpart to the GPU (such as one of various Mellanox adapters) to take advantage of GPUDirect RDMA. There may also be further platform requirements, but this is not my area of expertise and I don’t know the details.
I don’t find any mention of RDMA on the linked page, only copying via pinned host memory. It adds to the confusion that Nvidia slaps the “GPUDirect” moniker onto less and less direct pathways.
RDMA used to indicate DMA directly from a PCIe device into GPU memory, without host memory involvement.
Now, staging via (pinned) host memory in small chunks isn’t necessarily a bad thing (the buffering decouples the timing of the two PCIe devices and may actually improve throughput).
But this seems to me like it was already possible with just a bit of CUDA programming, without having to wait for any driver improvements.
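To illustrate that point, here is a minimal sketch of that kind of staged transfer in plain CUDA: data from some other PCIe device lands in pinned host buffers and is copied into GPU memory in chunks, double-buffered so the next chunk can be filled while the previous one is in flight. `produce_chunk` is a hypothetical stand-in for whatever deposits the data (a NIC, a capture card, etc.), and the chunk size is arbitrary.

```cuda
#include <cuda_runtime.h>
#include <string.h>

#define CHUNK_BYTES (1 << 20)   // 1 MiB per staging chunk (illustrative)

// Hypothetical stand-in for the other PCIe device writing into host memory.
static void produce_chunk(char *dst, size_t n, int tag) {
    memset(dst, tag, n);
}

int main(void) {
    const int nchunks = 8;
    char *stage[2];              // double-buffered pinned staging area
    cudaEvent_t done[2];
    char *dev;
    cudaStream_t stream;

    cudaMalloc(&dev, (size_t)nchunks * CHUNK_BYTES);
    cudaStreamCreate(&stream);
    for (int b = 0; b < 2; b++) {
        cudaHostAlloc(&stage[b], CHUNK_BYTES, cudaHostAllocDefault);
        cudaEventCreate(&done[b]);
    }

    for (int i = 0; i < nchunks; i++) {
        int b = i & 1;
        // Before reusing a staging buffer, wait for its previous copy to finish.
        if (i >= 2) cudaEventSynchronize(done[b]);
        produce_chunk(stage[b], CHUNK_BYTES, i);
        cudaMemcpyAsync(dev + (size_t)i * CHUNK_BYTES, stage[b], CHUNK_BYTES,
                        cudaMemcpyHostToDevice, stream);
        cudaEventRecord(done[b], stream);
    }
    cudaStreamSynchronize(stream);

    for (int b = 0; b < 2; b++) {
        cudaEventDestroy(done[b]);
        cudaFreeHost(stage[b]);
    }
    cudaStreamDestroy(stream);
    cudaFree(dev);
    return 0;
}
```

This is exactly the “just a bit of CUDA programming” case: no driver support beyond pinned memory and async copies is needed, which is what distinguishes it from true peer-to-peer RDMA.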
RDMA simply means Remote DMA (Direct Memory Access). It’s not an NVIDIA term. With respect to NVIDIA, as you mentioned, it does mean having the ability to write directly to GPU memory.
An Aside:
Industry has taken advantage of GPUDirect RDMA, as with the link mentioned before, and there is also a way to move data to/from GPU memory and an NVMe-capable SSD. I really like this capability, but realize SSDs have limited write endurance. If you’re pushing GBs of data every second, that SSD isn’t going to last you all that long. But, depending on the needs, a few days may be good enough. Expensive, but that’s relative as well.
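For the SSD path, a hedged sketch of what that looks like with NVIDIA’s GPUDirect Storage (cuFile) API: a file is read straight into GPU memory, bypassing host bounce buffers. This assumes libcufile plus a supported NVMe/filesystem stack; the filename and omitted error checks are illustrative only.

```cuda
#include <cufile.h>
#include <cuda_runtime.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>

int main(void) {
    const size_t nbytes = 1 << 20;
    void *devPtr;
    cudaMalloc(&devPtr, nbytes);

    cuFileDriverOpen();

    // O_DIRECT is required for the direct DMA path.
    int fd = open("data.bin", O_RDONLY | O_DIRECT);  // "data.bin" is a placeholder
    CUfileDescr_t descr;
    memset(&descr, 0, sizeof(descr));
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;

    CUfileHandle_t fh;
    cuFileHandleRegister(&fh, &descr);
    cuFileBufRegister(devPtr, nbytes, 0);

    // DMA from the SSD into GPU memory, no staging through host RAM.
    ssize_t got = cuFileRead(fh, devPtr, nbytes,
                             /*file_offset=*/0, /*devPtr_offset=*/0);
    printf("read %zd bytes into GPU memory\n", got);

    cuFileBufDeregister(devPtr);
    cuFileHandleDeregister(fh);
    close(fd);
    cuFileDriverClose();
    cudaFree(devPtr);
    return 0;
}
```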
NVLink is also a great feature! In the GeForce world it’s quite limited, but move over to Quadros and you just doubled your memory (Unified Memory). One GPU for computations and the other GPU for rendering. Both devices have direct access to the shared memory, with no PCIe traffic whatsoever since it goes over the dedicated NVLink.
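The direct-access part of that can be sketched with CUDA peer access. Note the hedge: `cudaDeviceEnablePeerAccess` works over PCIe too; the traffic only uses NVLink when the bridge is present. Device IDs 0 and 1 are assumptions.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, 0, 1);
    if (!canAccess) {
        printf("no peer access between GPU 0 and GPU 1\n");
        return 1;
    }

    const size_t nbytes = 1 << 20;
    float *buf1;
    cudaSetDevice(1);
    cudaMalloc(&buf1, nbytes);

    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);   // GPU 0 may now dereference buf1 directly

    // A kernel launched on GPU 0 can read/write buf1 in place; the copy API
    // can also move data GPU-to-GPU without staging through host memory:
    float *buf0;
    cudaMalloc(&buf0, nbytes);
    cudaMemcpyPeer(buf1, 1, buf0, 0, nbytes);

    cudaFree(buf0);
    cudaSetDevice(1);
    cudaFree(buf1);
    return 0;
}
```

Whether a kernel dereferencing the peer pointer goes over NVLink or PCIe is decided by the topology, not by the code, which is why the Quadro/NVLink configuration described above behaves so differently from a GeForce pair.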