Hello,
I am building a GPU cluster using A100 and A40 GPUs.
I am setting up the stack with the nv_peer_mem, UCX, and GDRCopy modules.
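For context, here is a minimal sketch of how I am sanity-checking the stack on each node. It assumes a Linux host with the NVIDIA driver and UCX installed; the module name varies by driver generation (nv_peer_mem from the standalone nvidia-peer-memory package, nvidia_peermem when bundled with recent drivers), so I match both:

```shell
# Check whether a GPUDirect RDMA peer-memory module is loaded.
# Older stacks use nv_peer_mem; newer drivers ship nvidia_peermem.
lsmod | grep -E 'nv_peer_mem|nvidia_peermem' \
  || echo "no peer-memory module loaded"

# Check whether UCX detects CUDA / GDRCopy transports
# (requires ucx_info from the UCX installation on PATH).
ucx_info -d | grep -iE 'cuda|gdr' \
  || echo "no CUDA/GDRCopy transports reported by UCX"
```

Both checks pass on the A100 node; I would like to know whether the same is expected to work on the A40 node.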
https://forums.developer.nvidia.com/t/gpudirect-rdma-with-nvidia-a100-for-pcie/215032
From this thread, I learned that the A100 series supports GPUDirect RDMA.
So, I have the following questions:
- Does the A40 support GPUDirect RDMA?
- If so, will GPUDirect RDMA work between an A100 and an A40 in a two-server environment (Server1 with the A100, Server2 with the A40)?
- The NVIDIA GPUDirect RDMA system requirements page* lists only NVIDIA Tesla-series GPUs as supported. Is it simply that this page has not been updated yet? (*https://docs.nvidia.com/networking/display/GPUDirectRDMAv18/System+Requirements+and+Recommendations)
