I’m encountering a problem: I want to use P2P communication between an L40S and an A100 (both of which are P2P-capable individually). However, when I put them in the same server, the driver reports that these two GPUs cannot communicate with each other via P2P.
Hi @tiancheng21.hu, welcome to the NVIDIA developer forums.
I am not sure, but there might be limitations regarding mixing GPU architectures, especially older ones like the Ampere-based A100 with newer ones like the Ada-based L40S. But I am not certain, so I moved your post to the CUDA programming section where people might know for sure.
Thanks!
It’s a general principle with CUDA GPUs that GPUDirect P2P is only expected to work between devices of the same architecture and the same device type.
Furthermore, another issue you could be running into is server topology. In a dual-socket server, P2P may have requirements on the link between the two CPU sockets when one GPU is attached to each socket.
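For anyone wanting to rule out the topology question on their own machine, `nvidia-smi` can print the interconnect matrix and the driver's per-pair P2P capability. These commands need a system with NVIDIA GPUs and driver installed, so the output below is only a description, not captured from this setup:

```shell
# Show how each GPU pair is connected (NVLink, PIX, PHB, SYS, ...).
# Pairs marked SYS cross the inter-socket link, which is the case
# the topology caveat above is about.
nvidia-smi topo -m

# Show the driver's P2P read-capability verdict for each GPU pair.
nvidia-smi topo -p2p r
```

If `topo -p2p r` already reports the pair as unsupported, no amount of rewiring will help; the driver is refusing the combination itself.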
Thanks for your replies. I had already ruled out the server topology issue: with the exact same topology, two A100s can perform P2P, but the A100 and the L40S fail. I also verified that two different Ada-architecture GPUs can perform P2P correctly, so I am wondering whether two GPUs of different architectures can perform P2P at all.
Besides, GPUDirect RDMA allows a GPU and a NIC to communicate while bypassing the CPU, so I am confused why two GPUs of different architectures cannot do P2P; that seems like the simpler case. Is there any solution to this problem?
Thanks
It’s generally expected that two different architectures will not support P2P. Your results seem to confirm that.
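For later readers who want to confirm this programmatically rather than from `nvidia-smi`, here is a minimal sketch using the CUDA runtime API. It queries `cudaDeviceCanAccessPeer` for every ordered device pair, which is the same driver verdict that controls `cudaDeviceEnablePeerAccess`. It must be compiled with `nvcc` and run on the machine in question; I have not run it against a mixed A100/L40S box myself:

```cpp
// Sketch: print the driver's P2P verdict for every GPU pair.
// Build: nvcc -o p2p_check p2p_check.cu
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            if (i == j) continue;
            int canAccess = 0;
            // Asks the driver whether device i may map device j's memory.
            cudaDeviceCanAccessPeer(&canAccess, i, j);
            printf("GPU %d -> GPU %d : P2P %s\n", i, j,
                   canAccess ? "supported" : "not supported");
        }
    }
    return 0;
}
```

On a mixed-architecture pair you would expect this to report "not supported" in both directions, consistent with the results described above.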
