It seems IPC is only available when p2p access is possible.
And also from the documentation:
IPC functionality is restricted to devices with support for unified addressing on Linux and Windows operating systems. IPC functionality on Windows is supported for compatibility purposes but not recommended as it comes with performance cost. Users can test their device for IPC functionality by calling cudaDeviceGetAttribute with cudaDevAttrIpcEventSupport.
It seems IPC is available for devices with support for unified addressing on Linux and Windows operating systems.
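For reference, the attribute check mentioned in the documentation quote could look roughly like the sketch below. It assumes a CUDA toolkit recent enough to define cudaDevAttrIpcEventSupport; the loop and the output format are only illustrative. Note that this reports IPC event support and unified addressing per device, and says nothing about P2P access between devices.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < count; ++dev) {
        int unified = 0;   // 1 if the device shares a unified address space with the host
        int ipcEvent = 0;  // 1 if the device supports IPC events

        err = cudaDeviceGetAttribute(&unified, cudaDevAttrUnifiedAddressing, dev);
        if (err == cudaSuccess) {
            err = cudaDeviceGetAttribute(&ipcEvent, cudaDevAttrIpcEventSupport, dev);
        }
        if (err != cudaSuccess) {
            std::fprintf(stderr, "device %d: %s\n", dev, cudaGetErrorString(err));
            return 1;
        }

        std::printf("device %d: unified addressing=%d, IPC event support=%d\n",
                    dev, unified, ipcEvent);
    }
    return 0;
}
```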
One more thing: I find PyTorch directly calls cudaIpcGetMemHandle without checking any flags for p2p capability.
My question, then, is: will CUDA IPC work in this case? Is it a general feature that is available for devices with support for unified addressing on Linux and Windows operating systems, or a limited feature that is only available for GPUs with P2P access? If the latter is the case, what happens when we call CUDA IPC functions on GPUs without P2P access?
And furthermore, what decides the p2p access capability? I have seen GPUs connected via SYS that can use p2p, and also GPUs connected via PHB that cannot use p2p. See the issue for details.
P2P capability is required for the CUDA IPC API. The CUDA simpleIPC sample code skips all work if it can’t find P2P-capable devices.
I expect you would get a runtime reported error in the usual fashion for the CUDA runtime API.
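To see whatever error the runtime reports, every IPC call needs its return value checked. Here is a minimal sketch of the exporting side, assuming device 0 and a 1 MiB buffer; the CHECK macro is a hypothetical helper, not part of the CUDA API. It does not assume which error code, if any, you will get on a given platform; it only prints what comes back.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: print the name and description of any failing
// runtime call, then exit main with a nonzero status.
#define CHECK(call)                                                        \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            std::fprintf(stderr, "%s failed: %s (%s)\n", #call,            \
                         cudaGetErrorName(err_), cudaGetErrorString(err_)); \
            return 1;                                                      \
        }                                                                  \
    } while (0)

int main() {
    void* buf = nullptr;
    CHECK(cudaSetDevice(0));           // assumption: export from device 0
    CHECK(cudaMalloc(&buf, 1 << 20));  // assumption: 1 MiB buffer

    // Export a handle that another process could open.
    cudaIpcMemHandle_t handle;
    CHECK(cudaIpcGetMemHandle(&handle, buf));
    std::printf("exported IPC handle for a 1 MiB buffer on device 0\n");

    // The importing process would receive the handle bytes (pipe, socket,
    // shared memory, ...) and call, with the same kind of check:
    //   cudaIpcOpenMemHandle(&ptr, handle, cudaIpcMemLazyEnablePeerAccess);
    // A failure there is reported through the returned cudaError_t.

    CHECK(cudaFree(buf));
    return 0;
}
```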
The general system topology requirement is that the devices be identical to each other and on the same fabric (either on the same PCIe bus, or having a direct NVLink connection between them). However, there are various exceptions; I won’t be able to give you a decoder ring, and the final and only authority on the matter is the report given by cudaDeviceCanAccessPeer(). If the report is false, the devices cannot access each other via P2P. If you were expecting the devices to be P2P capable and they are not, you should take that concern to your system OEM.
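A quick way to get that report for every pair of devices in a system is a loop like the sketch below. The output format is just an illustration; the only thing that matters is the value returned by cudaDeviceCanAccessPeer() for each ordered pair.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Query every ordered pair (dev -> peer); the two directions are
    // checked separately rather than assumed to match.
    for (int dev = 0; dev < count; ++dev) {
        for (int peer = 0; peer < count; ++peer) {
            if (dev == peer) continue;  // a device is not its own peer

            int canAccess = 0;
            err = cudaDeviceCanAccessPeer(&canAccess, dev, peer);
            if (err != cudaSuccess) {
                std::fprintf(stderr, "cudaDeviceCanAccessPeer(%d, %d): %s\n",
                             dev, peer, cudaGetErrorString(err));
                return 1;
            }
            std::printf("device %d -> device %d: P2P %s\n",
                        dev, peer, canAccess ? "yes" : "no");
        }
    }
    return 0;
}
```

Comparing this output against the link types shown by nvidia-smi topo -m is one way to spot the exceptions, such as pairs on the same fabric that still report no P2P access.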
Correct. If both processes are “using” the same GPU, then P2P is not an issue.
Defects are always possible. This usually indicates a platform issue, in my experience, not a “broken driver”, but I don’t wish to argue the point because defects are always possible.