Does cudaIpcOpenMemHandle work when p2p is not available?

According to the documentation of cudaIpcOpenMemHandle:

Maps memory exported from another process with cudaIpcGetMemHandle into the current device address space. For contexts on different devices cudaIpcOpenMemHandle can attempt to enable peer access between the devices as if the user called cudaDeviceEnablePeerAccess. This behavior is controlled by the cudaIpcMemLazyEnablePeerAccess flag. cudaDeviceCanAccessPeer can determine if a mapping is possible.

It seems IPC is only available when p2p access is possible.

And also from the documentation:

IPC functionality is restricted to devices with support for unified addressing on Linux and Windows operating systems. IPC functionality on Windows is supported for compatibility purposes but not recommended as it comes with performance cost. Users can test their device for IPC functionality by calling cudaDeviceGetAttribute with cudaDevAttrIpcEventSupport

From this, however, it seems IPC is available for any device that supports unified addressing on Linux and Windows operating systems.
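For what it's worth, the attribute check the documentation mentions is easy to run on each device. A minimal sketch of my own (error handling omitted), querying both the IPC-event and unified-addressing attributes:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);

    for (int dev = 0; dev < deviceCount; ++dev) {
        int ipcEventSupport = 0;
        int unifiedAddressing = 0;
        // Attributes mentioned in the documentation quoted above.
        cudaDeviceGetAttribute(&ipcEventSupport, cudaDevAttrIpcEventSupport, dev);
        cudaDeviceGetAttribute(&unifiedAddressing, cudaDevAttrUnifiedAddressing, dev);
        printf("device %d: IPC event support = %d, unified addressing = %d\n",
               dev, ipcEventSupport, unifiedAddressing);
    }
    return 0;
}
```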

One more thing: I notice that PyTorch calls cudaIpcGetMemHandle directly, without first checking for p2p capability.

My question is: will CUDA IPC work in that case? Is it a general feature available for any device with unified-addressing support on Linux and Windows, or a limited feature available only for GPUs with p2p access? If the latter is the case, what happens when we call CUDA IPC functions on GPUs without p2p access?

And furthermore, what decides p2p access capability? I have seen GPUs connected via SYS (in nvidia-smi topo terms) that can use p2p, and also GPUs connected via PHB that cannot. See the issue for details.

P2P capability is required for the CUDA IPC API. The CUDA simpleIPC sample code will skip its work if it can't find P2P-capable devices.

I expect you would get an error reported at runtime, in the usual fashion for the CUDA runtime API.
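To illustrate what that typically looks like, here is a minimal sketch of checking the status returned by cudaIpcOpenMemHandle; how the handle gets transported between processes is assumed to happen elsewhere:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Sketch: 'handle' is assumed to have been produced by cudaIpcGetMemHandle
// in another process and received here (e.g. over a pipe or socket).
void* open_ipc_handle(cudaIpcMemHandle_t handle) {
    void* ptr = nullptr;
    cudaError_t err =
        cudaIpcOpenMemHandle(&ptr, handle, cudaIpcMemLazyEnablePeerAccess);
    if (err != cudaSuccess) {
        // The usual CUDA runtime error reporting: an error code plus a string.
        fprintf(stderr, "cudaIpcOpenMemHandle failed: %s\n",
                cudaGetErrorString(err));
        return nullptr;
    }
    return ptr;
}
```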

The general system topology requirement is that the devices be identical to each other and on the same fabric (either on the same PCIE bus, or having a direct NVLink connection between them). However, there are various exceptions; I won’t be able to give you a decoder ring, and the final and only authority on the matter is the report given by cudaDeviceCanAccessPeer(). If the report is false, the devices cannot access each other via P2P. If you were expecting the devices to be P2P capable and they are not, you should take that concern to your system OEM.
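As a quick illustration, a sketch of querying that authoritative report for every device pair could look like this:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);

    // Print the P2P access matrix as reported by the runtime.
    for (int src = 0; src < deviceCount; ++src) {
        for (int dst = 0; dst < deviceCount; ++dst) {
            if (src == dst) continue;
            int canAccess = 0;
            cudaDeviceCanAccessPeer(&canAccess, src, dst);
            printf("device %d -> device %d: P2P %s\n",
                   src, dst, canAccess ? "supported" : "not supported");
        }
    }
    return 0;
}
```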

Thanks for your support. One thing to note is that when we call cudaIpcOpenMemHandle, two devices are implicitly involved:

  1. the device on which the pointer passed to cudaIpcGetMemHandle was allocated
  2. the device that is current when we call cudaIpcOpenMemHandle

If they happen to be the same physical device (just used from different processes), then the IPC call will succeed, even though no p2p is involved.

We need to be very careful about which device is current when we rely on p2p capability.
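A rough sketch of the two call sites I mean, with the handle transport between the two processes omitted and the device indices purely illustrative:

```cpp
#include <cuda_runtime.h>

// Exporting process: the device that owns 'ptr' is implied by the allocation.
cudaIpcMemHandle_t export_buffer(int device, void** ptr, size_t bytes) {
    cudaSetDevice(device);
    cudaMalloc(ptr, bytes);              // allocation lives on 'device'
    cudaIpcMemHandle_t handle;
    cudaIpcGetMemHandle(&handle, *ptr);  // device 1 in the list above
    return handle;                       // send to the other process somehow
}

// Importing process: the *current* device at this call site is what matters.
void* import_buffer(int device, cudaIpcMemHandle_t handle) {
    cudaSetDevice(device);               // device 2 in the list above
    void* ptr = nullptr;
    // If 'device' is the same physical GPU as the exporter's, no P2P is needed;
    // if it is a different GPU, this may try to enable peer access.
    cudaIpcOpenMemHandle(&ptr, handle, cudaIpcMemLazyEnablePeerAccess);
    return ptr;
}
```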

Here is one report of a broken driver:

the driver says p2p is enabled, but CUDA actually cannot do p2p.

Correct. If both processes are “using” the same GPU, then P2P is not an issue.

Defects are always possible. In my experience this usually indicates a platform issue rather than a “broken driver”, but I don’t wish to argue the point, since defects are always possible.

A common platform issue/defect affecting P2P is ACS being improperly configured in the PCIe switches. In that case the driver may report the devices as P2P-capable, yet P2P is still broken. The resolution in these cases is to get the platform fixed, not the driver.
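One way to sanity-check that situation (a rough sketch of my own, not a replacement for the official p2pBandwidthLatencyTest sample; the function name and sizes are purely illustrative) is to push a known pattern between the two devices and verify it round-trips:

```cpp
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Rough sanity check: copy a known pattern from devA to devB over P2P and
// read it back. With misconfigured ACS, the driver may still say "yes" but
// the data can come back wrong (or the copy can hang).
bool p2p_copy_works(int devA, int devB, size_t n = 1 << 20) {
    int can = 0;
    cudaDeviceCanAccessPeer(&can, devB, devA);   // can devB read devA?
    if (!can) return false;

    std::vector<unsigned char> pattern(n, 0xAB), readback(n, 0);

    cudaSetDevice(devA);
    void* src = nullptr;
    cudaMalloc(&src, n);
    cudaMemcpy(src, pattern.data(), n, cudaMemcpyHostToDevice);

    cudaSetDevice(devB);
    cudaDeviceEnablePeerAccess(devA, 0);         // enable devB -> devA access
    void* dst = nullptr;
    cudaMalloc(&dst, n);
    cudaMemcpyPeer(dst, devB, src, devA, n);     // the actual peer transfer
    cudaMemcpy(readback.data(), dst, n, cudaMemcpyDeviceToHost);

    cudaFree(dst);
    cudaSetDevice(devA);
    cudaFree(src);

    return readback == pattern;
}

int main() {
    printf("P2P copy 0 -> 1: %s\n", p2p_copy_works(0, 1) ? "ok" : "FAILED");
    return 0;
}
```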

FYI: the 545 series drivers are indeed broken; they report p2p support while it is actually not available.
