According to https://docs.nvidia.com/570TRD1-trusted-computing-solutions-release-notes.pdf, there are specific limitations in Hopper PPCIe mode:
- In the PPCIe mode, when the source or destination operands are imported GPU memory allocations on a device that is not visible to the process, host-to-device or device-to-host copies might fail asynchronously with cudaErrorLaunchFailure.
- In the PPCIe mode, using cooperative_groups::multi_grid_group::sync in kernels launched with cudaLaunchCooperativeKernelMultiDevice results in the kernel failing with cudaErrorIllegalAddress.
- CUDA Interprocess Communication (IPC) is not supported in PPCIe mode.
I would like to seek clarification on the reasoning behind these limitations:
- Why are the first and second cases above forbidden in PPCIe mode? Shouldn’t the GPU and the switch belonging to the same VM trust each other?
- Why isn’t CUDA IPC supported in PPCIe mode? Even sharing video memory on the same GPU isn’t allowed. Is it also unsupported in SPT CC?