I am running libfabric 1.12.1 and CUDA 11.2, with the latest nv_peer_mem driver and MOFED 5.2-2.2.0.0. Under the hood, libfabric is calling into libibverbs.
When trying to send using a buffer from GPU memory, I get the following error:
mlx5: host_unknown: got completion with error:
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 04005104 0a0002fe 00028ad2
Libfabric reports this as a protection error. However, the memory is properly registered with nv_peer_mem, and I have checked that I am using the correct memory address and local key. Is there any way to interpret this dump and get to the bottom of this error?
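
For reference, here is a minimal sketch of how the dump might be decoded. It assumes the 16 words are the raw 64-byte CQE printed in byte order, and that the error CQE follows the `struct mlx5_err_cqe` layout from the rdma-core mlx5 provider (vendor syndrome at byte 54, syndrome at byte 55, WQE opcode and QPN at bytes 56-59, WQE counter at bytes 60-61, `op_own` at byte 63). The function name `decode_err_cqe` and the syndrome table are illustrative, based on my reading of the `MLX5_CQE_SYNDROME_*` values, and should be checked against the installed headers.

```python
import struct

# Syndrome values as I understand the MLX5_CQE_SYNDROME_* enum in rdma-core
# (assumption: verify against the mlx5 provider headers on your system).
SYNDROMES = {
    0x01: "LOCAL_LENGTH_ERR",
    0x02: "LOCAL_QP_OP_ERR",
    0x04: "LOCAL_PROT_ERR",
    0x05: "WR_FLUSH_ERR",
    0x06: "MW_BIND_ERR",
    0x10: "BAD_RESP_ERR",
    0x11: "LOCAL_ACCESS_ERR",
    0x12: "REMOTE_INVAL_REQ_ERR",
    0x13: "REMOTE_ACCESS_ERR",
    0x14: "REMOTE_OP_ERR",
    0x15: "TRANSPORT_RETRY_EXC_ERR",
    0x16: "RNR_RETRY_EXC_ERR",
    0x22: "REMOTE_ABORTED_ERR",
}

DUMP = """
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 04005104 0a0002fe 00028ad2
"""


def decode_err_cqe(dump_text):
    """Decode the fields of an mlx5 error CQE from its 16-word hex dump.

    Hypothetical helper: byte offsets assume the struct mlx5_err_cqe layout.
    """
    raw = bytes.fromhex("".join(dump_text.split()))
    if len(raw) != 64:
        raise ValueError("expected a 64-byte CQE, got %d bytes" % len(raw))

    syndrome = raw[55]
    opcode_qpn = struct.unpack_from(">I", raw, 56)[0]  # big-endian 32-bit word
    return {
        "vendor_syndrome": raw[54],
        "syndrome": syndrome,
        "syndrome_name": SYNDROMES.get(syndrome, "UNKNOWN"),
        "wqe_opcode": opcode_qpn >> 24,       # opcode of the failed work request
        "qpn": opcode_qpn & 0xFFFFFF,         # queue pair number
        "wqe_counter": struct.unpack_from(">H", raw, 60)[0],
        "cqe_opcode": raw[63] >> 4,           # upper nibble of op_own
    }


info = decode_err_cqe(DUMP)
for key, value in info.items():
    print(key, hex(value) if isinstance(value, int) else value)
```

Under these assumptions, the word `04005104` carries the vendor syndrome `0x51` and the syndrome `0x04`, which maps to a local protection error and matches what libfabric reports. The word `0a0002fe` gives the WQE opcode `0x0a` and QP number `0x2fe`. The vendor syndrome is device-specific and, as far as I know, not publicly documented, so interpreting it may require help from the vendor.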