I’m having an issue getting GPU-to-GPU RDMA between two hosts working correctly.
I’m working directly at the ibverbs/rdma_cm (CM verbs) level.
My RDMA transaction starts with the client sending an IBV_WR_SEND request containing its buffer information, which the server uses to post an IBV_WR_RDMA_WRITE_WITH_IMM of a much larger buffer (4k x 4k x 2) back to the client.
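For reference, the server-side post looks roughly like this once it has the client’s buffer info. This is a simplified sketch, not my exact code: the struct layout, function names, and immediate value are illustrative stand-ins.

```c
/* Sketch of the server-side write-back (illustrative names). */
#include <infiniband/verbs.h>
#include <arpa/inet.h>
#include <stdint.h>

struct buf_info {              /* what the client advertises in its IBV_WR_SEND */
    uint64_t addr;             /* client destination buffer virtual address   */
    uint32_t rkey;             /* rkey of the client's registered MR          */
    uint32_t len;              /* 4096 * 4096 * 2                             */
};

/* Server side: write the large frame into the client's advertised buffer. */
static int post_write_with_imm(struct ibv_qp *qp, struct ibv_mr *src_mr,
                               void *src, const struct buf_info *dst)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)src,
        .length = dst->len,
        .lkey   = src_mr->lkey,   /* MR covering the server's source buffer */
    };
    struct ibv_send_wr wr = {
        .wr_id              = 1,
        .sg_list            = &sge,
        .num_sge            = 1,
        .opcode             = IBV_WR_RDMA_WRITE_WITH_IMM,
        .send_flags         = IBV_SEND_SIGNALED,
        .imm_data           = htonl(0x1234),      /* placeholder immediate */
        .wr.rdma.remote_addr = dst->addr,
        .wr.rdma.rkey        = dst->rkey,
    };
    struct ibv_send_wr *bad_wr = NULL;
    return ibv_post_send(qp, &wr, &bad_wr);
}
```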
The GPUs are just Quadro K600s and the HCAs are VPI ConnectX-5s running in Ethernet mode (RoCE). As near as I can tell, the hardware should support this.
I can get a transfer from server host memory to client GPU memory to work, but I cannot get either a GPU-to-GPU transfer or a transfer from server GPU memory to the client’s host memory to work.
When transferring from server GPU memory, the IBV_WR_RDMA_WRITE_WITH_IMM completes on the server side with a work completion failure reporting a local protection error (IBV_WC_LOC_PROT_ERR).
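This is roughly how the failure shows up when I drain the server’s send CQ (again a simplified sketch; my real completion handling is more involved):

```c
/* Sketch of the server-side completion check where the error appears. */
#include <infiniband/verbs.h>
#include <stdio.h>

static void drain_send_cq(struct ibv_cq *cq)
{
    struct ibv_wc wc;
    int n;

    while ((n = ibv_poll_cq(cq, 1, &wc)) != 0) {
        if (n < 0) {
            fprintf(stderr, "ibv_poll_cq failed\n");
            return;
        }
        if (wc.status != IBV_WC_SUCCESS) {
            /* With a GPU (cudaMalloc) source buffer this prints the
             * local protection error; with a host buffer it succeeds. */
            fprintf(stderr, "wr_id %llu failed: %s\n",
                    (unsigned long long)wc.wr_id,
                    ibv_wc_status_str(wc.status));
        }
    }
}
```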
So basically I get that error any time I try a transfer from the server GPU to a client, but not for server host memory to a client’s GPU memory. The buffer layout is the same for server host memory as for server GPU memory; the only difference is that the GPU version uses cudaMalloc.
Software versions should all be up to date: RHEL 7.6, nv_peer_memory_1.0-8, cuda 10.1, OFED 4.6-1.09.1.1
Is there some configuration or setup step I’m missing when sourcing from GPU memory vs. host memory? I’m just using cudaMalloc plus an ibv_reg_mr call for the GPU version, and posix_memalign plus ibv_reg_mr for the host-memory version.
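To be concrete, the two registration paths look roughly like this. A minimal sketch with illustrative names; the access flags shown are an assumption about what I’m passing, and the GPU path relies on nv_peer_memory to handle the device pointer inside ibv_reg_mr:

```c
/* Sketch of the two buffer registration paths (illustrative names). */
#include <infiniband/verbs.h>
#include <cuda_runtime.h>
#include <stdlib.h>

#define FRAME_BYTES (4096UL * 4096UL * 2UL)
#define ACCESS (IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE | \
                IBV_ACCESS_REMOTE_READ)

/* Host-memory version: posix_memalign + ibv_reg_mr (this path works). */
static struct ibv_mr *reg_host_buf(struct ibv_pd *pd, void **buf)
{
    if (posix_memalign(buf, 4096, FRAME_BYTES))
        return NULL;
    return ibv_reg_mr(pd, *buf, FRAME_BYTES, ACCESS);
}

/* GPU-memory version: cudaMalloc + ibv_reg_mr, relying on nv_peer_memory
 * to pin/translate the device pointer (this is the path that ends in the
 * local protection error when the GPU buffer is the source of the write). */
static struct ibv_mr *reg_gpu_buf(struct ibv_pd *pd, void **buf)
{
    if (cudaMalloc(buf, FRAME_BYTES) != cudaSuccess)
        return NULL;
    return ibv_reg_mr(pd, *buf, FRAME_BYTES, ACCESS);
}
```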
Will this configuration work GPU-to-GPU? And if not, why would host-to-GPU work?