Hello, I am trying to get GPUDirect RDMA working with libVMA (distributed as part of MLNX_OFED) and am running into issues, partly because I can't find any code examples.
I’ve created a CUDA context and allocated device memory with cuMemAlloc(), and I want UDP packets to be received directly into that device memory. However, when I use libVMA’s extra APIs (‘vma_add_ring_profile’ and ‘register_memory_on_ring’) to register that memory with the socket’s rings, calls to ‘recv’ stop working, and ‘vma_stats’ shows that the socket is timing out while waiting for a packet.
Am I missing a step here, or am I fundamentally misunderstanding how to integrate CUDA memory into libVMA? If anyone has example code that specifically uses libVMA with CUDA offload, I’d be very interested to see it.
According to this post on the Mellanox Interconnect Community, what I’m trying to do should be possible, but I’m not sure where to go from here.
Thank you for your time!