Can I use GPUDirect with VMA for a UDP multicast?

My goal is to receive a UDP multicast stream with VMA and deliver the packets directly into NVIDIA GPU memory for further processing, without making any copy of the packets in CPU system memory. According to the VMA documentation, the library uses InfiniBand Verbs to bypass the Linux TCP/IP stack. According to the Mellanox OFED code, you only have to pass the CUDA-allocated address to the InfiniBand memory registration call for GPUDirect to work, once the nvidia_peer_memory driver from Mellanox is loaded.
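The registration step described above can be sketched roughly as follows. This is a hedged sketch, not a complete receiver: it only shows passing a `cudaMalloc`'ed pointer to the standard `ibv_reg_mr` call, which the peer-memory driver is expected to recognize; queue-pair setup, multicast join, and error handling are omitted, and it needs real CUDA/RDMA hardware to run.

```c
/* Sketch: registering CUDA device memory with ibverbs so the NIC can
 * DMA received frames straight into GPU memory (GPUDirect RDMA, via
 * the nvidia_peer_memory module). Requires libibverbs and the CUDA
 * runtime; buffer size and access flags are illustrative. */
#include <stdio.h>
#include <infiniband/verbs.h>
#include <cuda_runtime.h>

int main(void)
{
    struct ibv_device **dev_list = ibv_get_device_list(NULL);
    if (!dev_list || !dev_list[0]) {
        fprintf(stderr, "no RDMA device found\n");
        return 1;
    }

    struct ibv_context *ctx = ibv_open_device(dev_list[0]);
    struct ibv_pd *pd = ibv_alloc_pd(ctx);

    void *gpu_buf = NULL;
    size_t len = 1 << 20;              /* 1 MiB receive buffer (example) */
    cudaMalloc(&gpu_buf, len);         /* GPU device memory, not host RAM */

    /* Same call as for host memory; with nvidia_peer_memory loaded,
     * the driver pins the GPU pages behind this device pointer. */
    struct ibv_mr *mr = ibv_reg_mr(pd, gpu_buf, len, IBV_ACCESS_LOCAL_WRITE);
    if (!mr) {
        perror("ibv_reg_mr");
        return 1;
    }

    /* mr->lkey would now be used in receive work requests, so that
     * incoming frames land directly in gpu_buf. */

    ibv_dereg_mr(mr);
    cudaFree(gpu_buf);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(dev_list);
    return 0;
}
```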

Is it possible to have VMA work with CUDA-allocated addresses so that frames are written directly to GPU memory?


As you said, GPUDirect allows you to pin memory on your GPU and register it via the standard memory registration API.

Once that is done, it does not matter to VMA which memory is used for traffic.
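Since VMA intercepts standard socket calls, the application itself does not change; it is launched with the library preloaded. A minimal sketch (the receiver binary name, multicast group, and port are hypothetical placeholders):

```shell
# Preload VMA so the app's UDP socket calls are offloaded to the NIC,
# bypassing the kernel stack. Binary name and addresses are examples.
LD_PRELOAD=libvma.so ./mcast_receiver --group 239.1.1.1 --port 5001
```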

If you have any issue with this configuration, please open a ticket.