RDMA from NIC to GPU? From within CUDA?

I am not a software guy, so I am seeking guidance. I have a server sending a stream of data over 40 GbE. I also have a GPU server hosting a Tesla GPU that processes data with CUDA (right now the data is copied from host memory). The GPU server also has a 40 GbE NIC that supports GPUDirect. I would like to process the data coming from the remote server. Can I accomplish everything I need from within CUDA, or do I need something that runs separately to RDMA the data from the NIC to GPU memory? I am rather confused…