I'm not a software guy, so I am seeking guidance. I have a server sending a stream of data over 40 GbE. I have a GPU server that hosts a Tesla GPU and uses CUDA to process the data (right now it is copied from host memory).
The GPU server also has a 40 GbE NIC that supports GPUDirect. I would like to process the data coming from the remote server. Can I accomplish everything I need from within CUDA, or do I need something that runs separately to RDMA the data from the NIC into GPU memory? I am rather confused…
You can integrate the transfers into your project; gdrcopy is a nice project to check out for low-latency transfers: https://github.com/NVIDIA/gdrcopy
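To give a rough idea of what gdrcopy looks like in practice, here is a minimal sketch based on its public API (`gdr_open`, `gdr_pin_buffer`, `gdr_map`, `gdr_copy_to_mapping`). Note that gdrcopy covers low-latency CPU↔GPU copies; the NIC-to-GPU-memory path is the separate GPUDirect RDMA support in the NIC driver. This assumes gdrcopy and its `gdrdrv` kernel module are installed and a CUDA device is present, so treat it as a sketch rather than tested code:

```cuda
// Sketch: pin a CUDA allocation with gdrcopy and write to it directly
// from the CPU, bypassing cudaMemcpy. Error handling kept minimal.
#include <cuda_runtime.h>
#include <gdrapi.h>
#include <stdio.h>

int main(void) {
    const size_t size = GPU_PAGE_SIZE;      // gdrcopy pins whole GPU pages
    void *d_buf = NULL;
    if (cudaMalloc(&d_buf, size) != cudaSuccess) return 1;

    gdr_t g = gdr_open();                   // connect to the gdrdrv driver
    if (!g) return 1;

    gdr_mh_t mh;
    if (gdr_pin_buffer(g, (unsigned long)d_buf, size, 0, 0, &mh) != 0)
        return 1;                           // pin device memory for CPU access

    void *map = NULL;
    if (gdr_map(g, mh, &map, size) != 0)    // CPU-visible mapping of GPU memory
        return 1;

    char msg[] = "hello from the host";
    gdr_copy_to_mapping(mh, map, msg, sizeof(msg)); // low-latency CPU->GPU write

    // ... launch CUDA kernels that consume d_buf here ...

    gdr_unmap(g, mh, map, size);
    gdr_unpin_buffer(g, mh);
    gdr_close(g);
    cudaFree(d_buf);
    return 0;
}
```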
If you are talking about system design in terms of hardware, you do not need anything else as far as I know.
But I am not a source for an official answer :) Take everything with a grain of salt.
I hope this helps.