I’m interested in GPU computing for signal processing in software-defined radios, a job that has traditionally been done by FPGAs. The idea is to connect the RF hardware and analog/digital converters to the compute chassis housing the CPU and GPU via 10 GigE. Each received burst might carry a few MB of samples that need to be demodulated, deinterleaved, error corrected, and decrypted. Transmission reverses that process and is much less computationally intensive.
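For concreteness, here is the kind of per-burst receive pipeline I have in mind, sketched as chained CUDA kernels that keep everything in device memory. The kernel names and the float2 sample format are placeholders, not a real design; the demod stage is a stand-in, not actual DSP math.

```cuda
#include <cuda_runtime.h>

// Placeholder for the first pipeline stage; a real demodulator would go here.
__global__ void demodulate_kernel(const float2* in, float* soft_bits, size_t n) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) soft_bits[i] = in[i].x;  // stand-in for the actual demod math
}

int main() {
    const size_t n = 1 << 20;  // ~1M complex samples per burst
    float2* samples;
    float*  soft_bits;
    cudaMalloc(&samples, n * sizeof(float2));
    cudaMalloc(&soft_bits, n * sizeof(float));

    // In the real system the NIC would DMA the burst straight into `samples`
    // (the GPUDirect RDMA question below); without that it would be a
    // cudaMemcpy from host memory.

    const int threads = 256;
    const int blocks  = (int)((n + threads - 1) / threads);
    demodulate_kernel<<<blocks, threads>>>(samples, soft_bits, n);
    // ...deinterleave, FEC-decode, and decrypt kernels would follow here,
    // all reading/writing device memory so nothing bounces back to the CPU
    // until the final plaintext is ready.
    cudaDeviceSynchronize();

    cudaFree(samples);
    cudaFree(soft_bits);
    return 0;
}
```

The point of chaining the stages on-device is that only the first input and last output ever cross the PCIe bus, which is exactly the copy traffic I want to minimize.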
Copying data between CPU and GPU memory is a performance killer that I’m looking to minimize.
If I’m following Nvidia’s documentation correctly, GPUDirect RDMA would let me move data directly between the Ethernet card and GPU memory without CPU intervention. Definitely a good thing. Is this workable for streaming RF samples?
I’m less clear on NVLink, Pascal, and unified memory. NVLink looks like a high-speed replacement for PCIe when connecting Pascal GPUs and compatible CPUs. What are the compatible CPUs, and when will this hardware be available? Is the transfer DMA, with an interrupt to signal completion?
Unified memory looks like a programming model that treats CPU and GPU memory as a single address space. As best I can tell, coding in C++, one could allocate a buffer behind a unique_ptr, fill it with CPU data (e.g., data to transmit), and then std::move it to the GPU processing code. The compiler would automagically handle the copying. Is this correct? Would such code take advantage of NVLink when that hardware ships?
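For comparison, the unified-memory pattern I’ve seen in CUDA samples uses cudaMallocManaged rather than anything unique_ptr-shaped: one pointer is visible to both sides and the runtime migrates pages on demand. A minimal sketch (scale_kernel is just an illustrative stand-in, and checking the result on the CPU assumes the kernel ran, which needs a CUDA-capable GPU):

```cuda
#include <cuda_runtime.h>

// Trivial stand-in for a GPU processing stage.
__global__ void scale_kernel(float* buf, size_t n, float gain) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] *= gain;
}

int main() {
    const size_t n = 4096;
    float* buf;
    cudaMallocManaged(&buf, n * sizeof(float));  // one pointer, both sides

    for (size_t i = 0; i < n; ++i) buf[i] = 1.0f;  // CPU fills the buffer

    scale_kernel<<<(int)((n + 255) / 256), 256>>>(buf, n, 0.5f);  // GPU touches it
    cudaDeviceSynchronize();

    float first = buf[0];  // CPU reads the result, no explicit cudaMemcpy
    cudaFree(buf);
    return (first == 0.5f) ? 0 : 1;
}
```

This makes me wonder whether the unique_ptr/std::move idea is even the right mental model, or whether the migration is entirely the runtime’s job rather than the compiler’s.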
Lastly, is there a roadmap to physically unifying memory, such as connecting the CPU and GPU to the same memory via different controllers?