In my previous experience with GPUs, I always transferred data between the CPU and GPU through global memory: allocate arrays, copy them over, and then pass those arrays as arguments to GPU kernels for the calculation. However, this approach does not seem to fit my current project, in which I need to stream data continuously between the CPU and the GPU. It's something like this (a rough sketch of what I mean follows the list):
- The CPU receives a data tick every nanosecond (or faster).
- The CPU broadcasts the data and updates the corresponding buffers (for example, in global memory). Multiple units in the buffers might be updated simultaneously.
- GPU units (blocks or threads) continuously read their assigned buffer units for updated data and do the calculation.
- GPU units save results into some other buffers (again, these can be in global memory).
- The CPU continuously reads results from the result buffers.
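To make this concrete, here is a rough sketch of how I imagine the device side (all names are made up; I am assuming a single "persistent" kernel that keeps polling a sequence counter in mapped host memory instead of being relaunched for every tick):

```
// Sketch only: each thread owns one buffer slot, spins until the host
// bumps a sequence counter, recomputes, and publishes its result.
__global__ void persistent_worker(const volatile float *tick_buf,
                                  const volatile unsigned int *seq,
                                  volatile float *result_buf,
                                  volatile unsigned int *result_seq,
                                  const volatile int *stop_flag)
{
    const int slot = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int last_seen = 0;

    while (*stop_flag == 0) {
        unsigned int s = *seq;            // host increments this once per tick
        if (s != last_seen) {
            float x = tick_buf[slot];     // read the latest value for my slot
            result_buf[slot] = x * x;     // placeholder for the real calculation
            __threadfence_system();       // make the result visible to the host
            result_seq[slot] = s;         // tell the host which tick it belongs to
            last_seen = s;
        }
    }
}
```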
I wonder whether this solution is viable. I understand that writing to global memory is a bottleneck, but in my case I only need to update a few units at a time. Once the GPU is started, the GPU units have to read input from and write results to global memory on their own (not through calls from the CPU).
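And roughly how I picture the host side (again just a sketch with made-up names, assuming zero-copy pinned memory allocated with cudaHostAllocMapped so the CPU can write ticks and read results without explicit cudaMemcpy calls):

```
#include <cuda_runtime.h>
#include <atomic>
#include <cstdio>

// Kernel from the device-side sketch above.
__global__ void persistent_worker(const volatile float *, const volatile unsigned int *,
                                  volatile float *, volatile unsigned int *,
                                  const volatile int *);

int main()
{
    const int N = 256;                       // number of buffer slots
    float *tick_buf, *result_buf;
    unsigned int *seq, *result_seq;
    int *stop_flag;

    // Pinned, mapped ("zero-copy") allocations visible to both CPU and GPU.
    // On a 64-bit system with unified addressing the host pointers can be
    // passed straight to the kernel; otherwise use cudaHostGetDevicePointer.
    cudaSetDeviceFlags(cudaDeviceMapHost);
    cudaHostAlloc((void **)&tick_buf,   N * sizeof(float),        cudaHostAllocMapped);
    cudaHostAlloc((void **)&result_buf, N * sizeof(float),        cudaHostAllocMapped);
    cudaHostAlloc((void **)&seq,        sizeof(unsigned int),     cudaHostAllocMapped);
    cudaHostAlloc((void **)&result_seq, N * sizeof(unsigned int), cudaHostAllocMapped);
    cudaHostAlloc((void **)&stop_flag,  sizeof(int),              cudaHostAllocMapped);
    *seq = 0;
    *stop_flag = 0;

    // Launch the persistent kernel once; it runs until stop_flag is set.
    persistent_worker<<<1, N>>>(tick_buf, seq, result_buf, result_seq, stop_flag);

    for (unsigned int tick = 1; tick <= 1000; ++tick) {
        tick_buf[tick % N] = (float)tick;    // the "broadcast": update a few slots
        std::atomic_thread_fence(std::memory_order_release);
        *seq = tick;                         // signal the GPU that new data arrived
        // ... meanwhile, poll result_seq / result_buf for finished results ...
    }

    *stop_flag = 1;                          // ask the kernel to exit
    cudaDeviceSynchronize();
    printf("result for slot 0: %f\n", result_buf[0]);
    return 0;
}
```

I realize spinning like this burns GPU cycles and that the ordering between host and device writes probably needs more care; the sketch is only meant to illustrate the data flow I have in mind.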
Any advice is appreciated!