I have a typical GPU-based processing task that works as follows:
1. Copy data from CPU to GPU
2. Do some processing on the GPU
3. Copy data from GPU to CPU
I have implemented this process in two ways: with explicit async memcpy calls (for steps 1 and 3), and with memory-mapped memory areas. Both implementations work as expected.
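For reference, here is a minimal sketch of what I mean by the two variants (the kernel `process` and the launch geometry are placeholders, not my actual code):

```cuda
#include <cuda_runtime.h>

__global__ void process(float *data, int n);  // placeholder kernel

void approach_async_memcpy(float *h_data, int n) {
    // Variant 1: explicit transfers; data lives in device RAM during processing.
    float *d_data;
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    cudaMalloc(&d_data, n * sizeof(float));

    // Step 1: host -> device (h_data must be pinned for a truly async copy)
    cudaMemcpyAsync(d_data, h_data, n * sizeof(float),
                    cudaMemcpyHostToDevice, stream);
    // Step 2: kernel reads/writes device memory
    process<<<(n + 255) / 256, 256, 0, stream>>>(d_data, n);
    // Step 3: device -> host
    cudaMemcpyAsync(h_data, d_data, n * sizeof(float),
                    cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);

    cudaFree(d_data);
    cudaStreamDestroy(stream);
}

void approach_mapped(int n) {
    // Variant 2: mapped (zero-copy) host memory; no explicit transfers.
    float *h_data, *d_ptr;
    cudaHostAlloc(&h_data, n * sizeof(float), cudaHostAllocMapped);
    cudaHostGetDevicePointer(&d_ptr, h_data, 0);

    // The kernel accesses host memory through the mapped pointer.
    process<<<(n + 255) / 256, 256>>>(d_ptr, n);
    cudaDeviceSynchronize();

    cudaFreeHost(h_data);
}
```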
Now, here is my question: what are the benefits and drawbacks of the two approaches? Here are my assumptions:
-> Approach “async memcpy”: The explicit async memcpys (steps 1 and 3) cost transfer time up front, but the processing itself is fast because the data is local to the GPU (not as fast as on-chip “shared” memory, but at least somewhere in GPU RAM).
-> Approach “memory-mapped data”: Memory mapping avoids the up-front transfer time, but accessing the data from within the GPU code takes longer, since each value read from the mapped area must be transferred individually over the bus at access time.
As a conclusion, it cannot be predicted in general which approach is more efficient, since it depends
a) on the processing algorithm and on how many reads/writes the GPU performs on the memory-mapped areas, and
b) on the exact realization of the memory mapping: can the GPU access the mapped memory directly, or does every access have to travel over the PCI bus?
Is that assumption correct, and is there a way to better understand what exactly happens when data located in a memory-mapped area is accessed?
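In case it is relevant: I could of course time both variants end to end with CUDA events, roughly like this (`run_async_variant` and `run_mapped_variant` stand for my two implementations), but that only tells me *which* is faster for my algorithm, not *why*:

```cuda
#include <cuda_runtime.h>
#include <cstdio>

void run_async_variant();   // placeholder: the async-memcpy implementation
void run_mapped_variant();  // placeholder: the memory-mapped implementation

// Time a function on the GPU timeline using CUDA events.
float time_ms(void (*fn)()) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    fn();
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main() {
    printf("async memcpy: %.3f ms\n", time_ms(run_async_variant));
    printf("memory mapped: %.3f ms\n", time_ms(run_mapped_variant));
    return 0;
}
```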
Thank you and best regards