Async memcpy vs. memory mapping

Hi everyone,

I have a typical GPU-based processing task that works as follows:

  1. Copy data from CPU to GPU
  2. Do some processing on the GPU
  3. Copy data from GPU to CPU

I have implemented this process in two ways: with async memcpy (for steps 1 and 3), and with memory-mapped host memory areas that the GPU accesses directly. Both implementations work as expected.
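A minimal sketch of what the async memcpy variant could look like, for readers following along. This is not the actual code from this thread: the kernel `scale`, the buffer size, and the `CHECK` macro are placeholders chosen for illustration. It assumes pinned host memory, since `cudaMemcpyAsync` only overlaps with other work when the host buffer is page-locked.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Illustrative error check: abort on the first failing CUDA call.
#define CHECK(call)                                                        \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
            exit(1);                                                       \
        }                                                                  \
    } while (0)

// Placeholder kernel: doubles every element in place.
__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *h_data = nullptr;
    float *d_data = nullptr;
    CHECK(cudaMallocHost((void **)&h_data, bytes));  // pinned host buffer
    CHECK(cudaMalloc((void **)&d_data, bytes));      // on-board GPU buffer
    for (int i = 0; i < n; ++i) h_data[i] = 1.0f;

    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    // Step 1: copy CPU -> GPU
    CHECK(cudaMemcpyAsync(d_data, h_data, bytes, cudaMemcpyHostToDevice, stream));
    // Step 2: process on the GPU; all kernel accesses hit GPU RAM
    scale<<<(n + 255) / 256, 256, 0, stream>>>(d_data, n);
    CHECK(cudaGetLastError());
    // Step 3: copy GPU -> CPU
    CHECK(cudaMemcpyAsync(h_data, d_data, bytes, cudaMemcpyDeviceToHost, stream));
    CHECK(cudaStreamSynchronize(stream));

    // Verify the result on the host.
    for (int i = 0; i < n; ++i) {
        if (h_data[i] != 2.0f) { fprintf(stderr, "mismatch at %d\n", i); return 1; }
    }
    printf("ok\n");

    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(d_data));
    CHECK(cudaFreeHost(h_data));
    return 0;
}
```

All three steps are queued on the same stream, so they run in order; the host only blocks at `cudaStreamSynchronize`.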

Now, here is my question: What are the benefits and drawbacks of the two approaches? Here is my assumption:

-> Approach “Async memcpy”: The explicit async copies (steps 1 and 3) take some time, but the processing itself is rather fast because the data is local to the GPU (not as fast as “shared” memory, but at least somewhere in GPU RAM).

-> Approach “memory mapped data”: The mapping itself adds no separate transfer time, but accessing the data from GPU code takes longer, since every value read from the mapped area must be transferred at the moment it is accessed.
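For comparison, a minimal sketch of the mapped variant, under the same assumptions as any illustration here: the kernel `scale`, the buffer size, and the `CHECK` macro are placeholders, not code from this thread. The host buffer is allocated with `cudaHostAllocMapped`, and the kernel works on a device-side alias of that same memory, so there is no explicit copy in either direction.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Illustrative error check: abort on the first failing CUDA call.
#define CHECK(call)                                                        \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
            exit(1);                                                       \
        }                                                                  \
    } while (0)

// Placeholder kernel: doubles every element in place.
__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Request support for mapped pinned allocations before other CUDA calls.
    CHECK(cudaSetDeviceFlags(cudaDeviceMapHost));

    float *h_data = nullptr;   // host pointer to the mapped buffer
    float *d_alias = nullptr;  // device-side pointer to the same memory
    CHECK(cudaHostAlloc((void **)&h_data, bytes, cudaHostAllocMapped));
    CHECK(cudaHostGetDevicePointer((void **)&d_alias, h_data, 0));
    for (int i = 0; i < n; ++i) h_data[i] = 1.0f;

    // No explicit copies: each kernel read and write goes to host memory.
    scale<<<(n + 255) / 256, 256>>>(d_alias, n);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());  // results are visible to the host after this

    // Verify the result on the host.
    for (int i = 0; i < n; ++i) {
        if (h_data[i] != 2.0f) { fprintf(stderr, "mismatch at %d\n", i); return 1; }
    }
    printf("ok\n");

    CHECK(cudaFreeHost(h_data));
    return 0;
}
```

The synchronization before reading `h_data` on the host matters: without it, the host could read the buffer while the kernel is still writing to it.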

In conclusion, it cannot be predicted in general which approach is more efficient, since it depends on:

a) the processing algorithm and the number of reads/writes the GPU performs on data in memory-mapped areas.
b) the exact realization of the memory mapping: whether the mapped data lives in memory that the GPU can access without running the data through the PCI bus.

Is that assumption correct, and is there a way to better understand what exactly happens when accessing data located in a memory-mapped area?

Thank you and best regards
Hauke

In general, your analysis is correct. However, for unspecified data access patterns, GPU kernel access to on-board data (method 1, async memcpy) will most of the time be noticeably faster than kernel access to “mapped” data.

There are only a few very specific access patterns for which the mapped method would be no slower than the async method for kernel data access, and no situations for which the mapped method would be faster for kernel data access.

The mapped method requires any data accessed by the GPU to be transferred over the PCI bus.
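One way to see this on your own hardware is to time the same kernel once on an on-board buffer and once on a mapped buffer. The following is only a rough sketch: the kernel `scale`, the helper `timeKernel`, the buffer size, and the `CHECK` macro are made up for illustration, and a single launch is a crude measurement. Note that it times kernel data access only; the device-memory number does not include the copies of steps 1 and 3.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Illustrative error check: abort on the first failing CUDA call.
#define CHECK(call)                                                        \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
            exit(1);                                                       \
        }                                                                  \
    } while (0)

// Placeholder kernel: one read and one write per element.
__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Hypothetical helper: time one launch of scale on ptr, in milliseconds.
static float timeKernel(float *ptr, int n) {
    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));
    CHECK(cudaEventRecord(start));
    scale<<<(n + 255) / 256, 256>>>(ptr, n);
    CHECK(cudaGetLastError());
    CHECK(cudaEventRecord(stop));
    CHECK(cudaEventSynchronize(stop));
    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    return ms;
}

int main() {
    const int n = 1 << 22;
    const size_t bytes = n * sizeof(float);

    CHECK(cudaSetDeviceFlags(cudaDeviceMapHost));

    // On-board buffer (what the kernel sees in the async memcpy approach).
    float *d_data = nullptr;
    CHECK(cudaMalloc((void **)&d_data, bytes));
    CHECK(cudaMemset(d_data, 0, bytes));

    // Mapped host buffer and its device-side alias.
    float *h_mapped = nullptr;
    float *d_alias = nullptr;
    CHECK(cudaHostAlloc((void **)&h_mapped, bytes, cudaHostAllocMapped));
    CHECK(cudaHostGetDevicePointer((void **)&d_alias, h_mapped, 0));
    for (int i = 0; i < n; ++i) h_mapped[i] = 1.0f;

    timeKernel(d_data, n);  // warm-up launch, result discarded

    float msDevice = timeKernel(d_data, n);
    float msMapped = timeKernel(d_alias, n);
    printf("kernel on device memory: %.3f ms\n", msDevice);
    printf("kernel on mapped memory: %.3f ms\n", msMapped);

    CHECK(cudaFree(d_data));
    CHECK(cudaFreeHost(h_mapped));
    return 0;
}
```

For a fair end-to-end comparison you would also time the two `cudaMemcpyAsync` calls of the first approach and average over several runs.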