Data transfer speed between CPU and GPU

Hi all,

Say I have five steps in one algorithm. Some of them run on the CPU, the others on the GPU.

Step 1 (GPU)
Step 2 (CPU)
Step 3 (GPU)
Step 4 (CPU)
Step 5 (GPU)

Because of this, I need to transfer data between the CPU and GPU frequently. My data is large (30 MB) and I need the whole algorithm to finish in a very short time (20 ms).

Right now, data transfer takes most of the time. Is there any way to speed up the data transfer, or even to let the CPU access GPU data directly?



You can use pinned (page-locked) memory to decrease memory copy times. You could also partition your calculation into smaller pieces (which are still large enough to saturate the GPU) and then process one partition while you are transferring another.
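
To make the suggestion above concrete, here is a minimal sketch of pinned memory combined with chunked, stream-based transfers. The kernel `gpu_step`, the helper `run_chunked_pipeline`, the two-stream setup and the block size of 256 are all made up for illustration; they are not from the original poster's code. Only the CUDA runtime calls themselves are real.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical stand-in for one GPU step: doubles every element.
__global__ void gpu_step(float *data, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Copies n floats to the device in num_chunks pieces, processes each piece,
// and copies it back. Returns true if all results are correct.
bool run_chunked_pipeline(size_t n, int num_chunks) {
    float *h_data = nullptr, *d_data = nullptr;
    // Pinned host memory: required for cudaMemcpyAsync to be truly asynchronous.
    if (cudaMallocHost((void **)&h_data, n * sizeof(float)) != cudaSuccess) return false;
    if (cudaMalloc((void **)&d_data, n * sizeof(float)) != cudaSuccess) {
        cudaFreeHost(h_data);
        return false;
    }
    for (size_t i = 0; i < n; ++i) h_data[i] = 1.0f;

    const int kStreams = 2;  // assumption: two streams, alternating per chunk
    cudaStream_t streams[kStreams];
    for (int s = 0; s < kStreams; ++s) cudaStreamCreate(&streams[s]);

    size_t chunk = (n + num_chunks - 1) / num_chunks;
    for (int c = 0; c < num_chunks; ++c) {
        size_t offset = (size_t)c * chunk;
        if (offset >= n) break;
        size_t count = (offset + chunk <= n) ? chunk : n - offset;
        cudaStream_t st = streams[c % kStreams];
        // Upload, compute and download are ordered within one stream,
        // but work in different streams is allowed to overlap.
        cudaMemcpyAsync(d_data + offset, h_data + offset, count * sizeof(float),
                        cudaMemcpyHostToDevice, st);
        unsigned blocks = (unsigned)((count + 255) / 256);
        gpu_step<<<blocks, 256, 0, st>>>(d_data + offset, count);
        cudaMemcpyAsync(h_data + offset, d_data + offset, count * sizeof(float),
                        cudaMemcpyDeviceToHost, st);
    }

    bool ok = (cudaDeviceSynchronize() == cudaSuccess);
    for (size_t i = 0; ok && i < n; ++i) ok = (h_data[i] == 2.0f);

    for (int s = 0; s < kStreams; ++s) cudaStreamDestroy(streams[s]);
    cudaFree(d_data);
    cudaFreeHost(h_data);
    return ok;
}
```

Calling something like `run_chunked_pipeline(1 << 20, 4)` from `main` would exercise it. How much overlap you actually get depends on the device (number of copy engines) and on the chunk size, so it is worth timing a few chunk counts rather than trusting any fixed value.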

Waiting for your solution.

Thanks. I believe zero-copy is what I need.
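
For anyone reading along, zero-copy (mapped pinned memory) might look roughly like the sketch below. The kernel `add_one` and the helper `run_zero_copy` are hypothetical names used only for this example, and device 0 is assumed.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical kernel: adds 1 to every element, reading and writing host memory directly.
__global__ void add_one(float *data, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Returns true if the kernel updated the host buffer in place without any explicit copy.
bool run_zero_copy(size_t n) {
    // Request host-memory mapping. On older toolkits this must happen before the
    // context is created; if it is too late the call fails, so clear that error.
    cudaSetDeviceFlags(cudaDeviceMapHost);
    cudaGetLastError();

    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return false;
    if (!prop.canMapHostMemory) return false;  // device cannot map host memory

    float *h_data = nullptr, *d_alias = nullptr;
    // Pinned host allocation that is also mapped into the device address space.
    if (cudaHostAlloc((void **)&h_data, n * sizeof(float), cudaHostAllocMapped) != cudaSuccess)
        return false;
    for (size_t i = 0; i < n; ++i) h_data[i] = 1.0f;

    // Device-side pointer that refers to the same host buffer.
    if (cudaHostGetDevicePointer((void **)&d_alias, h_data, 0) != cudaSuccess) {
        cudaFreeHost(h_data);
        return false;
    }

    unsigned blocks = (unsigned)((n + 255) / 256);
    add_one<<<blocks, 256>>>(d_alias, n);

    // The CPU may only read the results after the kernel has finished.
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);
    for (size_t i = 0; ok && i < n; ++i) ok = (h_data[i] == 2.0f);

    cudaFreeHost(h_data);
    return ok;
}
```

Keep in mind that on a discrete GPU every access through the mapped pointer still crosses the PCIe bus, so zero-copy removes the explicit copy calls but not the bandwidth cost. It tends to pay off when each element is read or written only once.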

Chapter 10 of this book ( ) gives an example of how to use streams to keep the GPU saturated with calculations and how to perform asynchronous transfers while the GPU is busy with something else.

If you need to transfer the full 30 MB before and after each of the three GPU steps (six copies in total), and cannot subdivide your data so that you copy to and from the device in parallel, you would need a PCIe bandwidth in excess of 6 * 30 MB / 0.020 s = 9 GB/s, even with no time spent on processing the data. So you have to find ways to minimize the amount of data copied and to (at least partially) overlap copying to and from the device with processing the data.
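
The arithmetic above can be wrapped in a small host-side helper so you can plug in your own sizes and time budget. The function name `required_bandwidth_gb_per_s` is made up for this sketch, and it assumes 1 GB = 1000 MB, as in the estimate above.

```cuda
#include <cstdio>

// Bandwidth needed to move `copies` buffers of `megabytes_per_copy` MB each
// within `budget_seconds`, assuming zero compute time. Result is in GB/s.
double required_bandwidth_gb_per_s(double megabytes_per_copy, int copies,
                                   double budget_seconds) {
    double total_megabytes = megabytes_per_copy * copies;
    return total_megabytes / budget_seconds / 1000.0;
}

// Prints the estimate for the numbers in this thread: 6 copies of 30 MB in 20 ms.
void print_thread_estimate() {
    double gbps = required_bandwidth_gb_per_s(30.0, 6, 0.020);
    std::printf("required bandwidth: %.1f GB/s\n", gbps);
}
```

Comparing this number with the host-to-device rate you actually measure on your machine shows how much of the data has to stay on the device, or how much overlap is needed, to fit the budget.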