Data transfer between multiple GPUs: how to do it fast?

My application needs to be split across 2 GPUs (I am using a GTX 295) to double performance.
The algorithm is such that its second step can be divided into 2 independent parts, but both halves require the same set of intermediate data from the first step.

Therefore, I have 2 possibilities:

  1. Perform redundant computation of the intermediate data set on both GPUs. As a result, performance will not scale by a factor of 2 even with 2 GPUs.
  2. Split step 1 between 2 GPUs and let them exchange missing chunks of intermediate data between each other.

I would like to achieve performance scaling, so I need to go with possibility 2.

However, the amount of data to exchange is significant, so if I use a pageable memory pointer for the exchange I will take a big performance penalty (no async data transfer).

I really need to use page-locked memory and async data transfer.
I use the runtime API, with 2 pthreads, each thread driving 1 GPU.

If I understand correctly, page-locked pointers allocated within a thread are tied to a specific GPU context. So in order to exchange data between the GPUs, I would have to run a separate thread that shuttles data between the page-locked pointers belonging to the 2 GPUs. This is possible in principle but adds substantial complexity to the implementation.
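To make the setup concrete, here is a minimal sketch of the thread-per-GPU pattern described above (names like `gpuWorker` and `NBYTES` are illustrative, not from my actual code; error checking omitted). By default, the pinned allocation only gives async-copy benefits to the thread that made it:

```cuda
#include <cuda_runtime.h>
#include <pthread.h>

#define NBYTES (64 * 1024 * 1024)

void *gpuWorker(void *arg)
{
    int dev = *(int *)arg;
    cudaSetDevice(dev);              // bind this thread to one GPU

    float *hPinned, *dBuf;
    // Page-locked allocation: by default its benefits (true async DMA
    // copies) apply only to the thread/context that allocated it.
    cudaMallocHost((void **)&hPinned, NBYTES);
    cudaMalloc((void **)&dBuf, NBYTES);

    cudaStream_t stream;
    cudaStreamCreate(&stream);
    // Async copy overlaps with computation only for pinned host memory.
    cudaMemcpyAsync(dBuf, hPinned, NBYTES, cudaMemcpyHostToDevice, stream);
    cudaStreamSynchronize(stream);

    cudaStreamDestroy(stream);
    cudaFree(dBuf);
    cudaFreeHost(hPinned);
    return NULL;
}

int main(void)
{
    pthread_t t[2];
    int ids[2] = {0, 1};
    for (int i = 0; i < 2; ++i)
        pthread_create(&t[i], NULL, gpuWorker, &ids[i]);
    for (int i = 0; i < 2; ++i)
        pthread_join(t[i], NULL);
    return 0;
}
```

The problem is that the two `hPinned` buffers live in separate contexts, so cross-GPU exchange through them is not straightforward.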

Am I missing any “right” way to exchange data between GPUs using async data transfer?
I read in some forum post from a year ago that future versions of CUDA might allow page-locked pointers to be shared between GPUs. Has this been implemented?
Is there any special possibility for the GTX 295 (which has 2 GPUs in the same box!)?

Any suggestion would be appreciated…

Thanks

Use portable pinned memory. cudaHostAlloc, I think, is the function you want.

See section 3.2.5.1 in the 3.0 Programming Guide:
“A block of page-locked memory can be used by any host threads, but by default, the
benefits of using page-locked memory described above are only available for the
thread that allocates it. To make these advantages available to all threads, it needs to
be allocated by passing flag cudaHostAllocPortable to cudaHostAlloc().”

So you shouldn’t have a problem.
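A rough sketch of how this could look for the exchange described above, assuming one GPU produces an intermediate chunk and the other consumes it (the producer/consumer split, the pthread barrier, and `NBYTES` are illustrative choices, not the only way to synchronize; error checking omitted):

```cuda
#include <cuda_runtime.h>
#include <pthread.h>

#define NBYTES (64 * 1024 * 1024)

static float *sharedBuf;          // portable pinned staging buffer
static pthread_barrier_t barrier;

void *producer(void *unused)      // runs on GPU 0
{
    cudaSetDevice(0);
    float *d0;
    cudaMalloc((void **)&d0, NBYTES);
    // Async device-to-host copy into the shared pinned buffer.
    cudaMemcpyAsync(sharedBuf, d0, NBYTES, cudaMemcpyDeviceToHost, 0);
    cudaStreamSynchronize(0);
    pthread_barrier_wait(&barrier);   // signal: chunk is ready
    cudaFree(d0);
    return NULL;
}

void *consumer(void *unused)      // runs on GPU 1
{
    cudaSetDevice(1);
    float *d1;
    cudaMalloc((void **)&d1, NBYTES);
    pthread_barrier_wait(&barrier);   // wait for GPU 0's chunk
    // Still a true async DMA copy: thanks to the Portable flag, the
    // buffer is page-locked for this thread too.
    cudaMemcpyAsync(d1, sharedBuf, NBYTES, cudaMemcpyHostToDevice, 0);
    cudaStreamSynchronize(0);
    cudaFree(d1);
    return NULL;
}

int main(void)
{
    pthread_barrier_init(&barrier, NULL, 2);
    // cudaHostAllocPortable makes this block page-locked for ALL host
    // threads, not just the allocating one (section 3.2.5.1).
    cudaHostAlloc((void **)&sharedBuf, NBYTES, cudaHostAllocPortable);

    pthread_t t0, t1;
    pthread_create(&t0, NULL, producer, NULL);
    pthread_create(&t1, NULL, consumer, NULL);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);

    cudaFreeHost(sharedBuf);
    pthread_barrier_destroy(&barrier);
    return 0;
}
```

This avoids the extra exchange thread entirely: each GPU thread does its own async copies against the one shared staging buffer.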

More on pinned memory here: http://forums.nvidia.com/index.php?showtopic=98502

Thanks, this is a solution.

The problem was that we’re still on CUDA 2.0. We need to switch to the latest release.