memcpy2d question: Question on memcpy2d implementation

jrollbaker · October 1, 2008, 7:43pm

Folks,

I have a question about cudaMemcpy2D().

Where is the padding done? On the host or on the device?

The problem is that I have an array in which the first three and last three elements of each row are ghost elements that do not need threads to take part in the calculation. I've tried several approaches to improve the alignment, including ghost threads.

My question is: where is the array reformatted and padded in cudaMemcpy2D(), on the host or on the device? In particular, will I take a large hit if I reformat and pad the array by hand on the host? If cudaMemcpy2D() does the work on the host, then I should be able to do it on the host without a penalty. If the padding is done efficiently in transit, I presumably could not do it as quickly on the host.

Any insight would be appreciated.

thanks
dayton

alex_dubinsky · October 3, 2008, 5:02pm

I think it does it on the host. It just knows the right padding for the card, which might vary from hardware to hardware. I'm not sure exactly what it does, but I think it just staggers the rows so that column-wise accesses don't all pound the same memory channel. See here: http://forums.nvidia.com/index.php?showtopic=63919&view=findpost&p=359400
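A minimal sketch, for reference: cudaMemcpy2D() is documented as copying `height` rows of `width` bytes from a source whose rows start `spitch` bytes apart to a destination whose rows start `dpitch` bytes apart. The "padding" is therefore just the gap between `width` and the destination pitch, and that pitch normally comes from cudaMallocPitch() when the device buffer is allocated. The plain-C function below (`memcpy2d_model` is a hypothetical name) models only those argument semantics; it is not NVIDIA's implementation and says nothing about whether the real runtime does the work on the CPU or with the GPU's copy hardware.

```c
#include <stddef.h>
#include <string.h>

/* Plain-C model of the documented cudaMemcpy2D() argument semantics
   (hypothetical helper, not the CUDA runtime's code): copy `height` rows
   of `width` bytes each. Consecutive rows start `spitch` bytes apart in
   the source and `dpitch` bytes apart in the destination. Bytes between
   `width` and `dpitch` in each destination row are left untouched. */
static void memcpy2d_model(void *dst, size_t dpitch,
                           const void *src, size_t spitch,
                           size_t width, size_t height)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

    for (size_t row = 0; row < height; ++row) {
        /* One contiguous copy per row; the pitches only move the row starts. */
        memcpy(d + row * dpitch, s + row * spitch, width);
    }
}
```

In CUDA code the corresponding calls would be along the lines of `cudaMallocPitch((void **)&d_a, &pitch, width * sizeof(float), height)` followed by `cudaMemcpy2D(d_a, pitch, h_a, width * sizeof(float), width * sizeof(float), height, cudaMemcpyHostToDevice)`, where `d_a`, `h_a`, `width`, and `height` are placeholder names.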

If you have intimate knowledge of which elements you need or don't need, etc., then just do the padding yourself. Btw, I'm assuming you're doing all this to get coalescing?
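As a sketch of what doing it yourself might look like, assuming each row of length `row_len` carries `ghost` ghost elements at each end: insert a few lead padding elements before each row so that the first interior element lands on an `align`-element boundary, then round the row pitch up to a multiple of `align` so every row keeps that property. The names (`row_layout`, `make_row_layout`, `padded_index`) and the alignment value are hypothetical; pick `align` from your card's coalescing rules (for example 16 floats, i.e. 64 bytes). This also assumes the buffer base is itself aligned, which cudaMalloc() allocations are.

```c
#include <stddef.h>

/* Hypothetical hand-built layout for rows that carry ghost elements. */
typedef struct {
    size_t lead;   /* padding elements placed before each row's leading ghosts */
    size_t pitch;  /* elements from the start of one padded row to the next */
} row_layout;

/* Round n up to the next multiple of m (m > 0). */
static size_t round_up(size_t n, size_t m)
{
    return (n + m - 1) / m * m;
}

/* Choose `lead` so that element `ghost` of each row (the first interior
   element) starts on an `align` boundary, and round the pitch up to a
   multiple of `align` so the same holds for every row. */
static row_layout make_row_layout(size_t row_len, size_t ghost, size_t align)
{
    row_layout l;
    l.lead = (align - ghost % align) % align;
    l.pitch = round_up(l.lead + row_len, align);
    return l;
}

/* Position of original element (row, col) inside the padded buffer. */
static size_t padded_index(row_layout l, size_t row, size_t col)
{
    return row * l.pitch + l.lead + col;
}
```

For example, a row of 3 + 100 + 3 floats with `align = 16` gets 13 lead elements and a pitch of 128, so the thread for interior column `i` would read `padded_index(l, row, 3 + i)` and the first interior element of every row sits on a 64-byte boundary. The cost is extra memory per row; you would fill the buffer on the host and move it with a single cudaMemcpy().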

jrollbaker · October 3, 2008, 5:56pm

Thanks Alex.

Yes, I am trying to get coalescing.

Your link to "Padding and access time" is informative.

dayton

alex_dubinsky · October 3, 2008, 9:58pm

Yeah, the stuff about memory channels/partitions is something very few people know.
