I am trying to copy a 2D array of doubles to the device, and there seems to be a problem when I get the data back…
void Inverse(double **a1, double *h_h, int pos, int N) {
[...]
Error = cudaMallocPitch((void**)&d_binv, &pd_binv, N * sizeof(double), N);
Error = cudaMallocPitch((void**)&d_eta, &pd_eta, N * sizeof(double), N);
Error = cudaMalloc((void**)&d_h, size);
Error = cudaMallocPitch((void**)&d_y, &pd_y, N * sizeof(double), N);
// Copy vectors from host memory to device memory
Error = cudaMemcpy(d_h, h_h, size, cudaMemcpyHostToDevice);
Error = cudaMemcpy2D(d_binv, pd_binv, a1, N * sizeof(double), N * sizeof(double), N, cudaMemcpyHostToDevice);
[...]
Error = cudaMemcpy2D(a1, N * sizeof(double), d_y, pd_y, N * sizeof(double), N, cudaMemcpyDeviceToHost); // Here is where I get the problem
}
No it isn’t. cudaMemcpy2D is designed for copying from pitched, linear memory sources. There is no “deep” copy function in the API for copying arrays of pointers and what they point to. You will need a separate memcpy operation for each pointer held in a1. Generally speaking, it is preferable to use linear memory with indexing when working with memory that needs to be portable between the host and device. It reduces the operation overhead considerably, and on the GPU, an integer multiply-add per read is cheaper than the alternatives (like dereferencing several levels of pointer indirection).
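If you really must keep a1 as an array of row pointers, the per-row copy looks like this — a minimal sketch, assuming a1 holds N separately allocated host rows of N doubles each, with d_binv and pd_binv as in your code:

// One cudaMemcpy per row pointer: the host rows are not contiguous,
// so a single cudaMemcpy2D reading from a1 itself cannot work.
for (int row = 0; row < N; ++row) {
    Error = cudaMemcpy((char*)d_binv + row * pd_binv, // start of this row in pitched device memory
                       a1[row],                       // host row (separate allocation)
                       N * sizeof(double),
                       cudaMemcpyHostToDevice);
}

That is N API calls instead of one, which is exactly the overhead the linear-memory approach avoids.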
Yes, “flatten” it into a piece of linear memory and copy the whole thing to the GPU in one transfer. Use the same column- or row-major indexing scheme to access the memory on both host and device, and it should “just work”.
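For example, a sketch of the flattened, row-major version (error checking omitted; h_a is a hypothetical contiguous host buffer, and a1 and N are from the question):

double *h_a = (double*)malloc(N * N * sizeof(double));
for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
        h_a[i * N + j] = a1[i][j]; // row-major: element (i,j) lives at i*N+j

double *d_a;
size_t pd_a;
cudaMallocPitch((void**)&d_a, &pd_a, N * sizeof(double), N);

// The host source is now linear, so a single 2D copy is valid.
cudaMemcpy2D(d_a, pd_a, h_a, N * sizeof(double),
             N * sizeof(double), N, cudaMemcpyHostToDevice);

// In a kernel, step between rows using the pitch in bytes:
//   double v = *((double*)((char*)d_a + i * pd_a) + j);

// Copying back works the same way, which replaces the failing call in the question.
cudaMemcpy2D(h_a, N * sizeof(double), d_a, pd_a,
             N * sizeof(double), N, cudaMemcpyDeviceToHost);
free(h_a);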