copy 3D data from host to device

Hi Guys,

I am writing a CUDA program to solve a PDE. As a first step, I need to copy a 3D array from host memory to device memory.

I have the following code:

float *phi = new float[DimX*DimY*DimZ];

// some computations on phi ;

extern "C"
void ReinitializeCUDA(float *phi, const char *obj_cpu, const char *movobj_cpu, const char *source,
                      bool init, int iterations, int I, int J, int K,
                      float delta, float dtau, float eps, float limit,
                      int dim[])
{
    // dim[0] = DimX, dim[1] = DimY, dim[2] = DimZ
    cudaPitchedPtr phi_gpu1;
    cudaExtent ca_extent = make_cudaExtent(dim[0]*sizeof(float), dim[1], dim[2]);
    cudaMalloc3D(&phi_gpu1, ca_extent);
    cudaMemset3D(phi_gpu1, 0, ca_extent);

    cudaMemcpy3DParms cpy_params = {0};
***** cpy_params.srcPtr = make_cudaPitchedPtr((void*)phi, dim[0] * sizeof(float), dim[1], dim[2]);
    cpy_params.dstPtr = phi_gpu1;
    cpy_params.extent = ca_extent;
    cpy_params.kind = cudaMemcpyHostToDevice;
    cudaMemcpy3D(&cpy_params);
}

Later on, I will copy the results computed by a kernel back to phi, as below:

cudaMemcpy3DParms dhcpy_params = {0};
    dhcpy_params.srcPtr = phi_gpu1;
***** dhcpy_params.dstPtr = make_cudaPitchedPtr((void*)phi, dim[0] * sizeof(float), dim[1], dim[2]);
    dhcpy_params.extent = ca_extent;
    dhcpy_params.kind = cudaMemcpyDeviceToHost;
    cudaMemcpy3D(&dhcpy_params);
    printf("cudaMemcpy3D: %s\n", cudaGetErrorString(cudaGetLastError()));

My question is about the two lines marked with *****. Should I use

cpy_params.srcPtr = make_cudaPitchedPtr( (void*)phi, dim[0] * sizeof(float), dim[1], dim[2] );

// ....

dhcpy_params.dstPtr = make_cudaPitchedPtr( (void*)phi, dim[0] * sizeof(float), dim[1], dim[2] );

or

cpy_params.srcPtr = make_cudaPitchedPtr( (void*)phi, dim[0] * sizeof(float), dim[0], dim[1] );

// ....

dhcpy_params.dstPtr = make_cudaPitchedPtr( (void*)phi, dim[0] * sizeof(float), dim[0], dim[1] );

Or, to put it another way: how do I properly copy a 3D array from the host into device memory pointed to by a cudaPitchedPtr?

Thanks.

merlin

You must use the second suggestion.

but also, this is wrong: cudaExtent ca_extent = make_cudaExtent(dim[0]*sizeof(float), dim[1], dim[2]);
it should be: cudaExtent ca_extent = make_cudaExtent(dim[0], dim[1], dim[2]);

Thanks for your reply. I also think the second approach is correct, judging from the GPU SDK examples. I am just curious how the CUDA runtime knows the third dimension and copies enough data.

Are you sure about this? From the Reference Manual:

cudaError_t cudaMalloc3D(struct cudaPitchedPtr *pitchedDevPtr, struct cudaExtent extent)

Allocates at least width * height * depth bytes of linear memory on the device and returns a cudaPitchedPtr in which ptr is a pointer to the allocated memory. The function may pad the allocation to ensure hardware alignment requirements are met. The pitch returned in the pitch field of pitchedDevPtr is the width in bytes of the allocation.

The returned cudaPitchedPtr contains additional fields xsize and ysize, the logical width and height of the allocation, which are equivalent to the width and height extent parameters provided by the programmer during allocation.

For allocations of 2D and 3D objects, it is highly recommended that programmers perform allocations using cudaMalloc3D() or cudaMallocPitch(). Due to alignment restrictions in the hardware, this is especially true if the application will be performing memory copies involving 2D or 3D objects (whether linear memory or CUDA arrays).

Parameters:

pitchedDevPtr - Pointer to allocated pitched device memory

extent - Requested allocation size

It says “allocates at least width * height * depth bytes of linear memory on the device and returns a cudaPitchedPtr in which ptr is a pointer to the allocated memory”.

If you use

cudaExtent ca_extent = make_cudaExtent(dim[0], dim[1], dim[2]);

your allocation can only hold dim[0]*dim[1]*dim[2] bytes of data, while I need (dim[0]*dim[1]*dim[2])*sizeof(float), which is 4*(dim[0]*dim[1]*dim[2]) bytes. Or am I missing something here?

Ahh, you might be right. I use cudaMalloc3DArray, to which you also supply a channel descriptor (so it knows the element's size).