cudaMemcpy2D example?

orangenuke · February 1, 2012, 10:52am

Hi,
I was looking through the programming tutorial and the best practices guide. They mention cudaMemcpy2D only briefly and do not explain it fully. I have searched the C/src/ directory for examples but cannot find any, and I found very few references to it on this forum.
I wanted to know if there is a clear example of this function and if it is necessary to use this function in conjunction with cudaMallocPitch()?
Thanks and Regards.

tera · February 1, 2012, 12:25pm

Check out the Reference Manual or the online documentation.

orangenuke · February 1, 2012, 1:24pm

Yeah, I saw that. However, I am trying to get the following code to work and have not managed it. What I intended to do was copy a host array of 760*760 elements, which would be inefficient to access, into an array of 768*768 elements, which would be efficient for my device of compute capability 1.2 (GT 230M with 6 SMs, hence the 128*6).

Could you please take a look at it? I would be glad to finally understand this function.

#include<stdio.h>
#include<stdlib.h>
#include<assert.h>

#define N 760 // side of matrix containing data
#define PDIM 768 // padded dimensions
#define TPB 128 //threads per block
#define INDEX 190 //verification index
#define DIV 6

//load element from da to db to verify correct memcopy
__global__ void kernel(float * da, float * db)
{
	int tid = blockDim.x * blockIdx.x + threadIdx.x;
	if(tid%PDIM < N)
	{
		db[(blockIdx.x/DIV)*N + (blockIdx.x%DIV)*blockDim.x + threadIdx.x] = da[tid];
	}
}

void verify(float * A, float * B, int size);
void init(float * array, int size);

int main(int argc, char * argv[])
{
	float * A, *dA, *B, *dB;
	A = (float *)malloc(sizeof(float)*N*N);
	B = (float *)malloc(sizeof(float)*N*N);

	init(A,N*N);
	printf("\n%f ", A[INDEX]);
	cudaMalloc(&dA, sizeof(float)*PDIM*PDIM);
	cudaMalloc(&dB, sizeof(float)*N*N);

//copy memory from unpadded array A of 760 by 760 dimensions
//to more efficient dimensions of 768 by 768 on the device
	cudaMemcpy2D(dA,PDIM,A,N,N,N,cudaMemcpyHostToDevice);
	int threadsperblock = TPB;
	int blockspergrid = PDIM*PDIM/threadsperblock;
	kernel<<<blockspergrid,threadsperblock>>>(dA,dB);
	cudaMemcpy(B, dB, sizeof(float)*N*N, cudaMemcpyDeviceToHost);
	//cudaMemcpy2D(B,N,dB,N,N,N,cudaMemcpyDeviceToHost);
	printf("->%f\n", B[INDEX]);

	free(A);
	free(B);
	cudaFree(dA);
	cudaFree(dB);
}

void init(float * array, int size)
{
	for (int i = 0; i < size; i++)
	{
		array[i] = i;
	}
}

void verify(float * A, float * B, int size)
{
	for (int i = 0; i < size; i++)
	{
		assert(A[i]==B[i]);
	}
}

tera · February 1, 2012, 4:03pm

Widths and pitches are in bytes, not number of elements (the latter would not work because cudaMemcpy2D() does not know the element size).
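
For anyone finding this thread later, here is a minimal sketch of what the byte-based call could look like for the 760 to 768 padding case above. It is not the original poster's final code. The helper name padded_roundtrip_ok and the round-trip check are made up for illustration. The only real API calls are cudaMalloc, cudaMemcpy2D and cudaFree. Note that the height argument stays a row count, while both pitches and the width are multiplied by sizeof(float).

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <cuda_runtime.h>

#define N    760  // logical side of the matrix, in elements
#define PDIM 768  // padded row length on the device, in elements

// Hypothetical helper: copies an N x N host matrix into a device buffer
// whose rows are PDIM floats apart, copies it back, and compares.
// Returns 1 if the round trip reproduces the input, 0 otherwise.
int padded_roundtrip_ok(void)
{
    size_t hostPitch = N * sizeof(float);     // bytes between host rows
    size_t devPitch  = PDIM * sizeof(float);  // bytes between device rows
    size_t width     = N * sizeof(float);     // bytes copied per row

    float *A  = (float *)malloc(hostPitch * N);
    float *B  = (float *)malloc(hostPitch * N);
    float *dA = NULL;
    int ok = 0;

    if (A == NULL || B == NULL) { free(A); free(B); return 0; }
    for (int i = 0; i < N * N; i++) A[i] = (float)i;

    if (cudaMalloc((void **)&dA, devPitch * PDIM) == cudaSuccess) {
        // Host -> device: N rows of `width` bytes, spread out to devPitch.
        if (cudaMemcpy2D(dA, devPitch, A, hostPitch, width, N,
                         cudaMemcpyHostToDevice) == cudaSuccess &&
            // Device -> host: same rows, packed back to hostPitch.
            cudaMemcpy2D(B, hostPitch, dA, devPitch, width, N,
                         cudaMemcpyDeviceToHost) == cudaSuccess) {
            ok = (memcmp(A, B, hostPitch * N) == 0);
        }
        cudaFree(dA);
    }

    free(A);
    free(B);
    return ok;
}

int main(void)
{
    printf("round trip %s\n", padded_roundtrip_ok() ? "ok" : "FAILED");
    return 0;
}
```

Separately from the pitch units, the kernel in the post above is launched over all 768 padded rows, so blocks belonging to rows 760 to 767 would write past the end of dB (which only holds 760*760 floats). A row bound check in the kernel, or a smaller grid, would avoid that.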

orangenuke · February 1, 2012, 6:09pm

Amazing. Thanks a ton. I cannot believe that I was making such a mistake.

I have another question though, if you don’t mind. Is this a legitimate method to avoid possible uncoalesced accesses from a two dimensional matrix? Also, would it make more sense to use this in conjunction with cudaMallocPitch() as opposed to a pseudo two dimensional array?

Thanks and Regards.

tera · February 1, 2012, 6:56pm

Yes, cudaMallocPitch() is exactly meant to easily find the appropriate alignment and pitch for the current device to avoid uncoalesced accesses.
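
A rough sketch of the cudaMallocPitch() route, assuming the same 760 by 760 float matrix. The kernel scale_rows and the helper pitched_scale_ok are hypothetical names used only for this illustration. The point it tries to show is that the pitch returned by cudaMallocPitch() is in bytes, so it is passed unchanged to cudaMemcpy2D, and inside the kernel each row is located with byte arithmetic before indexing by column.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

#define N 760  // side of the matrix, in elements

// Hypothetical kernel: multiplies every element of a pitched matrix by
// `factor`. `pitch` is the row stride in bytes, as returned by
// cudaMallocPitch().
__global__ void scale_rows(float *base, size_t pitch, int width, int height,
                           float factor)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (col < width && row < height) {
        // Step to the start of the row in bytes, then index in floats.
        float *rowPtr = (float *)((char *)base + (size_t)row * pitch);
        rowPtr[col] *= factor;
    }
}

// Hypothetical helper: returns 1 if every element comes back doubled.
int pitched_scale_ok(void)
{
    size_t hostPitch = N * sizeof(float);  // packed host rows, in bytes
    size_t devPitch  = 0;                  // chosen by the runtime, in bytes
    float *A  = (float *)malloc(hostPitch * N);
    float *B  = (float *)malloc(hostPitch * N);
    float *dA = NULL;
    int ok = 0;

    if (A == NULL || B == NULL) { free(A); free(B); return 0; }
    for (int i = 0; i < N * N; i++) A[i] = (float)i;

    // Width is requested in bytes, height in rows.
    if (cudaMallocPitch((void **)&dA, &devPitch, hostPitch, N) == cudaSuccess) {
        dim3 block(32, 8);
        dim3 grid((N + block.x - 1) / block.x, (N + block.y - 1) / block.y);

        if (cudaMemcpy2D(dA, devPitch, A, hostPitch, hostPitch, N,
                         cudaMemcpyHostToDevice) == cudaSuccess) {
            scale_rows<<<grid, block>>>(dA, devPitch, N, N, 2.0f);
            if (cudaGetLastError() == cudaSuccess &&
                cudaMemcpy2D(B, hostPitch, dA, devPitch, hostPitch, N,
                             cudaMemcpyDeviceToHost) == cudaSuccess) {
                ok = 1;
                for (int i = 0; i < N * N; i++) {
                    if (B[i] != 2.0f * A[i]) { ok = 0; break; }
                }
            }
        }
        cudaFree(dA);
    }

    free(A);
    free(B);
    return ok;
}

int main(void)
{
    printf("pitched scale %s\n", pitched_scale_ok() ? "ok" : "FAILED");
    return 0;
}
```

Compared with padding by hand to 768, this leaves the choice of row stride to the runtime, so the same code should not need a device-specific constant like PDIM.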
