Simple question regarding mem allocation/copy

Ringworm · December 5, 2009, 4:59am

Hello. I’m working on a program that applies a blur algorithm to a matrix. I first used a 1D array with the matrix stored in row-major order, and it worked fine, but now I want to modify it to use 2D arrays.

float **h_iA, **d_iA;

h_iA = (float**)malloc(sizeof(float*)*N);
for (int m = 0; m < N; m++)
	h_iA[m] = (float*)malloc(sizeof(float)*N);

InitMat(h_iA, N);

cudaMalloc((void**) &d_iA, sizeof(float*)*N);
for (int u =0; u < N; u++)
	cudaMalloc((void **) &d_iA[u], sizeof(float)*N);

for (int i=0;i<N;i++)
	cudaMemcpy(d_iA, h_iA, sizeof(float*)*N , cudaMemcpyHostToDevice);

for (int i=0;i<N;i++)
	cudaMemcpy(d_iA[i], h_iA[i], sizeof(float)*N , cudaMemcpyHostToDevice)

Is this the proper way to allocate the device memory for d_iA and copy the data from h_iA to it?

Thanks.

LSChien · December 5, 2009, 5:25am

cudaMemcpy(d_iA[i], h_iA[i], sizeof(float)*N , cudaMemcpyHostToDevice)

is invalid because d_iA[i] lives in device memory, so it cannot be accessed from host code.

(d_iA itself is fine, since d_iA is a variable in host memory, although its content is an address in device memory.)
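
One way to keep the array-of-pointers layout is to build the table of row pointers on the host first, then copy that table to the device in one piece. The sketch below assumes the h_iA and N from the question; uploadRowPointers and h_rowPtrs are hypothetical names, not part of the CUDA API. Note that the cudaMalloc loop in the question has the same problem as the cudaMemcpy line, because &d_iA[u] is also a device address that the host-side cudaMalloc would have to write to.

```cuda
#include <cuda_runtime.h>
#include <stdlib.h>

// Hypothetical helper (not a CUDA API): uploads N host rows of N floats each
// and builds a device-side table of row pointers.
//   *d_iA_out      device array of N device row pointers (pass to kernels)
//   *h_rowPtrs_out host array holding the same N device addresses (for cleanup)
cudaError_t uploadRowPointers(float **h_iA, int N,
                              float ***d_iA_out, float ***h_rowPtrs_out)
{
    cudaError_t err = cudaSuccess;
    float **d_iA = NULL;
    int allocated = 0;  // number of rows successfully allocated so far

    // Host-side staging table for the device address of every row.
    float **h_rowPtrs = (float **)malloc(sizeof(float *) * N);
    if (h_rowPtrs == NULL) return cudaErrorMemoryAllocation;

    for (int i = 0; i < N; i++) {
        // &h_rowPtrs[i] is a host address, so cudaMalloc can write to it.
        err = cudaMalloc((void **)&h_rowPtrs[i], sizeof(float) * N);
        if (err != cudaSuccess) break;
        allocated++;
        // h_rowPtrs[i] is read on the host and holds a device address.
        err = cudaMemcpy(h_rowPtrs[i], h_iA[i], sizeof(float) * N,
                         cudaMemcpyHostToDevice);
        if (err != cudaSuccess) break;
    }

    // Copy the finished pointer table to the device once.
    if (err == cudaSuccess)
        err = cudaMalloc((void **)&d_iA, sizeof(float *) * N);
    if (err == cudaSuccess)
        err = cudaMemcpy(d_iA, h_rowPtrs, sizeof(float *) * N,
                         cudaMemcpyHostToDevice);

    if (err != cudaSuccess) {
        // Release whatever was allocated before the failure.
        for (int i = 0; i < allocated; i++) cudaFree(h_rowPtrs[i]);
        if (d_iA != NULL) cudaFree(d_iA);
        free(h_rowPtrs);
        return err;
    }

    *d_iA_out = d_iA;
    *h_rowPtrs_out = h_rowPtrs;
    return cudaSuccess;
}
```

A kernel can then read d_iA[i][j]. To release everything later, call cudaFree on each h_rowPtrs[i], then on d_iA, then free(h_rowPtrs).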

Why not use a 1-D array with 2-D row-major indexing?
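
For comparison, here is a minimal sketch of the flat layout suggested here: one cudaMalloc for all N*N floats, with each host row copied to offset i*N. It assumes the h_iA and N from the question; uploadFlat is a hypothetical helper name.

```cuda
#include <cuda_runtime.h>
#include <stddef.h>

// Hypothetical helper (not a CUDA API): copies N host rows of N floats each
// into one contiguous row-major device buffer returned through *d_A_out.
cudaError_t uploadFlat(float **h_iA, int N, float **d_A_out)
{
    float *d_A = NULL;

    // A single allocation holds the whole matrix.
    cudaError_t err = cudaMalloc((void **)&d_A, sizeof(float) * N * N);
    if (err != cudaSuccess) return err;

    for (int i = 0; i < N; i++) {
        // Row i starts at element offset i * N in the flat buffer.
        // The pointer arithmetic happens on the host; nothing is dereferenced.
        err = cudaMemcpy(d_A + (size_t)i * N, h_iA[i], sizeof(float) * N,
                         cudaMemcpyHostToDevice);
        if (err != cudaSuccess) {
            cudaFree(d_A);
            return err;
        }
    }

    *d_A_out = d_A;
    return cudaSuccess;
}
```

Inside a kernel, element (i, j) is then d_A[i * N + j], and a single cudaFree(d_A) releases the matrix.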

Ringworm · December 5, 2009, 5:40am

So, everything else is correct and that line isn’t needed?

The offset multiplications tend to be a bit heavy with a large amount of data.

LSChien · December 5, 2009, 9:37am

This is not correct. If you use a pointer array to implement a 2-D array A, then to fetch A[i][j] you must first fetch the pointer A[i], which costs about 500 cycles. If you use a 2-D index instead, you pay only for several MAD operations in the index computation, for example

	const int inx = threadIdx.x;
	const int iny = threadIdx.y;
	const int ibx = blockIdx.x * BLOCK_SIZE_X;
	const int iby = blockIdx.y * BLOCK_SIZE_Y;

	// gid is the 1-D index into the row-major array; wA is the width of A (elements per row).
	// ibx and iby already include the block-size factor, so it is not applied again here.
	int gid = ( ibx + inx ) + ( iby + iny ) * wA;

If you can keep more than 192 threads active on one SM, then every MAD effectively costs only 1/8 cycle.
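
To show the row-major index in context, here is a sketch of a kernel that reads a flat array through it. The thread does not show the actual blur, so a 3x3 box average is assumed as a stand-in; boxBlur3x3, launchBoxBlur, hA (the matrix height) and the 16x16 block size are illustrative choices.

```cuda
#include <cuda_runtime.h>

#define BLOCK_SIZE_X 16
#define BLOCK_SIZE_Y 16

// Assumed stand-in blur: each output element is the mean of its 3x3
// neighbourhood, using only the neighbours that fall inside the matrix.
// in and out are flat row-major arrays of wA columns by hA rows.
__global__ void boxBlur3x3(const float *in, float *out, int wA, int hA)
{
    const int col = blockIdx.x * BLOCK_SIZE_X + threadIdx.x;
    const int row = blockIdx.y * BLOCK_SIZE_Y + threadIdx.y;
    if (col >= wA || row >= hA) return;  // threads past the matrix edge do nothing

    float sum = 0.0f;
    int count = 0;
    for (int dy = -1; dy <= 1; dy++) {
        for (int dx = -1; dx <= 1; dx++) {
            const int r = row + dy;
            const int c = col + dx;
            if (r >= 0 && r < hA && c >= 0 && c < wA) {
                sum += in[r * wA + c];  // row-major 1-D index
                count++;
            }
        }
    }
    out[row * wA + col] = sum / count;
}

// Launches the kernel with enough blocks to cover the matrix and waits for it.
cudaError_t launchBoxBlur(const float *d_in, float *d_out, int wA, int hA)
{
    dim3 block(BLOCK_SIZE_X, BLOCK_SIZE_Y);
    dim3 grid((wA + BLOCK_SIZE_X - 1) / BLOCK_SIZE_X,
              (hA + BLOCK_SIZE_Y - 1) / BLOCK_SIZE_Y);
    boxBlur3x3<<<grid, block>>>(d_in, d_out, wA, hA);
    cudaError_t err = cudaGetLastError();  // reports launch failures
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();        // reports errors raised while running
}
```

The only per-element addressing cost is the multiply-add in r * wA + c; no row pointer has to be loaded from memory first.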
