2-dimensional array

Hi,

I am new to CUDA and want to know how to write a program that uses a 2-dimensional array. I have seen the example in the Programming Guide 2.0, but it is not clear to me. Can anyone help me with how to declare the host and device variables for a 2D array and how to copy memory from host to device?

Could you kindly write the entire program for the code below? I declared my 2D variable like *(*a+N)+N), but I am getting errors.

__global__ void matAdd(float A[N][N], float B[N][N], float C[N][N])
{
    int i = threadIdx.x;
    int j = threadIdx.y;
    C[i][j] = A[i][j] + B[i][j];
}

int main()
{
    // Kernel invocation
    dim3 dimBlock(N, N);
    matAdd<<<1, dimBlock>>>(A, B, C);
}

Do you mean you’re using an array of arrays? Its type is (float**)?

This is a little tricky to use in CUDA. You will have to call cudaMalloc() and cudaMemcpy() multiple times, keeping in mind the rules for dealing with device memory and device memory pointers.
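For illustration, here is a rough sketch of what the array-of-arrays approach involves, assuming a fixed size N of 16 and a single block of N x N threads. The names matAddRows, makeDeviceRows, and rowPointerAddMaxError are made up for this example, and error checking of the CUDA calls is left out for brevity.

```cuda
#include <cmath>
#include <cstdlib>
#include <cuda_runtime.h>

#define N 16  // assumed size; N*N threads must fit in one block

// Each argument is a device array of N device pointers, one per row.
__global__ void matAddRows(float** A, float** B, float** C)
{
    int col = threadIdx.x;
    int row = threadIdx.y;
    C[row][col] = A[row][col] + B[row][col];
}

// Hypothetical helper: allocates N separate device rows, optionally fills
// them from src, and returns a device-side table of the row pointers.
// rows[] keeps a host-side copy of those device pointers, because the host
// cannot dereference the device table to find them later.
float** makeDeviceRows(float* rows[N], float src[N][N])
{
    for (int r = 0; r < N; ++r) {
        cudaMalloc((void**)&rows[r], N * sizeof(float));
        if (src)
            cudaMemcpy(rows[r], src[r], N * sizeof(float), cudaMemcpyHostToDevice);
    }
    float** table;
    cudaMalloc((void**)&table, N * sizeof(float*));
    cudaMemcpy(table, rows, N * sizeof(float*), cudaMemcpyHostToDevice);
    return table;
}

// Hypothetical driver: adds two test matrices and returns the largest error.
float rowPointerAddMaxError()
{
    static float A[N][N], B[N][N], C[N][N];
    for (int r = 0; r < N; ++r)
        for (int c = 0; c < N; ++c) { A[r][c] = (float)r; B[r][c] = (float)c; }

    float *rowsA[N], *rowsB[N], *rowsC[N];
    float** dA = makeDeviceRows(rowsA, A);
    float** dB = makeDeviceRows(rowsB, B);
    float** dC = makeDeviceRows(rowsC, NULL);  // output rows, left unfilled

    matAddRows<<<1, dim3(N, N)>>>(dA, dB, dC);

    float maxErr = 0.0f;
    for (int r = 0; r < N; ++r) {
        // One copy per row; cudaMemcpy waits for the kernel to finish.
        cudaMemcpy(C[r], rowsC[r], N * sizeof(float), cudaMemcpyDeviceToHost);
        for (int c = 0; c < N; ++c)
            maxErr = fmaxf(maxErr, fabsf(C[r][c] - (A[r][c] + B[r][c])));
        cudaFree(rowsA[r]); cudaFree(rowsB[r]); cudaFree(rowsC[r]);
    }
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return maxErr;
}
```

Calling rowPointerAddMaxError() from main runs the whole round trip. Note that every matrix needs N + 1 allocations and N + 1 copies, which is the bookkeeping referred to above.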

It’s usually better to turn a 2D array into a 1D array, and index it like array[y*width + x].
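As a small sketch of that indexing, assuming a 10 x 10 array like the one asked about (flatIndex and countIndexMismatches are hypothetical names, not CUDA functions):

```cuda
#include <cuda_runtime.h>

#define WIDTH  10  // assumed number of columns
#define HEIGHT 10  // assumed number of rows

// Hypothetical helper: position of element (x, y) in a row-major array
// that has `width` columns. Usable from both host and device code.
__host__ __device__ inline int flatIndex(int x, int y, int width)
{
    return y * width + x;
}

// Hypothetical check: counts elements where 2D and flat access disagree.
int countIndexMismatches()
{
    static float grid[HEIGHT][WIDTH];
    for (int y = 0; y < HEIGHT; ++y)
        for (int x = 0; x < WIDTH; ++x)
            grid[y][x] = (float)(100 * y + x);

    // A statically sized 2D array occupies one contiguous block, so in
    // practice a pointer to its first element can be used as a 1D array
    // (and the whole block can be handed to a single cudaMemcpy).
    float* flat = &grid[0][0];

    int mismatches = 0;
    for (int y = 0; y < HEIGHT; ++y)
        for (int x = 0; x < WIDTH; ++x)
            if (flat[flatIndex(x, y, WIDTH)] != grid[y][x])
                ++mismatches;
    return mismatches;
}
```

The same flatIndex expression can be used inside a kernel with threadIdx.x as x and threadIdx.y as y.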

Dear sir,

I want to know how to declare A, B, and C and how to allocate memory for them on the host and the device. If you don't mind, could you write those functions and declarations for A[10][10], B[10][10], and C[10][10]? Using these variables, I want to add the arrays A and B and store the result in C.

Please help me. I am new to CUDA.

Dear sir,

I want to know how to use the cudaMemcpy2D function to copy data from host to device.

You can check the programming guide and reference manual. There are also examples in the SDK that use the function (I find those the easiest, personally).
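As a starting point, here is a minimal sketch of cudaMemcpy2D paired with cudaMallocPitch, assuming 10 x 10 host arrays and one block of threads. The names rowPtr, matAddPitched, and pitchedAddMaxError are invented for this example, and error checking is omitted.

```cuda
#include <cmath>
#include <cuda_runtime.h>

#define ROWS 10  // assumed size, matching A[10][10]
#define COLS 10

// Hypothetical helper: start of a row in a pitched allocation.
// The pitch is a stride in bytes, so the arithmetic is done on char*.
__device__ float* rowPtr(float* base, size_t pitch, int row)
{
    return (float*)((char*)base + row * pitch);
}

__global__ void matAddPitched(float* A, size_t pitchA,
                              float* B, size_t pitchB,
                              float* C, size_t pitchC)
{
    int col = threadIdx.x;
    int row = threadIdx.y;
    rowPtr(C, pitchC, row)[col] =
        rowPtr(A, pitchA, row)[col] + rowPtr(B, pitchB, row)[col];
}

// Hypothetical driver: adds two test matrices and returns the largest error.
float pitchedAddMaxError()
{
    static float A[ROWS][COLS], B[ROWS][COLS], C[ROWS][COLS];
    for (int r = 0; r < ROWS; ++r)
        for (int c = 0; c < COLS; ++c) { A[r][c] = (float)r; B[r][c] = (float)c; }

    size_t rowBytes = COLS * sizeof(float);  // width of one row in bytes
    float *dA, *dB, *dC;
    size_t pitchA, pitchB, pitchC;           // device row strides, set by CUDA

    cudaMallocPitch((void**)&dA, &pitchA, rowBytes, ROWS);
    cudaMallocPitch((void**)&dB, &pitchB, rowBytes, ROWS);
    cudaMallocPitch((void**)&dC, &pitchC, rowBytes, ROWS);

    // Arguments: dst, dst pitch, src, src pitch, width in bytes, height, kind.
    // The host arrays are tightly packed, so their pitch equals rowBytes.
    cudaMemcpy2D(dA, pitchA, A, rowBytes, rowBytes, ROWS, cudaMemcpyHostToDevice);
    cudaMemcpy2D(dB, pitchB, B, rowBytes, rowBytes, ROWS, cudaMemcpyHostToDevice);

    matAddPitched<<<1, dim3(COLS, ROWS)>>>(dA, pitchA, dB, pitchB, dC, pitchC);

    cudaMemcpy2D(C, rowBytes, dC, pitchC, rowBytes, ROWS, cudaMemcpyDeviceToHost);

    float maxErr = 0.0f;
    for (int r = 0; r < ROWS; ++r)
        for (int c = 0; c < COLS; ++c)
            maxErr = fmaxf(maxErr, fabsf(C[r][c] - (A[r][c] + B[r][c])));

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return maxErr;
}
```

The point to watch is that both pitches and the width are given in bytes, while the height is a row count. Calling pitchedAddMaxError() from main exercises the full copy, add, and copy-back sequence.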

OK sir,

but I am not getting anywhere with how to declare the variables. If you don't mind, please write one sample program that uses a two-dimensional array and shows how to allocate memory for the host and device, along with the memory copy from host to device. Please try to understand: I am new to CUDA and very interested in learning this language. Please help me, sir.

You sound like you’re about to wire me $20 million.

I suggest you do it like this:

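#include <cstdio>
#include <cuda_runtime.h>

#define N 16

__global__ void matAdd(float* A, float* B, float* C)
{
    int i = threadIdx.x;  // column
    int j = threadIdx.y;  // row
    C[j*N + i] = A[j*N + i] + B[j*N + i];
}

int main()
{
    float *A_host, *B_host, *C_host;
    float *A_dev, *B_dev, *C_dev;
    size_t bytes = N*N*sizeof(float);

    // Page-locked host memory
    cudaMallocHost( (void**)&A_host, bytes );
    cudaMallocHost( (void**)&B_host, bytes );
    cudaMallocHost( (void**)&C_host, bytes );

    // Fill the inputs with some test values
    for (int k = 0; k < N*N; k++) {
        A_host[k] = (float)k;
        B_host[k] = 2.0f * (float)k;
    }

    // Device memory
    cudaMalloc( (void**)&A_dev, bytes );
    cudaMalloc( (void**)&B_dev, bytes );
    cudaMalloc( (void**)&C_dev, bytes );

    // cudaMemcpy needs the copy direction as its fourth argument
    cudaMemcpy( A_dev, A_host, bytes, cudaMemcpyHostToDevice );
    cudaMemcpy( B_dev, B_host, bytes, cudaMemcpyHostToDevice );

    matAdd<<<1, dim3(N,N)>>>(A_dev, B_dev, C_dev);

    cudaMemcpy( C_host, C_dev, bytes, cudaMemcpyDeviceToHost );
    printf("C[1][2] = %f\n", C_host[1*N + 2]);

    cudaFree(A_dev); cudaFree(B_dev); cudaFree(C_dev);
    cudaFreeHost(A_host); cudaFreeHost(B_host); cudaFreeHost(C_host);
    return 0;
}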

Do not use true 2D arrays (arrays of pointers) on the device, because they are much more complicated to manage.