# General idea on dealing with 2D (3D) arrays in CUDA: can I put 2D (3D) indexes in CUDA kernels?

Please give me some ideas on how to deal with 2D or 3D arrays in CUDA.

During the last several weeks I was busy getting to know C and CUDA, and I’d say I have had a small taste of it so far
(still a long way to go). I learned a lot reading posts here…

Now I am stepping in a little deeper.

I have code that deals with 2D and 3D matrices in C, like a_h[i][j] or b_h[i][j][k].
I don’t have time to change all the 2D and 3D matrices into 1D, so I have to work with them as they are.
I researched a little and I see some CUDA functions like cudaMemcpy2D, cudaMemcpy3D, cudaMalloc3D, etc.
Before getting into more detail on those, I have a general question.

Can I use 2D or 3D arrays in a CUDA kernel?
Or do I still need to work with 1D arrays in the kernel even if my host code uses 2D and 3D arrays?

That is, can I do something like the following?

{
c[i][j] = a[i][j] + b[i][j];
}

Or must it be something like

{
c[i] = a[i] + b[i];
/* a and b are 2D or 3D in the host code */
}

I’d like to set a clear direction first and start from there…

nvcc has no information on how to map [i][j] to a one-dimensional index unless the array is declared with constant dimensions, such as a[M][N] where N is a compile-time constant.
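To illustrate the constant-dimension case, here is a minimal sketch in plain C++ so it can be compiled without a GPU. The names `NROWS`, `NCOLS` and `add2d` are made up for the example. In an actual kernel the function would be marked `__global__` and `i`, `j` would come from `blockIdx` and `threadIdx` rather than from loops.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical compile-time sizes. The compiler needs NCOLS to resolve a[i][j].
constexpr std::size_t NROWS = 3;
constexpr std::size_t NCOLS = 4;

// Parameters are pointers to rows of NCOLS floats, so [i][j] is legal syntax.
// The two loops stand in for the threads of a CUDA grid.
void add2d(const float (*a)[NCOLS], const float (*b)[NCOLS], float (*c)[NCOLS]) {
    assert(a != nullptr && b != nullptr && c != nullptr);
    for (std::size_t i = 0; i < NROWS; ++i) {
        for (std::size_t j = 0; j < NCOLS; ++j) {
            c[i][j] = a[i][j] + b[i][j];  // compiler computes i * NCOLS + j
        }
    }
}
```

This only works because `NCOLS` is known at compile time. With sizes read at run time, the index has to be computed by hand.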

I would suggest that you write the index translation yourself.
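A hand-written index translation could look like the sketch below. It assumes row-major storage (the layout C uses for a[i][j] and a[i][j][k]). The helper names `idx2` and `idx3` are invented for this example.

```cpp
#include <cassert>
#include <cstddef>

// Flat offset of element [i][j] in a row-major array with ncols columns.
inline std::size_t idx2(std::size_t i, std::size_t j, std::size_t ncols) {
    assert(j < ncols);
    return i * ncols + j;
}

// Flat offset of element [i][j][k] in a row-major nx * ny * nz array.
// Only the two inner extents, ny and nz, enter the formula.
inline std::size_t idx3(std::size_t i, std::size_t j, std::size_t k,
                        std::size_t ny, std::size_t nz) {
    assert(j < ny && k < nz);
    return (i * ny + j) * nz + k;
}
```

For example, with ny = 4 and nz = 5, element [1][2][3] sits at flat offset (1 * 4 + 2) * 5 + 3 = 33. To call these helpers from a kernel they would additionally need the `__device__` qualifier.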

Or you can use C++: just define a new matrix class and define operator(), so that you can write a(i,j) instead of a[i][j].
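A rough sketch of such a class follows. `Matrix2D`, `HOST_DEVICE` and `add` are hypothetical names, and the class is a non-owning view over a flat buffer that the caller allocates. The macro is there so the same code compiles with a plain C++ compiler and, presumably, with nvcc as well.

```cpp
#include <cassert>
#include <cstddef>

// Expands to the CUDA qualifiers under nvcc and to nothing otherwise.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical non-owning view over a flat row-major buffer.
struct Matrix2D {
    float* data;
    std::size_t rows, cols;

    HOST_DEVICE float& operator()(std::size_t i, std::size_t j) {
        return data[i * cols + j];
    }
    HOST_DEVICE const float& operator()(std::size_t i, std::size_t j) const {
        return data[i * cols + j];
    }
};

// Element-wise add written against a(i, j). In a kernel the two loops
// would be replaced by indices computed from blockIdx and threadIdx.
inline void add(const Matrix2D& a, const Matrix2D& b, Matrix2D& c) {
    assert(a.rows == c.rows && a.cols == c.cols);
    assert(b.rows == c.rows && b.cols == c.cols);
    for (std::size_t i = 0; i < c.rows; ++i) {
        for (std::size_t j = 0; j < c.cols; ++j) {
            c(i, j) = a(i, j) + b(i, j);
        }
    }
}
```

Because the view holds only a pointer and two sizes, it can be passed to a kernel by value, provided `data` points to device memory.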

Say I have C code that does an element-wise 3D array multiplication: cm[i][j][k] = am[i][j][k] * bm[i][j][k]

What I have to do is replace this multiplication routine with one that runs on the GPU using CUDA.

The am, bm and cm arrays in the C part are already set up as 3D arrays, and I am not going to change that.

Here is what is going to happen, in my mind.

1. prepare d_am, d_bm, d_cm for device memory.

2. copy am ==> d_am and bm ==> d_bm

3. do multiplication using CUDA

4. copy d_cm ==> cm

It seems pretty easy! But I find it is not so…

My main question is what format I should use for d_cm.

Is it going to be the 1D format d_cm[i] or a 3D format? Or do I have a choice?
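One possible direction, sketched in plain C++ with loops standing in for GPU threads: if am, bm and cm are true static arrays (an assumption; arrays of pointers built with nested malloc are not contiguous), their storage is one block of NX * NY * NZ floats. An element-wise product then needs only a single flat index, and the host code keeps its 3D declarations. The sizes and function names are invented for this example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sizes for the example.
constexpr std::size_t NX = 2, NY = 3, NZ = 4;

// Stand-in for the kernel body: one flat index per element. On the GPU,
// idx would be blockIdx.x * blockDim.x + threadIdx.x instead of a loop.
void multiply_flat(const float* a, const float* b, float* c, std::size_t n) {
    assert(a != nullptr && b != nullptr && c != nullptr);
    for (std::size_t idx = 0; idx < n; ++idx) {
        c[idx] = a[idx] * b[idx];
    }
}

// Host arrays keep their 3D declaration. Only the call site sees them flat.
void multiply3d(const float (&am)[NX][NY][NZ], const float (&bm)[NX][NY][NZ],
                float (&cm)[NX][NY][NZ]) {
    multiply_flat(&am[0][0][0], &bm[0][0][0], &cm[0][0][0], NX * NY * NZ);
}
```

With real device buffers, d_am, d_bm and d_cm would each be a plain `float*` obtained from `cudaMalloc` with `sizeof(am)` bytes, and a call such as `cudaMemcpy(d_am, am, sizeof(am), cudaMemcpyHostToDevice)` would copy the whole 3D array in one go. 3D indices inside the kernel only become necessary when an element depends on its neighbours.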

Unless you’re experienced with CUDA, your matrix multiplication implementation is likely to be very inefficient (by a factor of a hundred or so). I find studying the matrix multiplication example in the NVIDIA SDK far more educational than trying to implement it oneself.

Yeah… I know. I am studying “Programming Massively Parallel Processors” and “CUDA by Example: An Introduction to General-Purpose GPU Programming”, and I see your point. But I have to do what I have to do, and that’s why I am asking around… ha ha ha.

