# general idea on dealing with 2D (3D) arrays in CUDA: can I put 2D (3D) indexes in CUDA kernels?

please give me some ideas on how to deal with 2D or 3D arrays in CUDA.

during the last several weeks I've been busy getting to know C and CUDA, and I'd say I've tasted a little of it so far.
(still a long way to go) I've learned a lot reading posts here…

now I am stepping in a little deeper.

I have code that deals with 2D and 3D matrices in C, like a_h[i][j] or b_h[i][j][k].
I don't have time to change all the 2D or 3D matrices into 1D; I have to deal with those.
now, I've researched a little and I see some CUDA functions like cudaMemcpy2D, cudaMemcpy3D, cudaMalloc3D, etc.
well, before getting into more detail on those, I have a general question.

can I put a 2D or 3D array in the CUDA kernel?
or do I still need to play with 1D arrays in the kernel even if my host code uses 2D and 3D arrays?

that is, can I do something like the following?

```cuda
c[i][j] = a[i][j] + b[i][j];
```

or must it be something like

```cuda
c[i] = a[i] + b[i];  /* a and b are 2D or 3D in the host code */
```

I'd like to set a clear direction first and start from there…


nvcc has no information about how to map [i][j] to a one-dimensional index unless you declare the array with compile-time dimensions, e.g. a[N][M] where N and M are constants.

I suggest that you write the index translation yourself.

Or you can use C++: define a new matrix class with an operator()

so that you can use a(i,j) instead of a[i][j].


say I have C code that does an elementwise 3D array multiplication: cm[i][j][k] = am[i][j][k] * bm[i][j][k]

what I have to do is replace this multiplication routine so it runs on the GPU using CUDA.

the am, bm and cm in the C part are already prepared as 3D arrays, and I am not going to change that.

here is what I have in mind:

1. prepare d_am, d_bm, d_cm in device memory.

2. copy am ==> d_am and bm ==> d_bm

3. do the multiplication using CUDA

4. copy d_cm ==> cm

it seems pretty easy! but I find it not so…

my main question is: what format should I use for d_cm?

is it going to be 1D format, d_cm[i], or 3D format, d_cm[i][j][k]? or do I have a choice?
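For what it's worth, the four steps above can be sketched roughly like this, assuming the host arrays are statically sized (and therefore contiguous, so each one can be copied with a single flat cudaMemcpy); the dimensions, names, and launch configuration here are made up for illustration:

```cuda
#include <cuda_runtime.h>

#define NX 4
#define NY 8
#define NZ 16

// elementwise product: one thread per element of the flattened array
__global__ void mul3d(const float* am, const float* bm, float* cm, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        cm[idx] = am[idx] * bm[idx];
}

void run(float am[NX][NY][NZ], float bm[NX][NY][NZ], float cm[NX][NY][NZ])
{
    size_t bytes = NX * NY * NZ * sizeof(float);
    float *d_am, *d_bm, *d_cm;

    // 1. prepare d_am, d_bm, d_cm in device memory
    cudaMalloc(&d_am, bytes);
    cudaMalloc(&d_bm, bytes);
    cudaMalloc(&d_cm, bytes);

    // 2. copy am ==> d_am and bm ==> d_bm (a static 3D array is
    //    contiguous, so one flat copy moves the whole thing)
    cudaMemcpy(d_am, am, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_bm, bm, bytes, cudaMemcpyHostToDevice);

    // 3. do the multiplication on the GPU
    int n = NX * NY * NZ;
    int threads = 256;
    mul3d<<<(n + threads - 1) / threads, threads>>>(d_am, d_bm, d_cm, n);

    // 4. copy d_cm ==> cm
    cudaMemcpy(cm, d_cm, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_am); cudaFree(d_bm); cudaFree(d_cm);
}
```

So in this sketch d_cm is a flat 1D buffer on the device, while the host keeps its 3D arrays untouched; the contiguity of the static arrays is what makes the two views compatible.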


Unless you're experienced with CUDA, your matrix multiplication implementation will likely be very inefficient (by a factor of a hundred or so). I find checking out the matrix multiplication example in the NVIDIA SDK far more educational than trying to implement it oneself.


yeah… I know. I am studying "Programming Massively Parallel Processors" and "CUDA by Example: An Introduction to General-Purpose GPU Programming", and I see your point. but, well, I have to do what I have to do, and that's why I am asking around… ha ha ha.
