2-dimensional array in kernel

Hi.
I have a question.

In the following program,
a 1-dimensional array dA is allocated with cudaMalloc.
I want to use dA as a 2-dimensional array in the kernel.
Are the definitions at (A) and (B) correct?

#include <stdio.h>
#define NX (3)
#define NY (2)

__global__ void kernel(float dA[NY][NX]){
    int ix = blockIdx.x*blockDim.x + threadIdx.x;
    int iy = blockIdx.y*blockDim.y + threadIdx.y;
    dA[iy][ix] = float(ix + iy*NX + 1);
}

int main(void){
    float A[NX*NY];
    float (*dA)[NX];                      // <=========(A)
    size_t size = NX*NY*sizeof(float);
    cudaMalloc((void**)&dA, size);        // <=========(B)
    kernel<<<1, dim3(NX,NY)>>>(dA);
    cudaMemcpy(A, dA, size, cudaMemcpyDeviceToHost);
    for(int i=0; i<NX*NY; i++){
        printf("%f\n", A[i]);
    }
    cudaFree(dA);
}

At (A) you are allocating a HOST 2D array with one of the dimensions unallocated, then at (B) you are allocating DEVICE memory on top of that. I can't find the original post that clearly explains how to allocate 2D arrays in CUDA, but here is a thread where I explained how to allocate 2D memory on the device, based on that original thread.

There are a couple of alternatives, though. You can flatten the 2D array to one dimension, make it width * height elements in size, and access it as y*width+x, where y is the row and x is the column. I also made a 2D array class that can take care of a lot of this for you. If you want to take a look, here is the link: https://sourceforge.net/projects/cudaocl/files/.
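A minimal sketch of the flattening approach, assuming a row-major layout where y is the row and x the column. WIDTH, HEIGHT and fill_flat are hypothetical names chosen for illustration; only the standard runtime calls cudaMalloc, cudaMemcpy and cudaFree are used, so no 2D-specific API is involved.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define WIDTH  3   // number of columns (x extent); hypothetical size for the sketch
#define HEIGHT 2   // number of rows (y extent); hypothetical size for the sketch

// Each thread writes one element of a HEIGHT x WIDTH matrix that is stored
// row-major in a flat 1D buffer: element (row y, column x) lives at y*width + x.
__global__ void fill_flat(float *d, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // column index
    int y = blockIdx.y * blockDim.y + threadIdx.y;  // row index
    if (x < width && y < height) {                  // skip threads outside the matrix
        d[y * width + x] = float(y * width + x + 1);
    }
}

int main(void)
{
    float h[WIDTH * HEIGHT];
    float *d = NULL;
    size_t bytes = WIDTH * HEIGHT * sizeof(float);

    if (cudaMalloc((void**)&d, bytes) != cudaSuccess) {   // plain 1D allocation
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    fill_flat<<<1, dim3(WIDTH, HEIGHT)>>>(d, WIDTH, HEIGHT);
    cudaError_t err = cudaGetLastError();                 // launch configuration errors
    if (err == cudaSuccess)
        err = cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);  // also waits for the kernel
    cudaFree(d);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Print the matrix row by row using the same y*width + x mapping on the host.
    for (int y = 0; y < HEIGHT; y++) {
        for (int x = 0; x < WIDTH; x++)
            printf("%4.1f ", h[y * WIDTH + x]);
        printf("\n");
    }
    return 0;
}
```

Passing width and height as kernel arguments (rather than baking them into the index math) keeps the kernel usable for any matrix size; the bounds check matters once the grid is rounded up to a multiple of the block size.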

Thanks for your useful reply.

I want to confirm one thing.
Are parts (A) and (B) of my sample program correct or incorrect?

(When I execute it with simple data, it seems the result is correct.
But I don’t know if the usage is correct or not.)

It is going to work, but it is not correct. dA is defined as a pointer to the wrong type. However, since the only thing you do with the pointer is pass it on to the kernel, where the parameter is defined correctly, it will not crash. If the compiler does not warn about it right now, future versions might.

If you don't want to define dA correctly, it is better to define it as void *dA to indicate that dA may not be dereferenced (on the host).
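A sketch of the void * variant, assuming the file is compiled as C++ by nvcc. In C++ a void * does not convert implicitly to other pointer types, so the 2D view the kernel expects has to be named explicitly at the launch site; the static_cast used here is one way to write that conversion, not the only one.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define NX (3)
#define NY (2)

// Same kernel as in the question: the parameter float dA[NY][NX]
// is adjusted by the compiler to float (*dA)[NX].
__global__ void kernel(float dA[NY][NX])
{
    int ix = blockIdx.x * blockDim.x + threadIdx.x;
    int iy = blockIdx.y * blockDim.y + threadIdx.y;
    dA[iy][ix] = float(ix + iy * NX + 1);
}

int main(void)
{
    float A[NY][NX];
    void *dA = NULL;    // untyped device handle: never dereferenced on the host

    // &dA is already a void**, so cudaMalloc needs no cast here.
    if (cudaMalloc(&dA, sizeof(A)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    // C++ has no implicit conversion from void*, so the 2D view the kernel
    // expects is spelled out explicitly at the launch site.
    kernel<<<1, dim3(NX, NY)>>>(static_cast<float (*)[NX]>(dA));

    cudaError_t err = cudaMemcpy(A, dA, sizeof(A), cudaMemcpyDeviceToHost);
    cudaFree(dA);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int iy = 0; iy < NY; iy++)
        for (int ix = 0; ix < NX; ix++)
            printf("%f\n", A[iy][ix]);
    return 0;
}
```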

Furthermore, size is not calculated correctly, as it uses sizeof(int) instead of sizeof(float). It just happens that both have the same size (4 bytes each on the platforms CUDA targets).

I would declare dA as float (*dA)[NY][NX]; on both host and device to make the intention clear; the kernel then accesses elements as (*dA)[iy][ix]. This also lets you write cudaMalloc((void**)&dA, sizeof(*dA)); without having to worry about getting the size right.
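A minimal sketch of declaring the pointer-to-2D-array type on both host and device, assuming NX and NY are compile-time constants as in the question. Because the type carries the full NY x NX extent, sizeof(*dA) yields the allocation size, and the kernel dereferences the pointer once before indexing.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define NX (3)
#define NY (2)

// The kernel takes a pointer to the whole NY x NX array, the same type
// that is declared on the host, so both sides agree on the layout.
__global__ void kernel(float (*dA)[NY][NX])
{
    int ix = blockIdx.x * blockDim.x + threadIdx.x;
    int iy = blockIdx.y * blockDim.y + threadIdx.y;
    if (ix < NX && iy < NY)
        (*dA)[iy][ix] = float(ix + iy * NX + 1);   // dereference once, then index
}

int main(void)
{
    float A[NY][NX];
    float (*dA)[NY][NX] = NULL;   // pointer to an NY x NX array of float

    // sizeof(*dA) is NY * NX * sizeof(float); the operand is not evaluated,
    // so this is valid even while dA is still NULL.
    if (cudaMalloc((void**)&dA, sizeof(*dA)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    kernel<<<1, dim3(NX, NY)>>>(dA);

    cudaError_t err = cudaMemcpy(A, dA, sizeof(*dA), cudaMemcpyDeviceToHost);
    cudaFree(dA);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int iy = 0; iy < NY; iy++)
        for (int ix = 0; ix < NX; ix++)
            printf("%f\n", A[iy][ix]);
    return 0;
}
```

The trade-off is that NX and NY must be known at compile time; for runtime sizes, a flat float * with explicit y*width + x indexing is the usual fallback.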

Thanks for your advice. I understand.

> Furthermore, size is not calculated correctly as it uses sizeof(int) instead of sizeof(float).

This is my mistake. I modified the original code. I appreciate your kindness.