How to use 2D Arrays wrapped in structs in CUDA?

I have a C structure with a) a static 1D array and b) a double pointer that is intended to point to a dynamically allocated 2D array.

The struct:

typedef struct mystruct
{
    int a[ROWS];
    int **data;
}mystruct;

Allocation of memory to 2D array:
var->data = (int**)malloc(ROWS*sizeof(int*));
for (i = 0; i < ROWS; i++)
    var->data[i] = (int*)malloc(COLUMNS*sizeof(int));
//Code to fill this 2D array and the 1D array

Allocation of memory on the device:
cudaMalloc((void**)&d_var->data, ROWS*sizeof(int*));
for(i = 0; i < ROWS; i++)
    cudaMalloc((void**)&d_var->data[i], COLUMNS*sizeof(int));

Copy the data to the device:
cudaMemcpy(d_var, var, (sizeof(mystruct)+ROWS*COLUMNS*sizeof(int)), cudaMemcpyHostToDevice);

Question: The code compiles fine but gives a seg fault when I try to run it. Is this the correct way to do this? Am I missing something?

Here is the link to the code: Examples/exp2Darray.cu at master · rohitgavval/Examples · GitHub

Thank you for your help!

Yes. Your matrices consist of ROWS separate allocations. Each allocation is pointed to by one element of an array of pointers. Yet there is only one call to cudaMemcpy(). What you need to do is copy each allocation on the host to the corresponding allocation on the device with a cudaMemcpy() call. This is a standard C/C++ pattern, an example of “deep copy”.
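
The deep copy described above could be sketched roughly as follows. This is only an illustration under some assumptions: the values of ROWS and COLUMNS are made up, and copy_struct_to_device and sum_rows are hypothetical names that do not appear in the original code. One more point worth checking in the original snippet: if d_var is itself a device pointer, host code cannot dereference it, so expressions like &d_var->data[i] in host code may be the source of the seg fault. The sketch avoids that by collecting the device row pointers in a host-side array first and filling in a host staging copy of the struct.

```cuda
#include <cuda_runtime.h>
#include <stdlib.h>

#define ROWS 4      /* assumed value, not given in the thread */
#define COLUMNS 8   /* assumed value, not given in the thread */

typedef struct mystruct {
    int a[ROWS];
    int **data;
} mystruct;

/* Hypothetical helper: deep-copies *var to the device and returns a device
   pointer to the copy, or NULL on failure (error paths leak, for brevity). */
mystruct *copy_struct_to_device(const mystruct *var)
{
    int *d_rows[ROWS];      /* host array that holds device row pointers */
    int **d_data = NULL;    /* device array of row pointers */
    mystruct *d_var = NULL; /* device copy of the struct */
    mystruct tmp = *var;    /* host staging copy, carries a[] along */

    /* One device allocation and one copy per row: COLUMNS ints each. */
    for (int i = 0; i < ROWS; i++) {
        if (cudaMalloc((void **)&d_rows[i], COLUMNS * sizeof(int)) != cudaSuccess)
            return NULL;
        if (cudaMemcpy(d_rows[i], var->data[i], COLUMNS * sizeof(int),
                       cudaMemcpyHostToDevice) != cudaSuccess)
            return NULL;
    }

    /* Copy the table of device row pointers to the device. */
    if (cudaMalloc((void **)&d_data, ROWS * sizeof(int *)) != cudaSuccess)
        return NULL;
    if (cudaMemcpy(d_data, d_rows, ROWS * sizeof(int *),
                   cudaMemcpyHostToDevice) != cudaSuccess)
        return NULL;

    /* Patch the staging copy so its data member holds the device address,
       then copy the struct itself (sizeof(mystruct) bytes, nothing more). */
    tmp.data = d_data;
    if (cudaMalloc((void **)&d_var, sizeof(mystruct)) != cudaSuccess)
        return NULL;
    if (cudaMemcpy(d_var, &tmp, sizeof(mystruct),
                   cudaMemcpyHostToDevice) != cudaSuccess)
        return NULL;
    return d_var;
}

/* Hypothetical kernel: thread r writes a[r] plus the sum of row r to out[r]. */
__global__ void sum_rows(const mystruct *s, int *out)
{
    int r = threadIdx.x;
    if (r < ROWS) {
        int sum = s->a[r];
        for (int c = 0; c < COLUMNS; c++)
            sum += s->data[r][c];
        out[r] = sum;
    }
}
```

A launch such as sum_rows<<<1, ROWS>>>(d_var, d_out), with d_out pointing to ROWS ints of device memory, would then read the struct through the device pointers only.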

Thank you for looking into this.

Does this look correct?

for(i = 0; i < ROWS; i++)
      cudaMemcpy(d_var->data[i], var->data[i], (sizeof(mystruct)+COLUMNS*sizeof(int)), cudaMemcpyHostToDevice);

I have an additional question. On the host, I have defined an array of structures, each in turn containing 2D arrays. With the above method (if my corrected code works), I will be able to transfer one such structure. How can I do this for an array of structures? Even if I do this by putting the above in a loop (which would probably be very expensive), how can I make the individual threads access different structs?

Why do you have ‘sizeof(mystruct)+COLUMNS*sizeof(int)’? Each row simply consists of COLUMNS integers:

var->data[i] = (int*)malloc(COLUMNS*sizeof(int));
cudaMalloc((void**)&d_var->data[i], COLUMNS*sizeof(int));

You are copying one row at a time, so you need to copy whatever you have allocated per row. My advice: Draw a picture of the data structures on the host and the device, and things should become clear in an instant.

Sorry for my oversight. I missed out on correcting that part.

Could you still help me with the “additional question”?

I have an additional question. On the host, I have defined an array of structures, each in turn containing 2D arrays. With the above method (if my corrected code works), I will be able to transfer one such structure. How can I do this for an array of structures? Even if I do this by putting the above in a loop (which would probably be very expensive), how can I make the individual threads access different structs?
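
One possible approach to the array-of-structures question, sketched here under assumptions rather than taken from the thread: if ROWS and COLUMNS are compile-time constants, the 2D array can be stored inline as a flat array, so the struct contains no pointers and an array of n structs can be transferred with a single cudaMemcpy. The names flatstruct, copy_array_to_device and sum_each are hypothetical, and the values of ROWS and COLUMNS are made up. Each thread selects its own struct from its global thread index.

```cuda
#include <cuda_runtime.h>

#define ROWS 4      /* assumed value, not given in the thread */
#define COLUMNS 8   /* assumed value, not given in the thread */

/* Hypothetical pointer-free variant of mystruct. */
typedef struct flatstruct {
    int a[ROWS];
    int data[ROWS * COLUMNS];   /* element (r, c) is data[r * COLUMNS + c] */
} flatstruct;

/* Hypothetical helper: copies n structs with one allocation and one copy.
   Returns a device pointer, or NULL on failure. */
flatstruct *copy_array_to_device(const flatstruct *h, int n)
{
    flatstruct *d = NULL;
    size_t bytes = (size_t)n * sizeof(flatstruct);
    if (cudaMalloc((void **)&d, bytes) != cudaSuccess)
        return NULL;
    if (cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(d);
        return NULL;
    }
    return d;
}

/* Hypothetical kernel: thread idx works on struct idx and writes the sum
   of that struct's 2D array to out[idx]. */
__global__ void sum_each(const flatstruct *s, int *out, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        int sum = 0;
        for (int r = 0; r < ROWS; r++)
            for (int c = 0; c < COLUMNS; c++)
                sum += s[idx].data[r * COLUMNS + c];
        out[idx] = sum;
    }
}
```

If the dimensions are only known at run time, the same idea still applies with one large flat buffer plus index arithmetic, at the cost of passing the dimensions to the kernel.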