Copy 2-dimensional structure to device

Hi

I'm trying to copy a two-dimensional struct to the device. It compiles without errors, but gives me an illegal memory access when I access the struct at the second cudaEventSynchronize below. On the host I use it to read in a file, which works perfectly. When I use the one-dimensional version everything works fine. Any pointers would be greatly appreciated.

typedef struct nConnType
{
  double *nCondu;
  int nCount;
} nConnType;

nConnType *nConns[2];
nConnType *dev_nConns[2];
double *dev_nCo[2];

for (int i = 0; i < 2; i++)
   nConns[i] = (nConnType *) malloc(*size/2 * sizeof(nConnType));

getConns(nConns);

// copy values to the device in array dev_nCo and link them into the struct
for(int i = 0; i < 2; i++){
  CUDA_CHECK_RETURN(cudaMalloc( (void**) &dev_nConns[i], *size/2 * sizeof(nConnType)));
  for(int j = 0; j < *size/2; j++){
    CUDA_CHECK_RETURN(cudaMalloc( (void **) &dev_nCo[i], nConns[i][j].nCount * sizeof(double) ) );
    CUDA_CHECK_RETURN(cudaMemcpy( dev_nCo[i], nConns[i][j].nCondu, nConns[i][j].nCount * sizeof(double), cudaMemcpyHostToDevice ) );
    CUDA_CHECK_RETURN(cudaMemcpy( &dev_nConns[i][j].nCondu, &dev_nCo[i], sizeof(double *), cudaMemcpyHostToDevice));
    CUDA_CHECK_RETURN(cudaMemcpy( &dev_nConns[i][j].nCount, &nConns[i][j].nCount, sizeof(int), cudaMemcpyHostToDevice ) );
  }
}

CUDA_CHECK_RETURN(cudaEventSynchronize(cDone[0]));
compute_kernel <<< gridDim, blockDim, 0, stream[0] >>>(dev_nConns);
CUDA_CHECK_RETURN(cudaEventSynchronize(cDone[0]));

__global__ void compute_kernel(nConnType * __restrict__ dev_nConns[2]) 
{
   // j, k and IO_NETWORK_DIM1_D come from code that is not shown here
   int threadNum = (j*IO_NETWORK_DIM1_D + k)/2;

   for (int i = 0; i < dev_nConns[0][threadNum].nCount; i++)
      printf("nib%d con%f \n", i, dev_nConns[0][threadNum].nCondu[i]);
}

...

If I understood the allocation loop correctly, you are doing a nested allocation: a dynamic 1D array of dynamic 1D arrays. The simpler approach is a flattened 1D array.
Nested allocation is messy in C/C++, and in CUDA too. The other problem I see is the element-by-element cudaMemcpy, which gets more inefficient the more iterations you run.

A few solutions would be:

  • If you really want to do the host-device-host copy yourself, switch from 2D array notation to a flattened 1D notation; you only need a small formula to index into the rows. Then do the host-device copy with a single cudaMemcpy covering the entire dataset, something like (see the sketch after this list):
cudaMemcpy(d_object, h_object, N * sizeof(nConnType), cudaMemcpyHostToDevice);

Of course this is just a concept and not actual working code, so you would have to adjust it.

  • Use cudaMallocManaged memory, where the driver is responsible for migrating the data to the host when you reference it from the host, and to the device when you reference it from the device.
  • If you are a regular STL user, you may want to check Thrust.
  • If you really want to stick to 2D notation, there is cudaMemcpy2D(), but I would discourage you from taking the more complicated route. Flattened 1D arrays will always work, especially if you have to pass them to functions.
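
Here is a minimal sketch of what the flattened layout from the first bullet could look like for your nConnType data, shown for one of the two nConns slices. The names nElems (your *size/2), h_allCondu, h_counts, h_offsets and their device counterparts are placeholders I made up, so adjust them to your code; it needs <stdlib.h>, <string.h> and the CUDA runtime.

int nElems = *size / 2;   // same element count as in the code above

// concatenate every nCondu array into one contiguous host buffer and
// remember where each element's data starts (prefix sum of the counts)
int *h_counts  = (int *) malloc(nElems * sizeof(int));
int *h_offsets = (int *) malloc(nElems * sizeof(int));
int totalDoubles = 0;
for (int j = 0; j < nElems; j++) {
    h_counts[j]  = nConns[0][j].nCount;
    h_offsets[j] = totalDoubles;
    totalDoubles += h_counts[j];
}

double *h_allCondu = (double *) malloc(totalDoubles * sizeof(double));
for (int j = 0; j < nElems; j++)
    memcpy(h_allCondu + h_offsets[j], nConns[0][j].nCondu,
           h_counts[j] * sizeof(double));

// three bulk copies replace the per-element cudaMemcpy loop
double *d_allCondu; int *d_counts, *d_offsets;
CUDA_CHECK_RETURN(cudaMalloc((void **) &d_allCondu, totalDoubles * sizeof(double)));
CUDA_CHECK_RETURN(cudaMalloc((void **) &d_counts,  nElems * sizeof(int)));
CUDA_CHECK_RETURN(cudaMalloc((void **) &d_offsets, nElems * sizeof(int)));
CUDA_CHECK_RETURN(cudaMemcpy(d_allCondu, h_allCondu, totalDoubles * sizeof(double), cudaMemcpyHostToDevice));
CUDA_CHECK_RETURN(cudaMemcpy(d_counts,  h_counts,  nElems * sizeof(int), cudaMemcpyHostToDevice));
CUDA_CHECK_RETURN(cudaMemcpy(d_offsets, h_offsets, nElems * sizeof(int), cudaMemcpyHostToDevice));

// on the device, element j's data is
// d_allCondu[d_offsets[j]] .. d_allCondu[d_offsets[j] + d_counts[j] - 1]

The per-element sizes are absorbed by the offsets array, so the whole dataset still moves in a handful of bulk copies instead of one cudaMemcpy per element.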

Thanks for the help.

I have to do the host-to-device copy element by element because the data size to transfer is different every iteration. Since cudaMallocManaged doesn't have a realloc function, I can't do this efficiently with managed memory either. Also, a priori I don't know how much space to allocate. And indexing into a flattened 1D layout is not really a 'small' formula, I guess.
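
For reference, a rough sketch of what the kernel-side lookup could look like with the flattened buffers from the sketch above (d_allCondu, d_offsets, d_counts and nElems are the placeholder names assumed there); the per-element sizes can differ, since the offsets array absorbs that:

// needs <stdio.h> for device-side printf
__global__ void compute_kernel_flat(const double * __restrict__ d_allCondu,
                                    const int    * __restrict__ d_offsets,
                                    const int    * __restrict__ d_counts,
                                    int nElems)
{
    int threadNum = blockIdx.x * blockDim.x + threadIdx.x;
    if (threadNum >= nElems)
        return;

    // element threadNum owns d_counts[threadNum] doubles starting at
    // d_allCondu[d_offsets[threadNum]]
    for (int i = 0; i < d_counts[threadNum]; i++)
        printf("nib%d con%f \n", i, d_allCondu[d_offsets[threadNum] + i]);
}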