Can one pass a structure containing dynamic arrays to the device?

Greetings. I was placing two large arrays onto the GPU, each holding multiple data sets that start at fixed intervals. I’m now trying to split these large arrays into smaller ones that group entries of similar length, so that I can shrink the interval between data entries in each array.

To keep things neat, I’ve been putting these smaller arrays into structures that hold dynamically allocated arrays, and keeping the structures themselves in an array. The structure looks like this:

struct referenceTable
{
int maxSize = 0;
int entryCount = 0;
double *xyzValuesSets;
int *namesSets;

double *d_xyzValuesSets;
int *d_namesSets;

}

Declaring them and populating them like so:

proteinRangeReference referenceTable[6];

referenceTable[0].maxSize = 1024;
referenceTable[0].namesSets = (int*)malloc(referenceTable[0].maxSize * 40000 * sizeof(int));
referenceTable[0].xyzValuesSets = (double*)malloc(referenceTable[0].maxSize * 40000 * 3 * sizeof(double));
referenceTable[0].entryCount = 0;
cudaMalloc((void**)&referenceTable[0].d_namesSets, referenceTable[0].maxSize * 40000 * sizeof(double));
cudaMalloc((void**)&referenceTable[0].d_xyzValuesSets, referenceTable[0].maxSize * 40000 * 3 *sizeof(double));

referenceTable[1].maxSize = 2048;
referenceTable[1].namesSets = (int*)malloc(referenceTable[0].maxSize * 40000 * sizeof(int));
referenceTable[1].xyzValuesSets = (double*)malloc(referenceTable[0].maxSize * 40000 * 3 * sizeof(double));
referenceTable[1].entryCount = 0;
cudaMalloc((void**)&referenceTable[1].d_namesSets, referenceTable[0].maxSize * 40000 * sizeof(double));
cudaMalloc((void**)&referenceTable[1].d_xyzValuesSets, referenceTable[0].maxSize * 40000 * 3 * sizeof(double));

for (int i = 0; i < 50; i++)
{
referenceTable[0].xyzValuesSets[i]=i;
referenceTable[1].xyzValuesSets[i]=i*100;
}
cudaMemcpy(referenceTable[0].d_xyzValuesSets, referenceTable[0].xyzValuesSets, 50 * sizeof(int), cudaMemcpyHostToDevice);
cudaMemcpy(referenceTable[1].d_xyzValuesSets, referenceTable[1].xyzValuesSets, 50 * sizeof(int), cudaMemcpyHostToDevice);

addKernel <<< 1, 50 >>> (referenceTable[0].d_xyzValuesSets, referenceTable[1].d_xyzValuesSets);

And then sending them to the following kernel:

__global__ void addKernel(double *a, double *b)
{
int i = threadIdx.x;
a[i] = a[i] + b[i];
}

Except that the program crashes with access violation errors when I try to cudaMemcpy the data from referenceTable[1].xyzValuesSets to referenceTable[1].d_xyzValuesSets. Is it possible to store cudaMalloc’ed dynamic arrays the way I’ve attempted, or do I need to return to simpler declarations?

It’s possible. I don’t see anything critically wrong with your code. However, here are a few things to check:

  1. Access violation errors usually point to a problem on the host side. malloc returns a NULL pointer when it fails. Have you tested the pointers returned by malloc to be sure they are non-NULL?

  2. When you fill in your referenceTable[1] values, you compute the sizes from referenceTable[0].maxSize. For the code you have shown this is not actually a problem, but it might not be your intent.

  3. The items pointed to by d_xyzValuesSets and xyzValuesSets are double quantities, but your cudaMemcpy operations use sizeof(int). With a 4-byte int and an 8-byte double, that copies only the first 25 of the 50 values. Again, it would not cause a crash in the code you have shown, but it is probably not your intent.

  4. Your array declaration doesn’t make sense given the previous struct definition, which names the type referenceTable rather than proteinRangeReference:

proteinRangeReference referenceTable[6];

and for a number of reasons like that, your code is not compilable. If you want to provide a short, compilable code, it’s likely that someone could spot the issue. The following slight adaptation of your code compiles and runs without error for me:

$ cat t988.cu
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
struct rT
{
  int maxSize;
  int entryCount;
  double *xyzValuesSets;
  int *namesSets;

  double *d_xyzValuesSets;
  int *d_namesSets;
};

__global__ void addKernel(double *a, double *b)
{
  int i = threadIdx.x;
  a[i] = a[i] + b[i];
}

int main(){

  rT referenceTable[6];

  referenceTable[0].maxSize = 1024;
  referenceTable[0].namesSets = (int*)malloc(referenceTable[0].maxSize * 40000 * sizeof(int));
  assert(referenceTable[0].namesSets != NULL);
  referenceTable[0].xyzValuesSets = (double*)malloc(referenceTable[0].maxSize * 40000 * 3 * sizeof(double));
  assert(referenceTable[0].xyzValuesSets != NULL);
  referenceTable[0].entryCount = 0;
  cudaMalloc((void**)&referenceTable[0].d_namesSets, referenceTable[0].maxSize * 40000 * sizeof(double));
  cudaMalloc((void**)&referenceTable[0].d_xyzValuesSets, referenceTable[0].maxSize * 40000 * 3 *sizeof(double));

  referenceTable[1].maxSize = 2048;
  referenceTable[1].namesSets = (int*)malloc(referenceTable[0].maxSize * 40000 * sizeof(int));
  assert(referenceTable[1].namesSets != NULL);
  referenceTable[1].xyzValuesSets = (double*)malloc(referenceTable[0].maxSize * 40000 * 3 * sizeof(double));
  assert(referenceTable[1].xyzValuesSets != NULL);
  referenceTable[1].entryCount = 0;
  cudaMalloc((void**)&referenceTable[1].d_namesSets, referenceTable[0].maxSize * 40000 * sizeof(double));
  cudaMalloc((void**)&referenceTable[1].d_xyzValuesSets, referenceTable[0].maxSize * 40000 * 3 * sizeof(double));

  for (int i = 0; i < 50; i++)
  {
    referenceTable[0].xyzValuesSets[i]=i;
    referenceTable[1].xyzValuesSets[i]=i*100;
  }
  cudaMemcpy(referenceTable[0].d_xyzValuesSets, referenceTable[0].xyzValuesSets, 50 * sizeof(double), cudaMemcpyHostToDevice);
  cudaMemcpy(referenceTable[1].d_xyzValuesSets, referenceTable[1].xyzValuesSets, 50 * sizeof(double), cudaMemcpyHostToDevice);

  addKernel <<< 1, 50 >>> (referenceTable[0].d_xyzValuesSets, referenceTable[1].d_xyzValuesSets);
  cudaDeviceSynchronize();
  double results[50];
  cudaMemcpy(results, referenceTable[0].d_xyzValuesSets, 50 *sizeof(double), cudaMemcpyDeviceToHost);
  for (int i = 0; i < 50; i ++) printf("%f\n", results[i]);
  return 0;
}
$ nvcc -o t988 t988.cu
$ cuda-memcheck ./t988
========= CUDA-MEMCHECK
0.000000
101.000000
202.000000
303.000000
404.000000
505.000000
606.000000
707.000000
808.000000
909.000000
1010.000000
1111.000000
1212.000000
1313.000000
1414.000000
1515.000000
1616.000000
1717.000000
1818.000000
1919.000000
2020.000000
2121.000000
2222.000000
2323.000000
2424.000000
2525.000000
2626.000000
2727.000000
2828.000000
2929.000000
3030.000000
3131.000000
3232.000000
3333.000000
3434.000000
3535.000000
3636.000000
3737.000000
3838.000000
3939.000000
4040.000000
4141.000000
4242.000000
4343.000000
4444.000000
4545.000000
4646.000000
4747.000000
4848.000000
4949.000000
========= ERROR SUMMARY: 0 errors
$
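Points 1 to 3 above can be folded into one small host-side helper. What follows is only a sketch under stated assumptions: `allocateEntry`, `freeEntry` and the `setsPerEntry` parameter (standing in for the 40000 in the code above) are hypothetical names that do not appear in the original code, and the device pointers are left out so the snippet builds with a plain C++ compiler. It sizes each entry from its own maxSize, takes the element size from the pointer type, and reports a failed malloc instead of handing back a NULL pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Host-side half of the struct above (device pointers omitted for brevity).
struct rT
{
  int maxSize;
  int entryCount;
  double *xyzValuesSets;
  int *namesSets;
};

// Hypothetical helper: allocate both host arrays for one entry, sized from
// that entry's own maxSize. Returns false and leaves both pointers NULL if
// either malloc fails.
bool allocateEntry(rT &e, int maxSize, std::size_t setsPerEntry)
{
  assert(maxSize > 0);
  e.maxSize = maxSize;
  e.entryCount = 0;

  // Do the arithmetic in size_t so large products cannot overflow int.
  const std::size_t n = static_cast<std::size_t>(maxSize) * setsPerEntry;

  // sizeof(*pointer) keeps the element size tied to the pointer's type.
  e.namesSets = static_cast<int *>(std::malloc(n * sizeof(*e.namesSets)));
  e.xyzValuesSets =
      static_cast<double *>(std::malloc(n * 3 * sizeof(*e.xyzValuesSets)));

  if (e.namesSets == NULL || e.xyzValuesSets == NULL)
  {
    std::free(e.namesSets);  // free(NULL) is a no-op
    std::free(e.xyzValuesSets);
    e.namesSets = NULL;
    e.xyzValuesSets = NULL;
    return false;
  }
  return true;
}

// Release the host arrays of one entry and reset its pointers.
void freeEntry(rT &e)
{
  std::free(e.namesSets);
  std::free(e.xyzValuesSets);
  e.namesSets = NULL;
  e.xyzValuesSets = NULL;
}
```

A caller would check the return value, for example `if (!allocateEntry(referenceTable[1], 2048, 40000)) { /* report and exit */ }`, before touching either array. The same pattern would apply to the cudaMalloc calls, whose return codes can be checked in a similar way.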

Thanks for responding. I think my data definition looked wrong because I kept adjusting things to try to get my code to run.

I got an odd result out of this, though. Your code runs perfectly on a remote Linux machine I’m working with. When I try to run it on my Windows machine through Visual Studio, however, the asserts fail, and (with those commented out) I get the following memory access violation when the code reaches:

	 for (int i = 0; i < 50; i++)
	 {
->	   referenceTable[0].xyzValuesSets[i] = i;
	   referenceTable[1].xyzValuesSets[i] = i * 100;
	 }

Unhandled exception at 0x00E71685 in dummyTest2.exe: 0xC0000005: Access violation writing location 0x00000000.

I can work with the Linux machine but then I lose the ability to use VS debugging. Any idea what could be causing the error?

Yes, you’re running out of memory on the Windows machine. Returning a NULL pointer is how malloc indicates an out-of-memory error. This:

Access violation writing location 0x00000000.
                                  ^^^^^^^^^^

is a NULL pointer (all zeroes).

The two xyzValuesSets allocations each request almost 1 GB of host memory, and the namesSets allocations add roughly another 164 MB each. Apparently your Windows box cannot provide that much.
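The request sizes can be worked out directly from the malloc expressions in the code. The sketch below does that arithmetic; the names `kSetsPerSlot`, `namesBytes`, `xyzBytes`, `printRequest` and `printEntryRequests` are made up for illustration. It assumes a 4-byte int and an 8-byte double, in which case maxSize = 1024 gives 163,840,000 bytes for namesSets and 983,040,000 bytes for xyzValuesSets.

```cpp
#include <cstddef>
#include <cstdio>

// 40000 and the factor of 3 come from the malloc expressions in the thread.
// The constant name is hypothetical.
constexpr std::size_t kSetsPerSlot = 40000;

// Bytes requested for one entry's namesSets array.
constexpr std::size_t namesBytes(std::size_t maxSize)
{
  return maxSize * kSetsPerSlot * sizeof(int);
}

// Bytes requested for one entry's xyzValuesSets array.
constexpr std::size_t xyzBytes(std::size_t maxSize)
{
  return maxSize * kSetsPerSlot * 3 * sizeof(double);
}

// Print one request in bytes and in MB (10^6 bytes).
void printRequest(const char *label, std::size_t bytes)
{
  std::printf("%-14s %12llu bytes (%.1f MB)\n", label,
              static_cast<unsigned long long>(bytes), bytes / 1.0e6);
}

// Print both host requests for one table entry.
void printEntryRequests(std::size_t maxSize)
{
  printRequest("namesSets", namesBytes(maxSize));
  printRequest("xyzValuesSets", xyzBytes(maxSize));
}
```

Calling `printEntryRequests(1024)` before allocating makes it easy to see when a test configuration is asking for more than the machine can give. As a guess only, the eight-hex-digit addresses in the exception message hint at a 32-bit build, in which case the limit may be process address space rather than installed RAM; nothing in the thread confirms that.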

Correct. Thanks for responding. Silly mistakes in hindsight, but I was stuck on those errors for days.

I guess I’ll test with smaller data samples locally and run the full-size jobs elsewhere.

It’s good programming practice to test pointers returned by malloc for NULL before attempting to use them.

That is the purpose of the assert statements I put in there.
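One caveat with assert is that it is compiled out whenever NDEBUG is defined, which release configurations commonly do, so the check can silently disappear. A minimal sketch of an explicit check that stays active in every build is shown here; `mallocOrReport` is a hypothetical helper, not something from the code above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: malloc that reports a failed request on stderr.
// The NULL test below is ordinary code, so it remains in place even when
// NDEBUG removes every assert.
void *mallocOrReport(std::size_t bytes, const char *what)
{
  assert(what != NULL);  // programming-error check; vanishes under NDEBUG

  void *p = std::malloc(bytes);
  if (p == NULL && bytes != 0)
  {
    std::fprintf(stderr, "malloc failed: %llu bytes for %s\n",
                 static_cast<unsigned long long>(bytes), what);
  }
  return p;  // caller must still test for NULL before using the memory
}
```

Typical use would be `double *xyz = (double*)mallocOrReport(count * sizeof(double), "xyzValuesSets");` followed by `if (xyz == NULL) return 1;`, so the program exits with a message naming the failed request instead of crashing later on a write to address 0.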