Greetings. I was placing two large arrays onto the GPU, each containing multiple data sets starting at fixed intervals. I'm now trying to split these large arrays into smaller ones whose entries have similar lengths, so I can decrease the interval between data entries in each array.
To keep things neat I've been placing these smaller arrays into an array of structs, each struct holding dynamically allocated host and device arrays, with the following layout:
struct proteinRangeReference
{
    int maxSize = 0;
    int entryCount = 0;
    double *xyzValuesSets;   // host copy
    int *namesSets;          // host copy
    double *d_xyzValuesSets; // device copy
    int *d_namesSets;        // device copy
};
Declaring them and populating them like so:
proteinRangeReference referenceTable[6];
referenceTable[0].maxSize = 1024;
referenceTable[0].namesSets = (int*)malloc(referenceTable[0].maxSize * 40000 * sizeof(int));
referenceTable[0].xyzValuesSets = (double*)malloc(referenceTable[0].maxSize * 40000 * 3 * sizeof(double));
referenceTable[0].entryCount = 0;
cudaMalloc((void**)&referenceTable[0].d_namesSets, referenceTable[0].maxSize * 40000 * sizeof(double));
cudaMalloc((void**)&referenceTable[0].d_xyzValuesSets, referenceTable[0].maxSize * 40000 * 3 *sizeof(double));
referenceTable[1].maxSize = 2048;
referenceTable[1].namesSets = (int*)malloc(referenceTable[0].maxSize * 40000 * sizeof(int));
referenceTable[1].xyzValuesSets = (double*)malloc(referenceTable[0].maxSize * 40000 * 3 * sizeof(double));
referenceTable[1].entryCount = 0;
cudaMalloc((void**)&referenceTable[1].d_namesSets, referenceTable[0].maxSize * 40000 * sizeof(double));
cudaMalloc((void**)&referenceTable[1].d_xyzValuesSets, referenceTable[0].maxSize * 40000 * 3 * sizeof(double));
for (int i = 0; i < 50; i++)
{
    referenceTable[0].xyzValuesSets[i] = i;
    referenceTable[1].xyzValuesSets[i] = i * 100;
}
cudaMemcpy(referenceTable[0].d_xyzValuesSets, referenceTable[0].xyzValuesSets, 50 * sizeof(int), cudaMemcpyHostToDevice);
cudaMemcpy(referenceTable[1].d_xyzValuesSets, referenceTable[1].xyzValuesSets, 50 * sizeof(int), cudaMemcpyHostToDevice);
addKernel <<< 1, 50 >>> (referenceTable[0].d_xyzValuesSets, referenceTable[1].d_xyzValuesSets);
I then launch the following kernel on them:
__global__ void addKernel(double *a, double *b)
{
    int i = threadIdx.x;
    a[i] = a[i] + b[i];
}
Except that the program crashes with an access-violation error when I try to cudaMemcpy the data from referenceTable[1].xyzValuesSets to referenceTable[1].d_xyzValuesSets. Is it possible to store cudaMalloc'ed dynamic arrays the way I've attempted, or do I need to return to simpler declarations?