I saw that there are other posts about this topic, but my situation is slightly different…
I am allocating device pointers to layer data, and would like to access that data in a single kernel call. I do not know how many layers I will need until runtime, so I would like to send the array of pointers to the device.
…
float* d_t_Layr[20];
float* h_t_Layr = (float*)malloc(size*sizeof(float));
…
for(int x = 0; x < numLayers; x++)
{
    cudaMalloc((void**)&d_t_Layr[x], size*sizeof(float));
    cudaMemcpy(d_t_Layr[x], h_t_Layr, size*sizeof(float), cudaMemcpyHostToDevice);
}
…
buildTerrain_k<<<grid, block>>>(d_t_Full,
                                d_t_Layr,
                                numLayers,
                                width);
…
__global__ void buildTerrain_k(float6* d_t_Full,
                               float* d_t_Layr[],
                               int numLayers,
                               int width)
{
    unsigned int x = blockIdx.x*blockDim.x + threadIdx.x;
    unsigned int y = blockIdx.y*blockDim.y + threadIdx.y;
    int i = 0;
    //get cell position
    unsigned int c = y*width+x;
    //Start at zero
    d_t_Full[c].pos.z = 0.f;
    //Build terrain from Layers
    while(i < numLayers)
    {
        d_t_Full[c].pos.z += d_t_Layr[i][c];
        i++;
    }
}
The line that accumulates the layers (d_t_Full[c].pos.z += d_t_Layr[i][c];) is where the program is failing. I am not able to access the pointers correctly. A buddy suggested just using a 3D array, which would work, but I would like to see if there is something simple I am missing here. I am new to CUDA, so I could easily have overlooked something simple. Let me know if there is anything I didn't include. Thanks for the replies.
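For reference, one likely cause: d_t_Layr is an array that lives in host memory. Its elements are device pointers, but the array itself is on the host, so the kernel receives a host address and dereferencing d_t_Layr[i] on the device fails. Below is a minimal sketch of one common workaround, which copies the pointer table itself to the device before the launch. The names sumLayers_k, layersSumCorrectly, h_ptrs, d_ptrs and d_out are hypothetical, the output is a plain float array rather than the float6 struct from the post, and the sizes are made up for illustration.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical simplified kernel: sums numLayers layers into a flat output
// array (the post writes into d_t_Full[c].pos.z instead).
__global__ void sumLayers_k(float* out, float* const* layers,
                            int numLayers, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;   // guard threads outside the grid

    int c = y * width + x;
    float z = 0.f;
    for (int i = 0; i < numLayers; i++)
        z += layers[i][c];                   // layers is itself in device memory
    out[c] = z;
}

// Hypothetical host-side check: every layer is filled with 1.0f, so each
// output cell should equal numLayers.
bool layersSumCorrectly()
{
    const int width = 16, height = 16, size = width * height;
    const int numLayers = 3;                 // assumed value; runtime-dependent in the post

    std::vector<float> h_layer(size, 1.0f);
    std::vector<float*> h_ptrs(numLayers);   // host-side table of device pointers

    for (int i = 0; i < numLayers; i++) {
        cudaMalloc((void**)&h_ptrs[i], size * sizeof(float));
        cudaMemcpy(h_ptrs[i], h_layer.data(), size * sizeof(float),
                   cudaMemcpyHostToDevice);
    }

    // The key step: copy the table of pointers to device memory as well.
    float** d_ptrs = nullptr;
    cudaMalloc((void**)&d_ptrs, numLayers * sizeof(float*));
    cudaMemcpy(d_ptrs, h_ptrs.data(), numLayers * sizeof(float*),
               cudaMemcpyHostToDevice);

    float* d_out = nullptr;
    cudaMalloc((void**)&d_out, size * sizeof(float));

    dim3 block(8, 8);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    sumLayers_k<<<grid, block>>>(d_out, d_ptrs, numLayers, width, height);

    // This blocking copy also waits for the kernel to finish.
    std::vector<float> h_out(size);
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, size * sizeof(float),
                                 cudaMemcpyDeviceToHost);

    bool ok = (err == cudaSuccess);
    for (float v : h_out)
        ok = ok && (v == (float)numLayers);

    for (int i = 0; i < numLayers; i++) cudaFree(h_ptrs[i]);
    cudaFree(d_ptrs);
    cudaFree(d_out);
    return ok;
}
```

Calling layersSumCorrectly() from main should return true if the pointer table is reachable from the kernel. Applied to the code in the post, this would mean keeping d_t_Layr as the host-side table, adding a second float** allocated with cudaMalloc, copying d_t_Layr into it, and passing that second pointer to buildTerrain_k. A single contiguous allocation indexed as layer*size + c (the 3D array idea) avoids the extra indirection and is probably also worth considering.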