CUDA - Array of Pointers - Revisited accessing pointers inside kernel

I saw that there are other posts about this topic, but my situation is slightly different…

I am allocating device pointers to layer data, and would like to access that data in a single kernel call. I do not know how many layers I will need until runtime, so I would like to send the array of pointers to the device.

float* d_t_Layr[20];

float* h_t_Layr = (float*)malloc(size*sizeof(float));

for(int x = 0; x < numLayers; x++)
{
    //allocate one device buffer per layer and upload the host data into it
    cudaMalloc((void**)&d_t_Layr[x], size*sizeof(float));

    cudaMemcpy(d_t_Layr[x], h_t_Layr, size*sizeof(float), cudaMemcpyHostToDevice);
}


buildTerrain_k<<<grid, block>>>(d_t_Full,




__global__ void buildTerrain_k(float6* d_t_Full,
                               float*  d_t_Layr[],
                               int numLayers,
                               int width)
{
    unsigned int x = blockIdx.x*blockDim.x + threadIdx.x;
    unsigned int y = blockIdx.y*blockDim.y + threadIdx.y;
    int i = 0;

    //get cell position
    unsigned int c = y*width+x;

    //Start at zero
    d_t_Full[c].pos.z = 0.f;

    //Build terrain from Layers
    while(i < numLayers)
    {
        d_t_Full[c].pos.z += d_t_Layr[i][c];
        i++;
    }
}

The accumulation line inside the while loop (d_t_Full[c].pos.z += d_t_Layr[i][c];) is where the program is failing. I am not able to access the pointers correctly. A buddy suggested just using a 3D array, which would work, but I would like to see if there is something simple I am missing here. I am new to CUDA, so I may well have overlooked something simple. Let me know if there is anything I didn't include. Thanks for the replies.

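d_t_Layr is an array in host memory, so it cannot be dereferenced on the device. The pointers stored in it are device pointers, but the array that holds them is not on the device. You need a second array, allocated on the device, to hold the list of pointers to the device arrays, and you have to copy the pointer values into it before launching the kernel. It gets extremely messy and confusing. Not only that, but you will end up with non-coalesced access patterns when threads access the array of pointers on the device.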
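For anyone who still wants the pointer-table route, here is a minimal sketch of what that second device array looks like. It is not the original poster's code: sumLayers_k, sumLayersWithPointerTable, h_ptrs and d_ptrs are hypothetical names, the output is a plain float array rather than the float6 struct, each layer is filled with a constant just so the result can be checked, and error checking is left out for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// layers is a DEVICE array whose elements are device pointers, one per layer.
__global__ void sumLayers_k(float* out, float** layers, int numLayers, int n)
{
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (c >= n) return;                 // guard the partial last block

    float z = 0.f;
    for (int i = 0; i < numLayers; ++i)
        z += layers[i][c];              // load the pointer, then the value
    out[c] = z;
}

// Hypothetical helper: layer i is filled with the value (i + 1),
// so every output cell should equal 1 + 2 + ... + numLayers.
std::vector<float> sumLayersWithPointerTable(int numLayers, int n)
{
    // Host-side table of device pointers (the role d_t_Layr plays in the question).
    std::vector<float*> h_ptrs(numLayers);
    for (int i = 0; i < numLayers; ++i) {
        std::vector<float> h_layer(n, float(i + 1));
        cudaMalloc((void**)&h_ptrs[i], n * sizeof(float));
        cudaMemcpy(h_ptrs[i], h_layer.data(), n * sizeof(float),
                   cudaMemcpyHostToDevice);
    }

    // The missing step: copy the table itself into device memory.
    float** d_ptrs = nullptr;
    cudaMalloc((void**)&d_ptrs, numLayers * sizeof(float*));
    cudaMemcpy(d_ptrs, h_ptrs.data(), numLayers * sizeof(float*),
               cudaMemcpyHostToDevice);

    float* d_out = nullptr;
    cudaMalloc((void**)&d_out, n * sizeof(float));

    int block = 256;
    int grid  = (n + block - 1) / block;
    sumLayers_k<<<grid, block>>>(d_out, d_ptrs, numLayers, n);

    // This blocking copy also waits for the kernel to finish.
    std::vector<float> h_out(n);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    for (float* p : h_ptrs) cudaFree(p);
    cudaFree(d_ptrs);
    cudaFree(d_out);
    return h_out;
}
```

The key difference from the code in the question is that the kernel receives d_ptrs, which lives in device memory, instead of the host array of pointers.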

I agree with your buddy and suggest a 3D array. It is much simpler to allocate and access efficiently.
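A rough sketch of the single-allocation version, assuming the "3D array" is stored as one flat buffer with layer i starting at offset i * n, where n is the number of cells per layer. The names sumLayersFlat_k and sumLayersFlat are made up for illustration, the output is a plain float array rather than the float6 struct from the question, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

// layers is ONE device buffer holding numLayers * n floats, layer after layer.
__global__ void sumLayersFlat_k(float* out, const float* layers,
                                int numLayers, int n)
{
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (c >= n) return;                       // guard the partial last block

    float z = 0.f;
    for (int i = 0; i < numLayers; ++i)
        z += layers[(size_t)i * n + c];       // neighbouring threads read neighbouring floats
    out[c] = z;
}

// Hypothetical helper: layer i is filled with the value (i + 1),
// so every output cell should equal 1 + 2 + ... + numLayers.
std::vector<float> sumLayersFlat(int numLayers, int n)
{
    std::vector<float> h_layers((size_t)numLayers * n);
    for (int i = 0; i < numLayers; ++i)
        for (int c = 0; c < n; ++c)
            h_layers[(size_t)i * n + c] = float(i + 1);

    // One allocation and one copy, however many layers there are at runtime.
    float* d_layers = nullptr;
    cudaMalloc((void**)&d_layers, h_layers.size() * sizeof(float));
    cudaMemcpy(d_layers, h_layers.data(), h_layers.size() * sizeof(float),
               cudaMemcpyHostToDevice);

    float* d_out = nullptr;
    cudaMalloc((void**)&d_out, n * sizeof(float));

    int block = 256;
    int grid  = (n + block - 1) / block;
    sumLayersFlat_k<<<grid, block>>>(d_out, d_layers, numLayers, n);

    // This blocking copy also waits for the kernel to finish.
    std::vector<float> h_out(n);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_layers);
    cudaFree(d_out);
    return h_out;
}
```

Because the layer count only changes the size of the one allocation, this still handles a number of layers that is not known until runtime.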