2D array issues / performance questions

Hello,

I’m pretty new to CUDA and C programming and have run into an interesting (and probably simple) problem with 2D arrays.

I’ve tried a number of different ways to pass a 2D array of floats to my device/kernel function. From my understanding, there is no way to cudaMalloc a 2D array, and the function definition

__global__ void kernel(float** plateGrid)

won't work because it's a pointer to a pointer. Others have said to unroll the 2D array into a 1D array.

The program calculates the temperature of each inner plate of a grid by averaging its four neighboring plates, while the outer ring of plates is held at a fixed temperature.

So my questions are:

  1. Is it possible to pass a 2D array to the device, and if so, would it perform as well as operating on a 1D array?
  2. Would it be possible to extend this to a 3D array?
  3. Any other tips for increasing performance?

Included code: https://github.com/cudaBandit/temperature_grid.git

You are best off treating the array as 1D and doing the index arithmetic yourself (this is how it is laid out in memory anyway).
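As a rough illustration of that indexing, here is a minimal sketch in plain C, assuming a row-major layout. The helper names `idx2d` and `idx3d` are made up for this example and are not from the posted code. The second helper shows that the same idea extends to a 3D grid by adding one more stride.

```c
#include <stddef.h>

/* Row-major offset of element (row, col) in a grid with `width` columns.
   Hypothetical helper, not part of the posted code. */
size_t idx2d(size_t row, size_t col, size_t width)
{
    return row * width + col;
}

/* Same idea with one more dimension: `layer` selects a height * width slab.
   Hypothetical helper, not part of the posted code. */
size_t idx3d(size_t layer, size_t row, size_t col,
             size_t height, size_t width)
{
    return (layer * height + row) * width + col;
}
```

With this layout the whole grid is one contiguous block of `width * height` floats, so a single cudaMalloc of that many bytes is enough, and the kernel can take a plain `float*` instead of a `float**`.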

That kernel has all kinds of problems. It uses variables that are never defined or declared (temp, err), and it contains this line:

temp = plateGrid[posX + 1][posY] + plateGrid[posX][posY + 1] + plateGrid[posX - 1][posY] + plateGrid[posX][posY - 1];

Since posX and posY can be zero, posX - 1 and posY - 1 can be -1, so you will be reading out of bounds.

At this point you need to look at other CUDA examples. To keep things simple, use just the x dimension of the thread index, check whether it is in bounds, then work out which element in memory it corresponds to and perform the operation.
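Those steps might look roughly like the sketch below. It is written as plain C so it can be compiled and checked on the host. The names `update_cell` and `step` are invented for this example, and the separate input and output buffers are an assumption of the sketch, not something taken from the posted code. In a kernel, `i` would be the thread's global index (for example `blockIdx.x * blockDim.x + threadIdx.x`) and the loop in `step` would be replaced by the kernel launch.

```c
#include <stddef.h>

/*
 * Update one element of a flattened width x height grid.
 * `i` plays the role of the 1D global thread index in a kernel.
 * Reads from `in` and writes to `out` (an assumption of this sketch),
 * so no element is read after it has already been updated.
 */
void update_cell(const float *in, float *out, size_t i,
                 size_t width, size_t height)
{
    if (i >= width * height)
        return;                      /* index is past the end of the grid */

    size_t row = i / width;          /* where this index sits in the grid */
    size_t col = i % width;

    /* Outer ring: keep the fixed boundary temperature. */
    if (row == 0 || row == height - 1 || col == 0 || col == width - 1) {
        out[i] = in[i];
        return;
    }

    /* Interior plate: average of the four neighbors (up, down, left, right). */
    out[i] = 0.25f * (in[i - width] + in[i + width] + in[i - 1] + in[i + 1]);
}

/* Host-side stand-in for launching one thread per element. */
void step(const float *in, float *out, size_t width, size_t height)
{
    for (size_t i = 0; i < width * height; ++i)
        update_cell(in, out, i, width, height);
}
```

Because the boundary check happens before any neighbor is read, the `i - 1` and `i - width` accesses can never fall outside the array.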

There are other things wrong as well, but you are going to need to start with something simple like a vector add to get the general idea.

Google is your friend; look for material like this:

http://developer.download.nvidia.com/books/cuda-by-example/cuda-by-example-sample.pdf