[indent]CUDA memory allocation works on a flat block of memory: you ask for enough bytes to hold all the elements, irrespective of the array's dimensions. Consider how multi-dimensional arrays are stored in memory: they are contiguous. Therefore, simply allocate sizeof(datatype) * array width * array height bytes.
Remember, a 2D array accessed with double brackets through a double pointer goes through two levels of pointers, so you may have to do some jiggery-pokery with the malloc.
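A rough sketch of the contiguous allocation described above, assuming an int array. The helper names bytesFor2D and allocFlat2D are made up for illustration and are not from the post; only cudaMalloc and cudaGetErrorString are actual Runtime API calls.

```cuda
#include <cstdio>
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical helper: bytes needed for a contiguous width x height int array.
static size_t bytesFor2D(size_t width, size_t height)
{
    return sizeof(int) * width * height;
}

// Hypothetical helper: allocates one contiguous device block.
// Returns nullptr if cudaMalloc fails.
static int* allocFlat2D(size_t width, size_t height)
{
    int* devPtr = nullptr;
    cudaError_t err = cudaMalloc((void**)&devPtr, bytesFor2D(width, height));
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return nullptr;
    }
    return devPtr;
}
```

Calling allocFlat2D(32, 10) would reserve room for a 10-row, 32-column array; release it with cudaFree when done.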
[/indent]
[*]Step 2:
[indent]Pass a double pointer to your array as a parameter to your function, you will then be able to access your array via double brackets
[/indent]
[*]Easy Alternative:
[indent]Flatten your arrays and calculate the element index using gridDim, blockDim, blockIdx and threadIdx, if you have one thread associated with a single element of the array. Alternatively pass in the dimensions as parameters:
ElementIndex = Array Width * Y co-ordinate + X co-ordinate
Understand that memory, at the hardware level, is accessed in a linear fashion; multi-dimensional access is an illusion provided by the compiler.
While you're learning CUDA, I'd only use the fundamentals from the Runtime API as much as possible until you're confident with it, i.e. don't bother with textures yet.
Practice makes perfect, so get used to addressing a flattened 2D array.
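To make the index formula above concrete, here is a minimal sketch with one thread per element and the dimensions passed in as parameters. The names elementIndex and addOne are invented for this example.

```cuda
#include <cuda_runtime.h>

// Row-major index: ElementIndex = Array Width * Y co-ordinate + X co-ordinate.
__host__ __device__ inline int elementIndex(int width, int x, int y)
{
    return width * y + x;
}

// One thread per element of a flattened width x height array.
__global__ void addOne(int* flatArray, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // column
    int y = blockIdx.y * blockDim.y + threadIdx.y;  // row
    if (x < width && y < height) {                  // guard partially filled blocks
        flatArray[elementIndex(width, x, y)] += 1;
    }
}
```

A launch could look like addOne<<<dim3((width + 15) / 16, (height + 15) / 16), dim3(16, 16)>>>(devPtr, width, height), where devPtr is a device pointer to width * height ints.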
What I suggest you do is make use of shared memory, which CAN be multi-dimensional:
[codebox]
__global__ void MyKernel(int* flatArray)
{
extern __shared__ int my2DArray[32][32]; //size needs to be coded at development time though
Well, actually you can “convert” a linear array to a multi-dimensional one if the dimensions are known at compile time (C99 also allows it with variable dimensions, via variable-length array types, but that is not valid in C89 or in standard C++).
int linear[10*32]; // linear array
int (*twodim)[32] = (int (*)[32])linear; // cast needed; interpreted as a two-dimensional [10][32] array.
assert(&twodim[2][4] == &linear[2*32 + 4]); // these are the same
Note that nobody uses this syntax (except possibly in function arguments, where you can also use the more readable “int twodim[][32]”) because it is simply way more confusing than just doing it “the stupid way”.
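For what it's worth, the same pointer-to-array trick can be sketched in CUDA code as below. COLS, readAs2D and copyRowMajor are made-up names, the row length has to be a compile-time constant, and the cast is the explicit form that C++ requires.

```cuda
#include <cuda_runtime.h>

#define COLS 32  // row length, must be known at compile time

// Reads element [row][col] of a flat buffer through a pointer-to-array view.
__host__ __device__ inline int readAs2D(const int* linear, int row, int col)
{
    const int (*twodim)[COLS] = (const int (*)[COLS])linear;
    return twodim[row][col];  // same element as linear[row * COLS + col]
}

// Copies a rows x COLS array, reading with 2D syntax and writing "the stupid way".
__global__ void copyRowMajor(const int* in, int* out, int rows)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < rows && col < COLS) {
        out[row * COLS + col] = readAs2D(in, row, col);
    }
}
```

Both forms address the same memory; the view only changes how the index arithmetic is written.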
In step 2, you mention passing a double pointer to your array as a parameter to your function, but in your sample code:
__global__ void MyKernel(int* flatArray)
{
extern __shared__ int my2DArray[32][32]; //size needs to be coded at development time though
my2DArray[threadIdx.x][threadIdx.y] = flatArray[blockDim.x * threadIdx.y + threadIdx.x];
}
Isn't that a single pointer in the parameter field? Why?