2D matrices: getting started, please help

Hi all

I am a final year student and my thesis is based on an implementation of sound wave propagation using the finite-difference time-domain (FDTD) method on a GPU. I just started the implementation phase of my thesis and I am having difficulty getting started with the programming. I looked at the examples given in the projects folder, and in the matrix multiplication example they use a 1D array to represent a matrix; when it is loaded into shared memory it becomes a 2D array, and later it is stored again as a 1D array. This confused me somewhat. I need to make use of 2D arrays, but the examples only show how to allocate space for 1D arrays. So what I have done is just create arrays without allocating space - double array[width][height] - and that seemed to work up until I started calculating values for them:

for (int i = 0; i < 500; i++) {
    for (int j = 0; j < 500; j++) {
        c1[i][j] = (dt / (Ro[i][j] * ddx));
        c2[i][j] = ((dt / ddx) * (lamConst[i][j] + (2.0 * muConst[i][j])));
        c3[i][j] = ((dt / ddx) * lamConst[i][j]);
        c4[i][j] = ((dt / ddx) * muConst[i][j]);
    }
}

It compiles, but at run time the error I get is - Unhandled exception at 0x00401f97 in fdtd.exe: 0xC00000FD: Stack overflow - what does this mean? Will I be able to pass these 2D arrays to GPU memory and use them for computation? I'm quite confused, even though I have done quite a bit of reading on getting started. Please can someone help me.

Well, any n-dimensional array in memory (device, host, whatever) is inherently a 1D array. Saying an array is 2D, 3D, etc. is just a convenience for the programmer.

For example, these are the same:

  • 1D matrix (a vector) with length 100, and index k
  • 2D matrix (10x10) with indices i, j

They are totally the same to the compiler…it just translates Array2D(3,4) into Array1D(34), since with zero-based, row-major indexing the offset is 3*10 + 4 = 34.
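To make the index conversion concrete, here is a minimal C++ sketch of the row-major mapping for the 10x10 case above. The helper name `flat_index` is made up for illustration and is not part of CUDA or any library; it assumes zero-based indices and rows stored one after another.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: maps a 2D (row, col) position to its offset
// in a 1D row-major buffer whose rows are `width` elements long.
inline std::size_t flat_index(std::size_t row, std::size_t col, std::size_t width) {
    assert(col < width);       // a column index past the row end would alias the next row
    return row * width + col;  // skip `row` full rows, then step `col` elements
}
```

With this, `flat_index(3, 4, 10)` evaluates to 34, matching the Array2D(3,4) example. The same expression works unchanged inside a kernel if the function is marked for device use.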

So, the array types in CUDA are also provided for the same convenience…you just need to check your indices when converting between the two. As for your specific error, make sure you’re not using more than 16KiB of memory for your kernel, as it would be running out of shared memory. The program may be trying to write to a memory address that doesn’t exist (higher than the 16KiB mark).

EDIT: Also, note that a 500x500 array of ints takes up about 1MiB (250,000 elements at 4 bytes each)…so if you’re trying to load them all into shared memory, that could be your problem.
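The sizes involved can be checked with a short C++ sketch, which also shows one common alternative to declaring `double array[500][500]` as local variables: a single flat buffer on the heap. One assumption here goes beyond what the thread states: 0xC00000FD is the Windows stack overflow status code, and several 500x500 double arrays at roughly 2 MB each would exceed the 1 MB default stack that MSVC commonly uses, so the host-side declarations are a plausible cause alongside the shared-memory limit mentioned above. The names `kSide`, `kSharedLimitBytes`, `grid_bytes`, and `make_grid` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kSide = 500;                    // grid is kSide x kSide
constexpr std::size_t kSharedLimitBytes = 16 * 1024;  // the 16 KiB figure quoted in the thread

// Bytes needed for one kSide x kSide grid of elements of the given size.
constexpr std::size_t grid_bytes(std::size_t elem_size) {
    return kSide * kSide * elem_size;
}

// Allocates one grid as a flat, zero-initialised buffer on the heap,
// so its size does not count against the thread's stack.
inline std::vector<double> make_grid() {
    return std::vector<double>(kSide * kSide, 0.0);
}
```

Element (i, j) of such a grid is then `grid[i * kSide + j]`, and `grid.data()` gives a contiguous pointer of the kind `cudaMemcpy` takes for a host-to-device copy.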

Also, why not just create your array/matrix, then use CUBLAS for the multiplication step?

Wow, thanks dude, that actually made a lot of sense concerning the examples and how they do the indexing.