Questions on 2D shared memory allocation for matrix multiplication

Hi,

After a lot of struggle I managed to install CUDA Toolkit 3.2 and the SDK on Ubuntu 10.10
on my Sony VAIO i7 laptop (with a GeForce GT 425M).

I’m in the learning phase and running as many of the example programs as possible.
While trying to compile the matrix multiplication program presented in CUDA_C_Programming_Guide.pdf
(page 39), I got a compilation error from the following code:
----------------code snippets from my program m2m.cu ----------------
// Shared memory used to store Asub and Bsub respectively
__shared__ float As[BLOCK_SIZE][BLOCK_SIZE];
__shared__ float Bs[BLOCK_SIZE][BLOCK_SIZE];

// Load Asub and Bsub from device memory to shared memory
// Each thread loads one element of each sub-matrix
As[row][col] = GetElement[Asub, row, col];    //line 127
Bs[row][col] = GetElement[Bsub, row, col];    //line 128

The compiler output is:

m2m.cu(127): warning: expression has no effect

m2m.cu(127): warning: expression has no effect

m2m.cu(127): error: expression must be a pointer to a complete object type

m2m.cu(128): warning: expression has no effect

m2m.cu(128): warning: expression has no effect

m2m.cu(128): error: expression must be a pointer to a complete object type

It looks like the 2D shared memory arrays As and Bs need to be defined some other way.
Can someone help me solve this issue?

Thanks in Advance…

Regards,
Syed

What is GetElement? You are using it as if it were a one-dimensional array. The two commas inside the square brackets are comma operators, which evaluate and discard their left-hand operand and thus lead to the “expression has no effect” warnings. The error probably comes from the fact that GetElement isn’t a one-dimensional array or a pointer, so it cannot be subscripted.

CUDA C is no different than C with regard to these things.
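
If it helps, here is a minimal plain-C illustration (hypothetical array and variable names, not from the guide) of how square brackets with commas inside are parsed:

#include <stdio.h>

int main(void)
{
    float data[4] = {0.0f, 1.0f, 2.0f, 3.0f};
    int row = 1, col = 3;

    // Inside [] the comma is the comma operator, not an argument separator:
    // 'row' is evaluated and discarded (hence "expression has no effect"),
    // and only 'col' is used as the subscript, so this is just data[col].
    float x = data[row, col];

    printf("%f\n", x);   // prints 3.000000

    return 0;
}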

GetElement is defined by:

// Matrices will be stored in row major order
// M(row, col) = *(M.elements + row * M.width + col)
typedef struct {
    int width;
    int height;
    int stride;
    float *elements;
} Matrix;

// Thread block size
#define BLOCK_SIZE 16

// Get a matrix element
__device__ float GetElement(const Matrix A, int row, int col)
{
    return A.elements[row * A.stride + col];
}

The code is taken from the CUDA C Programming Guide for the Toolkit 3.2 version.
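
For completeness, Asub and Bsub in that listing come from a GetSubMatrix helper in the same example, which returns a BLOCK_SIZE x BLOCK_SIZE view into A. Reproduced here from memory, so it may not be verbatim:

__device__ Matrix GetSubMatrix(Matrix A, int row, int col)
{
    // Sub-matrix located 'col' sub-matrices to the right and 'row'
    // sub-matrices down from the upper-left corner of A
    Matrix Asub;
    Asub.width = BLOCK_SIZE;
    Asub.height = BLOCK_SIZE;
    Asub.stride = A.stride;
    Asub.elements = &A.elements[A.stride * BLOCK_SIZE * row + BLOCK_SIZE * col];
    return Asub;
}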

Right. So you have a pair of elementary syntax errors on the right-hand side of the two statements in question.

Are you using Mathematica when you aren’t programming CUDA? C uses round brackets for function calls.
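
That is, the two loads should use parentheses, which is presumably what the guide’s listing has:

As[row][col] = GetElement(Asub, row, col);    // was line 127
Bs[row][col] = GetElement(Bsub, row, col);    // was line 128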

Great! It is working. I use late-night hours to learn CUDA, so it just slipped past my eye :-)
Thanks for the help…

Regards,
Syed Abid Hussain