Errors while using shared memory: "expression must have constant value" in lines 16 and 17

Hi,

I am a little bit new to CUDA. I have written the following program for matrix multiplication using shared memory. The algorithm for this is given here on page 9:

David Kirk, Chapter 4

However, I am getting the following errors:

1. "expression must have constant value", on these lines:

__shared__ float as[tile_width][tile_width];

__shared__ float bs[tile_width][tile_width];

2. "expression must have integral or enum type", on these lines:

as[j][i] = a[Row*width+(m*tile_width+i)];

bs[j][i] = b[(m*tile_width+j)+width*Col];

c[Row*width+Col]=sum;

The complete program is given below. I would be thankful to you for any useful pointers in this regard.

Thanks,

Ankush

__global__ void matrixmul(int *a,int *b, int *c, int width,int tile_width)

{

__shared__ float as[tile_width][tile_width];

__shared__ float bs[tile_width][tile_width];

float k;

int i=threadIdx.x; int j=threadIdx.y;

int p=blockIdx.x; int q=blockIdx.y;

float Row= (q * tile_width) +j;

float Col= (p * tile_width) +i;

float sum=0;

for( float m = 0; m<width/tile_width;++m)

{

as[j][i] = a[Row*width+(m*tile_width+i)];

bs[j][i] = b[(m*tile_width+j)+width*Col];

for(float k=0; k< tile_width; ++k)

{

 sum += as[j*width+k] * bs[k*width+i];

c[Row*width+Col]=sum;

}

}

}

int main()

{

int i,j,m,n,sum;

int Row,k,Col,tile_width,width;

int *a_h,*b_h,*a_d,*b_d,*c_h,*c_d;

const int N=1000;

size_t size= N*sizeof(int);

printf("enter the tile_width");

scanf("%d",&tile_width);

printf("enter the width");

scanf("%d",&width);

//Memory allocation on host and device, a_h, a_d 

a_h=(int*)malloc(size);

cudaMalloc((void**)&a_d,size);

//Memory allocation on host and device, b_h, b_d 

b_h=(int*)malloc(size);

cudaMalloc((void**)&b_d,size);

//Memory allocation on host and device, c_h, c_d 

c_h=(int*)malloc(size);

cudaMalloc((void**)&c_d,size);

//User inputs (row=columns)  

printf("enter the row & column of the 1st matrix m ");

scanf("%d%d",&Row,&k); //row = columns

printf("enter the element of 1st matrix m");

for(i=0;i<(Row*k);i++)

{

scanf("%d",&a_h[i]);

}

for(i=0;i<(Row*k);i++)

{

printf("\t%d",a_h[i]);

}

printf("\n");

//copying data (a_h) from Host to Device in a_d 

cudaMemcpy(a_d,a_h,size,cudaMemcpyHostToDevice);

printf("enter the row & column of 2nd matrix n");

scanf("%d%d",&k,&Col); //row = columns

printf("enter the element of 2nd matrix n");

for(j=0;j<(k*Col);j++)

{

scanf("%d",&b_h[j]);

}

for(j=0;j<(k*Col);j++)

{

printf("\t%d",b_h[j]);

}

printf("\n");

//copying data (b_h) from Host to Device in b_d 

cudaMemcpy(b_d,b_h,size,cudaMemcpyHostToDevice);

//Kernel call 

dim3 dimBlock(4,4,1);

dim3 dimGrid(1,1,1);

matrixmul<<<dimGrid, dimBlock>>>(a_d,b_d,c_d,width,tile_width);

//copying data (c_d) from Device to Host in c_h 

cudaMemcpy(c_h,c_d,size,cudaMemcpyDeviceToHost);

//printing the results 

for(i=0;i<(Row*Col);i++)

{

printf("%d",c_h[i]);

printf("\n");

}

getch();

free(a_h);

cudaFree(a_d);

free(b_h);

cudaFree(b_d);

free(c_h);

cudaFree(c_d);

}

You can’t dynamically allocate shared memory like this:

__shared__ float as[tile_width][tile_width];

Dynamic allocations happen at kernel invocation, and you’d access the memory like this:

extern __shared__ float data[];

Look in the programming guide about dynamic shared memory allocation.

You can’t have dynamically sized arrays in C90. You also can’t allocate dynamic memory in CUDA kernels. You can read about how to allocate kernel shared memory at runtime in the “Execution Configuration” section of chapter 4 of the programming guide.
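To illustrate the two replies above, here is a minimal sketch (mine, not from the thread) of how both tiles can be carved out of a single dynamically sized extern __shared__ array, with the byte count passed as the third execution-configuration parameter:

```cuda
// Device side: one extern array, sized at launch time, partitioned by hand
// into the two tiles that the original code declared as 2D arrays.
__global__ void matrixmul(int *a, int *b, int *c, int width, int tile_width)
{
    extern __shared__ float smem[];
    float *as = smem;                            // tile of A
    float *bs = smem + tile_width * tile_width;  // tile of B
    // ... index them as as[j * tile_width + i] and bs[j * tile_width + i] ...
}

// Host side: the third <<< >>> argument is the shared-memory size in
// bytes per block (here: room for both tiles).
// size_t smemBytes = 2 * tile_width * tile_width * sizeof(float);
// matrixmul<<<dimGrid, dimBlock, smemBytes>>>(a_d, b_d, c_d, width, tile_width);
```

Because there is only one extern __shared__ array per kernel, the two logical tiles have to share it via pointer offsets like this.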

Sorry i have just edited my post.

Ankush

Thanks, avidday, but I am running the program in emulation mode.

Last time I checked, the syntax of the C language was the same in emulation mode as when running on a GPU, so that is sort of irrelevant to your problem. Emulation mode, as the name suggests, emulates running on a GPU. It has the same restrictions on how resources must be specified and requires the same syntax.

SPWorley and avidday: please explain how this expression dynamically allocates memory:

__shared__ float as[tile_width][tile_width];

I understand that dynamic memory allocation is done through the calloc and malloc functions in C and the cudaMalloc function in CUDA. In the above expression the value of tile_width is passed to the array as[tile_width][tile_width], so it becomes static, for example as[16][16].

The value of tile_width is not known to the compiler at compile time. In C90, an array dimension must be an integral constant expression, so you cannot declare an array whose dimensions are runtime values.