Shared Memory - Dynamic Allocation

Hi there,

I have a question regarding dynamically allocated shared memory.

First of all, part of my current code looks like this:

[codebox]

void invokeMultMatrixDevice()
{
    dim3 blocksInGrid(4, 4, 1);
    dim3 threadsInBlock(16, 16, 1);

    // All mallocs and memcpys are here

    multMatrixDevice<<<blocksInGrid, threadsInBlock>>>(matrix_result, matrix_a, matrix_b);

    // Copy result back to host
}

__global__ void multMatrixDevice(float *matrix_result, float *matrix_a, float *matrix_b)
{
    __shared__ float As[16 * 16];
    __shared__ float Bs[16 * 16];

    // Copy data from global into shared memory goes here

    // Do the work
}

[/codebox]

This code works well; the omitted fragments shouldn't matter here.

As you can see, the shared memory on the device is statically allocated (As and Bs each hold 16x16 floats).

I am trying to rewrite the code so that the two shared memory arrays are allocated dynamically instead of statically. But I don't know how to allocate them from the host code (function invokeMultMatrixDevice).

I tried it this way:

[codebox]

void invokeMultMatrixDevice()
{
    dim3 blocksInGrid(4, 4, 1);
    dim3 threadsInBlock(16, 16, 1);

    size_t sharedMem = threadsInBlock.x * threadsInBlock.y * sizeof(float);

    // All mallocs and memcpys are here

    multMatrixDevice<<<blocksInGrid, threadsInBlock, sharedMem>>>(matrix_result, matrix_a, matrix_b);

    // Copy result back to host
}

__global__ void multMatrixDevice(float *matrix_result, float *matrix_a, float *matrix_b)
{
    extern __shared__ float As[];
    extern __shared__ float Bs[];

    // Copy data from global into shared memory goes here

    // Do the work
}

[/codebox]

So, I don't know how to map the "extern" declared variables As and Bs onto the sharedMem I provided at the kernel call.

Any help would be appreciated!

Thanks in advance

Matze

This is covered in the programming guide, p21:

When declaring a variable in shared memory as an external array such as

[codebox]
extern __shared__ float shared[];
[/codebox]

the size of the array is determined at launch time (see Section 4.2.3). All variables declared in this fashion, start at the same address in memory, so that the layout of the variables in the array must be explicitly managed through offsets. For example, if one wants the equivalent of

[codebox]
short array0[128];
float array1[64];
int array2[256];
[/codebox]

in dynamically allocated shared memory, one could declare and initialize the arrays the following way:

[codebox]
extern __shared__ char array[];

__device__ void func() // __device__ or __global__ function
{
    short* array0 = (short*)array;
    float* array1 = (float*)&array0[128];
    int* array2 = (int*)&array1[64];
}
[/codebox]
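
Applied to the two tiles from the question, the same idea might look roughly like the sketch below. It is only an illustration under some assumptions: the kernel name splitSharedDemo, the host helper runSplitSharedDemo and the TILE constant are made up for this example, and the kernel just stages two 16x16 tiles and adds them instead of doing the actual matrix multiply. The two points to take from it are that the launch has to request room for both tiles (twice the size used in the question) and that As and Bs become plain pointers into one extern array.

```cuda
#include <vector>
#include <cuda_runtime.h>

#define TILE 16  // assumed tile width, matching the 16x16 thread block in the question

// Hypothetical demo kernel: it stages two tiles in shared memory and adds them.
// It only shows the partitioning; it is not the matrix multiply from the question.
__global__ void splitSharedDemo(float *result, const float *a, const float *b)
{
    // One dynamically sized buffer; its size in bytes is the third launch parameter.
    extern __shared__ float shared[];

    const int tileElems = blockDim.x * blockDim.y;
    float *As = shared;              // first tile:  elements [0, tileElems)
    float *Bs = shared + tileElems;  // second tile: elements [tileElems, 2 * tileElems)

    const int t = threadIdx.y * blockDim.x + threadIdx.x;
    As[t] = a[t];
    Bs[t] = b[t];
    __syncthreads();  // make both tiles visible to every thread in the block

    // Read an element written by a different thread to exercise the shared tiles.
    const int other = tileElems - 1 - t;
    result[t] = As[t] + Bs[other];
}

// Hypothetical host helper: returns the number of wrong results, or -1 on a CUDA error.
int runSplitSharedDemo()
{
    const int n = TILE * TILE;
    const size_t bytes = n * sizeof(float);
    std::vector<float> hA(n), hB(n), hR(n, 0.0f);
    for (int i = 0; i < n; ++i) { hA[i] = float(i); hB[i] = 2.0f * float(i); }

    float *dA = nullptr, *dB = nullptr, *dR = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dR, bytes);
    cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice);

    dim3 threadsInBlock(TILE, TILE, 1);
    // Room for BOTH tiles: 2 * 16 * 16 * sizeof(float) bytes.
    const size_t sharedMem = 2 * threadsInBlock.x * threadsInBlock.y * sizeof(float);
    splitSharedDemo<<<1, threadsInBlock, sharedMem>>>(dR, dA, dB);

    cudaError_t launchErr = cudaGetLastError();
    cudaError_t copyErr = cudaMemcpy(hR.data(), dR, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dR);
    if (launchErr != cudaSuccess || copyErr != cudaSuccess) return -1;

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (hR[i] != hA[i] + hB[n - 1 - i]) ++mismatches;
    return mismatches;
}
```

Calling runSplitSharedDemo() from main and checking that it returns 0 is enough to try it out. Since both tiles are float here, no extra alignment care is needed; with mixed element types the offsets would have to respect each type's alignment, as in the guide's short/float/int example.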

Thank you!

(Oops, I posted this question in the wrong board, sorry for that.)

Greets
Matze