Hi there,
I have a question regarding dynamically allocated shared memory.
First of all, part of my current code looks like this:
[codebox]
void invokeMultMatrixDevice()
{
dim3 blocksInGrid(4, 4, 1);
dim3 threadsInBlock(16, 16, 1);
//All mallocs and memcpys are here
multMatrixDevice<<<blocksInGrid, threadsInBlock>>>(matrix_result, matrix_a, matrix_b);
//Copy result back to host
}
__global__ void multMatrixDevice(float *matrix_result, float *matrix_a, float *matrix_b)
{
__shared__ float As[16 * 16];
__shared__ float Bs[16 * 16];
//Copy data from global into shared memory goes here
//Do the work
}
[/codebox]
This code works well; the omitted code fragments should not be relevant.
As you can see, the shared memory on the device is static (As and Bs, each with dimension 16x16).
I am trying to rewrite the code so that the two shared memory variables are allocated dynamically rather than statically, but I don't know how to allocate them from the host code (function invokeMultMatrixDevice).
I tried it this way:
[codebox]
void invokeMultMatrixDevice()
{
dim3 blocksInGrid(4, 4, 1);
dim3 threadsInBlock(16, 16, 1);
size_t sharedMem = threadsInBlock.x * threadsInBlock.y * sizeof(float);
//All mallocs and memcpys are here
multMatrixDevice<<<blocksInGrid, threadsInBlock, sharedMem>>>(matrix_result, matrix_a, matrix_b);
//Copy result back to host
}
__global__ void multMatrixDevice(float *matrix_result, float *matrix_a, float *matrix_b)
{
extern __shared__ float As[];
extern __shared__ float Bs[];
//Copy data from global into shared memory goes here
//Do the work
}
[/codebox]
So, I don't know how to assign the "extern" declared variables As and Bs to the sharedMem size I provided at the kernel call.
Any help would be greatly appreciated!
Thanks in advance
Matze
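
For reference, here is a minimal sketch of one common way to handle this, not necessarily the only one. All `extern __shared__` arrays declared in a kernel start at the same address, so As and Bs as declared above would alias each other. The usual pattern is to declare a single unsized buffer, carve both tiles out of it with pointer offsets, and pass the combined size of both tiles as the third launch parameter. The buffer name `smem`, the extra parameter `n`, and the host-pointer arguments of `invokeMultMatrixDevice` are assumptions made for illustration and are not part of the original code. The sketch also assumes square thread blocks and a matrix width `n` that is a multiple of the block width.

```cuda
#include <cuda_runtime.h>

// Tiled multiply of two n x n row-major matrices.
// Assumes blockDim.x == blockDim.y and n % blockDim.x == 0.
__global__ void multMatrixDevice(float *matrix_result, const float *matrix_a,
                                 const float *matrix_b, int n)
{
    // One dynamically sized buffer; its size comes from the kernel launch.
    extern __shared__ float smem[];
    const int tile = blockDim.x;
    float *As = smem;                // first  tile * tile floats
    float *Bs = smem + tile * tile;  // second tile * tile floats

    const int row = blockIdx.y * tile + threadIdx.y;
    const int col = blockIdx.x * tile + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < n / tile; ++t) {
        // Copy one tile of each input from global into shared memory.
        As[threadIdx.y * tile + threadIdx.x] = matrix_a[row * n + t * tile + threadIdx.x];
        Bs[threadIdx.y * tile + threadIdx.x] = matrix_b[(t * tile + threadIdx.y) * n + col];
        __syncthreads();

        for (int k = 0; k < tile; ++k)
            acc += As[threadIdx.y * tile + k] * Bs[k * tile + threadIdx.x];
        __syncthreads();  // finish reading before the next tile overwrites
    }
    matrix_result[row * n + col] = acc;
}

// Hypothetical host wrapper: takes host pointers, returns the first CUDA error.
cudaError_t invokeMultMatrixDevice(float *h_result, const float *h_a,
                                   const float *h_b, int n)
{
    const dim3 threadsInBlock(16, 16, 1);
    const dim3 blocksInGrid(n / threadsInBlock.x, n / threadsInBlock.y, 1);
    const size_t bytes = static_cast<size_t>(n) * n * sizeof(float);
    // Room for BOTH tiles, hence the factor of 2.
    const size_t sharedMem = 2 * threadsInBlock.x * threadsInBlock.y * sizeof(float);

    float *d_a = nullptr, *d_b = nullptr, *d_result = nullptr;
    cudaError_t err = cudaMalloc(&d_a, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&d_b, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&d_result, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        multMatrixDevice<<<blocksInGrid, threadsInBlock, sharedMem>>>(d_result, d_a, d_b, n);
        err = cudaGetLastError();  // catches launch configuration errors
    }
    // This copy also waits for the kernel to finish.
    if (err == cudaSuccess) err = cudaMemcpy(h_result, d_result, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_a);  // cudaFree on a null pointer is a no-op
    cudaFree(d_b);
    cudaFree(d_result);
    return err;
}
```

With the 4x4 grid of 16x16 blocks from the question, this would be called with n = 64. Note that the attempt in the question passes only enough shared memory for a single 16x16 tile; with two tiles the size has to be doubled.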