To allocate dynamic shared memory, you need to pass the optional shared memory size as an extra kernel configuration parameter when you launch the kernel.
For example, you usually invoke a kernel like this:
kernel_name <<< grid, block >>> (..)
“grid” and “block” are the standard, required kernel configuration parameters.
Apart from these, you can specify the dynamic shared memory size, in bytes, as the third argument:
kernel_name <<< grid, block, 100*(sizeof(float) + sizeof(int)) >>> (...)
In the example above, enough bytes for 100 ints and 100 floats are reserved as dynamic shared memory for that particular invocation of the kernel.
Thus, from invocation to invocation, the dynamic shared memory size can vary depending on your inputs.
Of course, this dynamic shared memory is PER block: every block in the grid gets its own region of that size.
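To make the "size can vary per launch" point concrete, here is a minimal sketch. The kernel name reverse_in_block and the host helper run_reverse are made up for illustration (they are not from the post above), and the sketch assumes a single block with n <= 256 threads:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: the block stages n floats in dynamic shared memory,
// then writes them back in reverse order.
__global__ void reverse_in_block(float *data, int n)
{
    extern __shared__ float buf[];   // size comes from the third launch parameter
    int i = threadIdx.x;
    if (i < n) buf[i] = data[i];
    __syncthreads();                 // all writes to buf must finish before reads
    if (i < n) data[i] = buf[n - 1 - i];
}

// Hypothetical host helper: launches one block of n threads and requests
// n floats of dynamic shared memory. Returns 0 on success, 1 on any failure.
int run_reverse(int n)
{
    float host[256];
    if (n < 1 || n > 256) return 1;  // assumption of this sketch
    for (int i = 0; i < n; ++i) host[i] = (float)i;

    float *dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    size_t shmem_bytes = n * sizeof(float);   // decided at launch time, in bytes
    reverse_in_block<<<1, n, shmem_bytes>>>(dev, n);
    cudaError_t err = cudaDeviceSynchronize();

    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return 1;

    for (int i = 0; i < n; ++i)
        if (host[i] != (float)(n - 1 - i)) return 1;
    return 0;
}

int main()
{
    // Same kernel, two launches, two different shared memory sizes.
    int rc = run_reverse(64) | run_reverse(200);
    printf("%s\n", rc == 0 ? "ok" : "failed");
    return rc;
}
```

The same kernel is launched twice with different shared memory sizes, which is the whole point of the dynamic variant: nothing about the size is baked into the kernel at compile time.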
And to access this inside your kernel, you have to declare it like this:
  extern __shared__ int dynamicMemory[];
  __shared__ int *intarray;
  __shared__ float *floatarray;
At the kernel’s run time, “dynamicMemory” will have the “start” address of the dynamic memory in that block. Note that if you declare multiple such extern arrays, all of those symbols correspond to the same “start” address of the dynamic memory in that block.
So you have to carve the region up yourself, with something like this at the start of your kernel:
  if (threadIdx.x == 0) {
    intarray = dynamicMemory;
    floatarray = (float *)(intarray + 100);
  }
  __syncthreads();
and so on. Both assignments have to be inside the braces of the “if”, the float array starts right after the 100 ints, and the __syncthreads() makes sure every thread in the block sees the pointers before using them.
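Putting the pieces together, here is a minimal sketch of a complete program. The names partition_demo and run_partition are made up for illustration. Note one deliberate swap: instead of __shared__ pointers set by thread 0, each thread computes its own local pointers from the same base address, which avoids needing a barrier just to publish the pointers. The sketch assumes 100 elements per array, as in the launch example above:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 100   // elements per array, matching the 100 used in the launch example

// Hypothetical kernel: splits the block's dynamic shared memory into
// an int array followed by a float array, fills both, then combines them.
__global__ void partition_demo(float *out)
{
    extern __shared__ int dynamicMemory[];        // start of the dynamic region
    int   *intarray   = dynamicMemory;            // first N ints
    float *floatarray = (float *)(intarray + N);  // N floats right after them

    for (int i = threadIdx.x; i < N; i += blockDim.x) {
        intarray[i]   = i;
        floatarray[i] = 0.5f * i;
    }
    __syncthreads();   // make all shared memory writes visible to the block

    for (int i = threadIdx.x; i < N; i += blockDim.x)
        out[i] = intarray[N - 1 - i] + floatarray[i];
}

// Hypothetical host helper. Returns 0 on success, 1 on any failure.
int run_partition()
{
    float host[N];
    float *dev = nullptr;
    if (cudaMalloc(&dev, N * sizeof(float)) != cudaSuccess) return 1;

    // Third launch parameter: bytes for N ints plus N floats.
    size_t shmem_bytes = N * (sizeof(int) + sizeof(float));
    partition_demo<<<1, 32, shmem_bytes>>>(dev);
    cudaError_t err = cudaDeviceSynchronize();

    cudaMemcpy(host, dev, N * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return 1;

    for (int i = 0; i < N; ++i)
        if (host[i] != (float)(N - 1 - i) + 0.5f * i) return 1;
    return 0;
}

int main()
{
    int rc = run_partition();
    printf("%s\n", rc == 0 ? "ok" : "failed");
    return rc;
}
```

Since int and float are both 4 bytes here, the float array is naturally aligned. If you mix types of different sizes, it is safest to place the largest type first so every sub-array starts on a properly aligned address.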
The CUDA programming guide has info on this. Just search for “extern” and you can reach that section…
Hope this helps.