I have read two samples that use shared memory in different ways.
The first is the matrix multiplication example at the end of the programming guide. It declares the shared memory arrays inside the kernel:
__global__ void Muld(...)
{
...
__shared__ float As[BLOCK_SIZE][BLOCK_SIZE];
__shared__ float Bs[BLOCK_SIZE][BLOCK_SIZE];
...
}
The other is the SobelFilter sample in the SDK sample projects. It declares the shared memory array at the start of the file, outside any function:
extern __shared__ unsigned char LocalBlock[];
Also, when launching the kernel from the host, it explicitly specifies the amount of dynamic shared memory for each block:
SobelShared<<<blocks, threads, sharedMem>>>...
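To make the comparison concrete, here is a minimal sketch of the two styles side by side as I understand them. It is not taken from either sample: the kernels reverseStatic and reverseDynamic, the helper runBoth, and the BLOCK_SIZE value of 16 are hypothetical names and values chosen only for illustration. Both kernels reverse each block-sized chunk of an array through shared memory, so they should produce the same output.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK_SIZE 16  // hypothetical tile width, assumed for this sketch

// Style 1: the size is a compile-time constant, so the array is declared
// with a fixed size inside the kernel.
__global__ void reverseStatic(const float *in, float *out)
{
    __shared__ float tile[BLOCK_SIZE];
    int t = threadIdx.x;
    int base = blockIdx.x * BLOCK_SIZE;
    tile[t] = in[base + t];
    __syncthreads();                       // wait until the whole tile is loaded
    out[base + t] = tile[BLOCK_SIZE - 1 - t];
}

// Style 2: the array is declared unsized and extern; its storage is the
// number of bytes given as the third launch parameter.
extern __shared__ float dynTile[];

__global__ void reverseDynamic(const float *in, float *out)
{
    int t = threadIdx.x;
    int n = blockDim.x;                    // tile length is only known at launch
    int base = blockIdx.x * n;
    dynTile[t] = in[base + t];
    __syncthreads();
    out[base + t] = dynTile[n - 1 - t];
}

// Runs both kernels on the same input and returns the number of elements
// on which their outputs differ (or -1 on a CUDA error).
int runBoth()
{
    const int blocks = 4;
    const int n = blocks * BLOCK_SIZE;
    const size_t bytes = n * sizeof(float);
    float h_in[n], h_a[n], h_b[n];
    for (int i = 0; i < n; ++i) h_in[i] = (float)i;

    float *d_in = 0, *d_a = 0, *d_b = 0;
    if (cudaMalloc((void **)&d_in, bytes) != cudaSuccess) return -1;
    if (cudaMalloc((void **)&d_a, bytes) != cudaSuccess) return -1;
    if (cudaMalloc((void **)&d_b, bytes) != cudaSuccess) return -1;
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

    // Static version: no shared memory size in the launch configuration.
    reverseStatic<<<blocks, BLOCK_SIZE>>>(d_in, d_a);
    // Dynamic version: third parameter is the per-block size in bytes.
    reverseDynamic<<<blocks, BLOCK_SIZE, BLOCK_SIZE * sizeof(float)>>>(d_in, d_b);
    if (cudaGetLastError() != cudaSuccess) return -1;

    cudaMemcpy(h_a, d_a, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(h_b, d_b, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in); cudaFree(d_a); cudaFree(d_b);

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (h_a[i] != h_b[i]) ++mismatches;
    return mismatches;
}

int main()
{
    int mismatches = runBoth();
    printf("mismatches between static and dynamic versions: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

In this sketch the only differences between the two versions are where the array is declared, whether it has a size, and whether the launch passes a byte count.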
I would like to know the differences between these two methods. Why must the shared memory array be declared "extern" in the second case?
Any relevant information is appreciated.