As I understand it, if you want to use shared memory in a __device__ function, you should pass it from the kernel function as an argument.
However, I tested a CUDA program that defines shared memory inside a device function, and it works correctly. What are the consequences of doing this?
In my view, the C++ feature that is most similar to a statically allocated __shared__ declaration is the array declaration. If we remove the __shared__ keyword, the remainder looks syntactically identical to a C++ array declaration. It has some of the same requirements; for example, the size must be a compile-time constant.
It seems to be legal to have multiple statically allocated __shared__ array declarations because:
There is nothing in the CUDA documentation that explicitly prohibits it.
Ignoring the __shared__ keyword, multiple declarations would be legal in C++.
It seems to work when you try it.
NVIDIA-provided code samples demonstrate it.
There is also nothing in any of this (C++, CUDA, examples) that suggests that these declarations all need to be at the same scope, or all need to be at the top-level scope in a CUDA kernel definition. The canonical example for shared memory usage in the programming guide demonstrates a shared declaration in-line, i.e. not at top-level scope.
Multiple statically allocated shared declarations that are within the scope of a CUDA kernel definition add together in terms of their total shared memory usage, regardless of the scope at which each one appears. And, as with any C++ declaration, a statically allocated shared declaration made at some level other than top-level kernel scope is only usable in the scope where it was defined.
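As a rough illustration of the static case, here is a minimal sketch with one __shared__ array declared inside a __device__ function and a second one declared in the kernel. The names blockSum, sumKernel, scratch, tile and BLOCK are hypothetical and chosen only for this example. It queries the kernel's static shared memory footprint with cudaFuncGetAttributes, which should let you check on your own toolchain whether both arrays are counted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int BLOCK = 128;  // assumed block size for this sketch

// Hypothetical helper: block-wide sum using a static __shared__ array
// declared inside the __device__ function itself.
__device__ int blockSum(int value)
{
    __shared__ int scratch[BLOCK];   // only nameable inside blockSum
    scratch[threadIdx.x] = value;
    __syncthreads();
    for (int stride = BLOCK / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            scratch[threadIdx.x] += scratch[threadIdx.x + stride];
        __syncthreads();
    }
    int result = scratch[0];
    __syncthreads();                 // make scratch safe to reuse on a later call
    return result;
}

__global__ void sumKernel(const int *in, int *out)
{
    __shared__ int tile[BLOCK];      // a second, separate static allocation
    tile[threadIdx.x] = in[blockIdx.x * BLOCK + threadIdx.x];
    __syncthreads();
    // Read an element staged by a different thread, then reduce.
    int total = blockSum(tile[BLOCK - 1 - threadIdx.x]);
    if (threadIdx.x == 0)
        out[blockIdx.x] = total;
}

int main()
{
    // Report how much static shared memory the compiled kernel uses.
    cudaFuncAttributes attr;
    cudaFuncGetAttributes(&attr, sumKernel);
    printf("static shared bytes: %zu\n", attr.sharedSizeBytes);

    int *in, *out;
    cudaMallocManaged(&in, BLOCK * sizeof(int));
    cudaMallocManaged(&out, sizeof(int));
    for (int i = 0; i < BLOCK; ++i) in[i] = 1;

    sumKernel<<<1, BLOCK>>>(in, out);
    cudaDeviceSynchronize();
    printf("block sum: %d\n", out[0]);

    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

One consequence worth noting: because scratch is invisible to the kernel, the kernel cannot reuse that storage for anything else, and every thread in the block must call blockSum (it contains __syncthreads()).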
That is evidently not the only way to use shared memory.
Dynamically allocated shared memory (which uses the extern keyword) has quite a few differences. For example, multiple declarations are generally not allowed, or at least will not provide unique spaces/allocations. So it's important to understand the differences, and not to assume that everything I said above applies equally to the dynamically allocated case. Even in the dynamically allocated case, though, we may find extern __shared__ declarations in places that might be considered “unusual”.
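To make the contrast with the static case concrete, here is a minimal sketch in which both a __device__ function and the kernel declare an extern __shared__ array. The names fill, reverseKernel, buf and data are hypothetical. The assumption being illustrated, per the CUDA programming guide, is that all such declarations start at the same address, so the two names refer to the same dynamically sized region rather than to two separate allocations.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: writes into the dynamic shared region through
// its own extern __shared__ declaration.
__device__ void fill(int value)
{
    extern __shared__ int buf[];    // no size here; set at launch time
    buf[threadIdx.x] = value;
}

__global__ void reverseKernel(int *out)
{
    extern __shared__ int data[];   // expected to alias buf, not a new allocation
    fill(threadIdx.x * 2);
    __syncthreads();
    // Read through the kernel's name what fill() wrote through its own name.
    out[threadIdx.x] = data[blockDim.x - 1 - threadIdx.x];
}

int main()
{
    const int n = 64;
    int *out;
    cudaMallocManaged(&out, n * sizeof(int));

    // Third launch parameter: bytes of dynamic shared memory per block.
    reverseKernel<<<1, n, n * sizeof(int)>>>(out);
    cudaDeviceSynchronize();

    printf("out[0] = %d, out[%d] = %d\n", out[0], n - 1, out[n - 1]);
    cudaFree(out);
    return 0;
}
```

If the two declarations alias as described, out[0] should hold the value written by the last thread and out[n - 1] the value written by thread 0. The same aliasing is why two extern __shared__ arrays intended to hold different data need to be carved out of the single region manually with offsets.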