Is there a limit on the size of __shared__ variables?

I find that it cannot exceed (float)448*4, or the compiler reports an error.


As can be seen in the deviceQuery output and in the manual, there is 16 KB of shared memory (on current hardware). Some of that is already taken by blockDim, blockIdx, gridDim, and the input arguments of your kernel.
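For reference, here is a minimal sketch of how the limit could be checked from code rather than read off the deviceQuery output. The helper names `max_shared_floats` and `print_block_limits` are made up for this illustration, and the `reserved_bytes` overhead is an assumed input, not a documented figure. Only `cudaGetDeviceProperties` and the `cudaDeviceProp` fields come from the CUDA runtime API.

```cuda
#include <cstddef>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: number of floats that fit in shared memory once
// `reserved_bytes` (an assumed overhead, e.g. kernel arguments) is set aside.
size_t max_shared_floats(size_t limit_bytes, size_t reserved_bytes)
{
    if (reserved_bytes >= limit_bytes) return 0;
    return (limit_bytes - reserved_bytes) / sizeof(float);
}

// Hypothetical helper: print the per-block limits of device 0 as reported
// by the runtime. Returns false if the device could not be queried.
bool print_block_limits()
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return false;

    std::printf("shared memory per block: %zu bytes\n", prop.sharedMemPerBlock);
    std::printf("max threads per block:   %d\n", prop.maxThreadsPerBlock);
    std::printf("floats that fit (no overhead assumed): %zu\n",
                max_shared_floats(prop.sharedMemPerBlock, 0));
    return true;
}
```

Calling `print_block_limits()` from `main` prints whatever the installed device reports. As a pure arithmetic example, `max_shared_floats(16384, 64)` gives 4080, where the 64 bytes of overhead is an invented number for illustration only.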

Thanks, I confirmed it in my test.

The shared memory used is 4000*4 + … which is just the 16k that you mentioned above.

The number of threads is 448.

Does NVIDIA have any plans to increase the shared memory size and the number of threads?

These are hardware limitations, so such increases could only happen in future hardware. NVIDIA doesn’t comment on future hardware, so your guess is as good as ours.

The system I am studying can be divided into small fragments, but each fragment needs a surrounding buffer region, which is almost fixed at 4 A. If the GPU limits the maximum size to 10 A, I have to spend 2*4 A on the two terminal buffers, leaving only 2 A for the true fragment, so the efficiency drops to 20%. In 3D space it is much worse.
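To make the arithmetic explicit, here is a rough worked estimate. It assumes a total extent L = 10 A per dimension and a buffer b = 4 A on each side, and, for the 3D case, that the same buffer applies independently along all three axes of a cubic fragment. That last assumption is mine and may not match the actual system.

```latex
\eta_{1\mathrm{D}} = \frac{L - 2b}{L} = \frac{10 - 2 \cdot 4}{10} = 0.2
\qquad
\eta_{3\mathrm{D}} = \left(\frac{L - 2b}{L}\right)^{3} = 0.2^{3} = 0.008
```

Under those assumptions, only about 0.8% of the volume would be true fragment in 3D, compared with 20% of the length in 1D.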