Question about shared memory: why does 16K not work?

I am developing on an S870, which, according to the programming guide, should have 16K of shared memory per multiprocessor. So I declared “float share[4 * 1024]” and launch just one block, so that this block can occupy all of the shared memory. However, when compiling, nvcc says that I have allocated too much shared memory. Can anyone explain why this happens?

You have declared an array of 4K elements of type float, and at 4 bytes per float, 4 * 4K makes 16K. So you are left with no shared memory for anything else.

The parameters you pass to the kernel are also placed in shared memory (as are blockIdx, blockDim, gridDim, etc.), so you have less than 16K available to play with. Putting your input and output variables in constant memory might free up the space taken by the parameters; I was planning to try that out when I am back at work. But blockIdx, blockDim and gridDim would still need their space in shared memory, so I think in practice it will never be possible to use all 16K.
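To see how much room is actually left, it may be easier to ask the runtime than to guess. The following is only a minimal sketch: the kernel name `fill`, the helper `fill_static_shared_bytes`, and the 64-float headroom are made up for illustration, and the headroom is an assumption rather than a documented figure. It relies on `cudaGetDeviceProperties` and `cudaFuncGetAttributes` from the CUDA runtime. On newer architectures kernel parameters are passed through constant memory instead, so the numbers reported there will differ from the hardware discussed in this thread.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumption: leave 64 floats (256 bytes) free for kernel parameters and
// the built-in variables. The exact overhead depends on the hardware.
#define HEADROOM_FLOATS 64
#define SHARE_FLOATS (4 * 1024 - HEADROOM_FLOATS)

// Hypothetical kernel that uses a statically sized shared array just
// below the 16K limit.
__global__ void fill(float *out, int n)
{
    __shared__ float share[SHARE_FLOATS];

    // Each thread fills a strided part of the shared array.
    for (int i = threadIdx.x; i < SHARE_FLOATS; i += blockDim.x)
        share[i] = static_cast<float>(i);
    __syncthreads();

    // Copy back as many elements as the output buffer can hold.
    for (int i = threadIdx.x; i < n && i < SHARE_FLOATS; i += blockDim.x)
        out[i] = share[i];
}

// Returns the static shared memory size the runtime reports for fill,
// or 0 if the query fails.
size_t fill_static_shared_bytes()
{
    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, fill) != cudaSuccess)
        return 0;
    return attr.sharedSizeBytes;
}

int main()
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        printf("no usable CUDA device found\n");
        return 1;
    }
    printf("shared memory per block:     %zu bytes\n", prop.sharedMemPerBlock);
    printf("static shared used by fill:  %zu bytes\n", fill_static_shared_bytes());
    return 0;
}
```

Compiling with `nvcc --ptxas-options=-v` also prints the shared memory usage per kernel, which shows how close the declaration is to the limit.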

I thought that parameters were placed in registers and, if there were not enough registers, in local memory. Which memory are parameters actually placed in?

It’d be a huge waste of resources to pass parameters in registers. Imagine you have 512 threads and, say, 5 kernel arguments, which is completely common. That’d take 2500+ of your GPU’s very valuable registers! It’s a lot better to use 20 bytes of your shared memory.
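The arithmetic behind those two numbers can be written out as a small host-only sketch. It assumes 4-byte arguments and one register per argument per thread; the function names are made up for illustration.

```cuda
#include <cstdio>

// Registers needed if every thread kept its own copy of every argument
// (assumption: one register per 4-byte argument).
constexpr int regs_for_args(int threads, int args)
{
    return threads * args;
}

// Shared memory needed if the block keeps a single copy of the arguments.
constexpr int smem_bytes_for_args(int args, int bytes_per_arg)
{
    return args * bytes_per_arg;
}

int main()
{
    printf("registers:    %d\n", regs_for_args(512, 5));       // 512 * 5 = 2560
    printf("shared bytes: %d\n", smem_bytes_for_args(5, 4));   // 5 * 4 = 20
    return 0;
}
```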

I think it’d make more sense if parameters were in constant memory.
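Until then, parameters can be moved to constant memory by hand. The following is a minimal sketch of that idea, not a tested recipe for the S870: the struct `Params`, the symbol `d_params`, and the functions `scale_kernel` and `launch_scale` are all hypothetical names. It uses `cudaMemcpyToSymbol` to fill a `__constant__` variable before launching a kernel that takes no arguments.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical parameter bundle; the fields are for illustration only.
struct Params {
    float *out;
    int    n;
    float  scale;
};

// One copy in constant memory, readable by every thread of every block.
__constant__ Params d_params;

// The kernel takes no arguments; it reads everything from d_params.
__global__ void scale_kernel()
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < d_params.n)
        d_params.out[i] = d_params.scale * static_cast<float>(i);
}

// Copies the parameters to constant memory, launches the kernel, and
// waits for it to finish. Returns false if any CUDA call fails.
bool launch_scale(float *d_out, int n, float scale)
{
    Params h = { d_out, n, scale };
    if (cudaMemcpyToSymbol(d_params, &h, sizeof(h)) != cudaSuccess)
        return false;

    const int threads = 256;
    scale_kernel<<<(n + threads - 1) / threads, threads>>>();
    return cudaDeviceSynchronize() == cudaSuccess;
}

int main()
{
    const int n = 1000;
    float *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess)
        return 1;

    bool ok = launch_scale(d_out, n, 2.0f);

    std::vector<float> h_out(n);
    ok = ok && cudaMemcpy(h_out.data(), d_out, n * sizeof(float),
                          cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_out);

    if (!ok) {
        printf("a CUDA call failed\n");
        return 1;
    }
    printf("out[10] = %f\n", h_out[10]);
    return 0;
}
```

The trade-off is that the parameters must be copied before every launch that changes them, and the built-in variables are not affected by this at all.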

Hear, hear.

But then we would still not be able to use all of the shared memory. I think a little extra shared memory for blockDim, gridDim and blockIdx would be a great idea for GT300 ;)