C1060 and Shared Memory size

According to its specification, the C1060 has 16 KB of shared memory. I have a kernel that needs about 9728 bytes of shared memory.

I am using the CUDA driver API, and unfortunately the program dies with CUDA_ERROR_INVALID_VALUE on cuFuncSetSharedSize(…). If I reduce the amount of memory from 9728 to 4096 bytes, the call succeeds without problems, and running the kernel with the wrong shared memory size does not report any error either. I have also tried Fermi devices (S2050, C2050, GT430) with 9728 bytes; on these, cuFuncSetSharedSize() does not give any error.

Am I missing something?

… edited

I have posted three topics and it seems no one has seen them. I have found the answer to this question, so I am answering myself; maybe this forum can start to be useful for whoever has the same problem. First of all, the CUDA SDK documentation spends just a couple of words on “static” versus “dynamic” shared memory, namely this:

“In this sample, shared memory is statically allocated within the kernel as opposed to allocated at runtime through cuFuncSetSharedSize().”

On the other hand, to make things a bit more confusing, the documentation says:

cuFuncSetSharedSize() sets the size of shared memory for the function. (Static? Dynamic? It does not say.)

The difference between static and dynamic shared memory is explained in appendix B.2.3, but badly: there is no mention of cuFuncSetSharedSize for the driver API case, and the only hint is the words “dynamically allocated” used at the end of the section.
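To make the distinction concrete, here is a minimal sketch of the two declaration styles in CUDA. The kernel names reverseStatic and reverseDynamic and the constant N are hypothetical, chosen only for illustration; N is picked so that the static array matches the 9728 bytes from the question. Both kernels assume a launch with a single block.

```cuda
// N floats * 4 bytes = 9728 bytes, the size from the question (illustrative).
#define N 2432

// Static shared memory: the array size is a compile-time constant, so the
// requirement is recorded in the compiled kernel itself.
extern "C" __global__ void reverseStatic(float *data)
{
    __shared__ float tileStatic[N];

    // Stage the input in shared memory, striding by the block size.
    for (int i = threadIdx.x; i < N; i += blockDim.x)
        tileStatic[i] = data[i];
    __syncthreads();

    // Write the elements back in reverse order.
    for (int i = threadIdx.x; i < N; i += blockDim.x)
        data[i] = tileStatic[N - 1 - i];
}

// Dynamic shared memory: the array is declared extern with no size, and the
// host must supply the size in bytes (here n * sizeof(float)) at launch time.
extern "C" __global__ void reverseDynamic(float *data, int n)
{
    extern __shared__ float tileDynamic[];

    for (int i = threadIdx.x; i < n; i += blockDim.x)
        tileDynamic[i] = data[i];
    __syncthreads();

    for (int i = threadIdx.x; i < n; i += blockDim.x)
        data[i] = tileDynamic[n - 1 - i];
}
```

With the runtime API the dynamic size is the third launch parameter, for example reverseDynamic<<<1, 256, n * sizeof(float)>>>(d_data, n). With the driver API used in this thread, that size is what cuFuncSetSharedSize() supplies. The static kernel needs neither.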

Finally, to make things even less clear, the matrixMulDrv example calls cuFuncSetSharedSize even though the memory seems to be statically allocated.

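The answer is: do not follow the NVIDIA examples. Do not use cuFuncSetSharedSize for “static” shared memory; use it only for dynamic shared memory, as explained in appendix B.2.3. This would also explain the numbers in my question, presumably: the kernel already holds 9728 bytes statically, so asking for another 9728 bytes exceeds the C1060's 16 KB (9728 + 9728 = 19456 > 16384), while asking for 4096 bytes still fits (13824 bytes) and the Fermi devices have more shared memory per block.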
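On the host side, the rule can be sketched roughly as follows. This is only an illustration, not code from the SDK: the helper name setDynamicSharedSize is hypothetical, and the up-front check against the device limit is an extra diagnostic added here (the driver performs its own validation). It assumes the context is already created and the kernel handle was obtained with cuModuleGetFunction().

```cuda
#include <cuda.h>
#include <stdio.h>

// Hypothetical helper: request dynamic shared memory for a kernel only when
// it actually declares an `extern __shared__` array. Pass dynamicBytes = 0
// for kernels that use only static shared memory.
static CUresult setDynamicSharedSize(CUdevice dev, CUfunction kernel,
                                     unsigned int dynamicBytes)
{
    int staticBytes = 0;  // static shared memory recorded in the kernel
    int limitBytes = 0;   // per-block shared memory limit of the device
    CUresult rc;

    rc = cuFuncGetAttribute(&staticBytes,
                            CU_FUNC_ATTRIBUTE_SHARED_SIZE_BYTES, kernel);
    if (rc != CUDA_SUCCESS)
        return rc;

    rc = cuDeviceGetAttribute(&limitBytes,
                              CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_BLOCK,
                              dev);
    if (rc != CUDA_SUCCESS)
        return rc;

    printf("shared memory: %d static + %u dynamic, limit %d bytes\n",
           staticBytes, dynamicBytes, limitBytes);

    // Static and dynamic shared memory share the same per-block budget.
    if ((size_t)staticBytes + dynamicBytes > (size_t)limitBytes)
        return CUDA_ERROR_INVALID_VALUE;

    // Static-only kernel: nothing to request.
    if (dynamicBytes == 0)
        return CUDA_SUCCESS;

    // Legacy driver API call, as used in this thread.
    return cuFuncSetSharedSize(kernel, dynamicBytes);
}
```

Printing the static size reported by cuFuncGetAttribute() is a quick way to see whether a kernel already carries its own shared memory requirement before anything is requested from the host. In newer toolkits cuFuncSetSharedSize() is deprecated, and the dynamic size is passed as the sharedMemBytes argument of cuLaunchKernel() instead.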