Local memory size: is it shared on a multiprocessor?

When a multiprocessor executes multiple workgroups at the same time, its registers are shared among those workgroups. So with a workgroup size of 256, each workitem using 8 registers, and a total of 16384 registers, 8 workgroups can be processed simultaneously. Is this also limited by local (OpenCL local, CUDA shared) memory size? If I use all available local memory in each workgroup, does that mean I limit the multiprocessor to processing only a single workgroup? The Best Practices Guide states that shared memory can “act as a constraint on occupancy”, but the example there addresses a different problem.
Thank you for your insights.

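Quite obviously, this is the case: local memory is a per-multiprocessor resource, so it limits the number of resident workgroups just as registers do. See e.g. NVIDIA’s OpenCL Programming Guide (3.2, page 27 might be a good point to start).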
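
To put numbers on it, here is a rough sketch of the arithmetic in Python. The function name and parameters are made up for illustration, and the defaults (16384 registers, 16 KB of local memory, at most 8 resident workgroups per multiprocessor) are assumptions that you should replace with the figures for your device. It also ignores allocation granularity and any local memory the runtime reserves for itself, so treat the result as an upper bound.

```python
def resident_workgroups(regs_per_item, items_per_group, local_bytes_per_group,
                        total_regs=16384, total_local_bytes=16 * 1024,
                        max_groups=8):
    """Estimate how many workgroups fit on one multiprocessor at once.

    All defaults are assumed hardware limits, not values queried from a device.
    """
    # Limit imposed by the register file shared among resident workgroups.
    by_regs = total_regs // (regs_per_item * items_per_group)

    # Limit imposed by local (CUDA: shared) memory; no limit if none is used.
    if local_bytes_per_group > 0:
        by_local = total_local_bytes // local_bytes_per_group
    else:
        by_local = max_groups

    # The tightest of the three limits wins.
    return min(by_regs, by_local, max_groups)


# The case from the question: 256 workitems, 8 registers each.
print(resident_workgroups(8, 256, 0))          # registers allow 8
print(resident_workgroups(8, 256, 4 * 1024))   # local memory allows only 4
print(resident_workgroups(8, 256, 16 * 1024))  # all local memory used: 1
```

So under these assumptions, a workgroup that takes all of the local memory does leave room for only one resident workgroup, regardless of how few registers it uses.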

Thanks again. As I wrote above, the documentation (I have reread the section you mention) still speaks only about registers, not about shared memory, although the same reasoning can be extrapolated to it. I just wanted to be sure.

Programming Guide 3.2, 2.1.2 Hardware Multithreading (linked from section 3 when it comes to occupancy):

The documentation is not that bad after all, although there’s always room for improvement.