How can I use all 16384 bytes of shared memory?

I have a program that needs all 16384 bytes of shared memory to reach its best performance.
I pass the parameters to the kernel through constant memory, but some bytes of shared memory are still reserved for system use.

ptxas reports the following:
ptxas info : Used 13 registers, 16384+16 bytes smem, 8 bytes cmem[0]
Entry function ‘_Z6testSMv’ uses too much shared data (0x4000 bytes + 0x10 bytes system, 0x4000 max)

How can I use all 16384 bytes of shared memory?


Officially, you can’t. Some shared memory is always reserved for built-in variables. Unofficially, a couple of posters reported being able to use negative indexing to write over the built-ins and get all 16k of shared memory on compute capability 1.1/1.3 hardware. Completely unsupported and liable to incur the “wrath of murray”, so don’t say you weren’t warned…

There is no safe way to do this. Block configuration parameters are passed to the kernel through shared memory as well. In the past, people have used tricks like declaring a (16384 - 16) byte array and then using negative indexing to reach the first 16 bytes. However, this is dangerous and not guaranteed to work.
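For reference, the supported half of that trick (sizing the array to what is left after the reserved bytes, with no negative indexing) could be sketched roughly as below. The 16384 and 16 byte figures are taken from the ptxas output in the question; the names usableSharedBytes, kFloats, g_last and testSM are made up for illustration, and the reserved amount may differ for other kernels or hardware.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumed figures, read off the ptxas message in the question:
// 16384 bytes per block, 16 of them taken by the system.
constexpr int kTotalSharedBytes = 16384;
constexpr int kReservedBytes = 16;

// Hypothetical helper: bytes left for a static __shared__ array.
__host__ __device__ constexpr int usableSharedBytes(int total, int reserved) {
    return total - reserved;
}

// Number of floats that fit in the usable part.
constexpr int kFloats =
    usableSharedBytes(kTotalSharedBytes, kReservedBytes) / static_cast<int>(sizeof(float));

// Result goes through a __device__ variable so the kernel takes no arguments
// (on compute capability 1.x, kernel arguments also live in shared memory).
__device__ float g_last;

__global__ void testSM() {
    __shared__ float buf[kFloats];  // stays under the limit, no negative indexing
    for (int i = threadIdx.x; i < kFloats; i += blockDim.x) {
        buf[i] = static_cast<float>(i);
    }
    __syncthreads();
    if (threadIdx.x == 0) {
        g_last = buf[kFloats - 1];
    }
}

int main() {
    testSM<<<1, 256>>>();
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    float last = 0.0f;
    err = cudaMemcpyFromSymbol(&last, g_last, sizeof(float));
    if (err != cudaSuccess) {
        fprintf(stderr, "copy failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("%d floats in shared memory, last element holds %.0f\n", kFloats, last);
    return 0;
}
```

This gives up the last 16 bytes instead of fighting for them, which is the only variant that is guaranteed to keep working.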

Your other option is to pick up a GTX 470 or 480 when they hit the stores and get 48 kB of shared memory per multiprocessor.
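If you do move to newer hardware, a rough way to check what a given card offers before launching is to query the device properties. This is only a sketch: fitsInShared, fillShared and g_sum are hypothetical names, device 0 is assumed, and the 16384 byte request is just the figure from the question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: does a shared memory request fit the reported limit?
__host__ __device__ constexpr bool fitsInShared(size_t requested, size_t limit) {
    return requested <= limit;
}

__device__ float g_sum;

// Uses dynamically sized shared memory; n is the number of floats in buf.
__global__ void fillShared(int n) {
    extern __shared__ float buf[];
    for (int i = threadIdx.x; i < n; i += blockDim.x) {
        buf[i] = 1.0f;
    }
    __syncthreads();
    if (threadIdx.x == 0) {
        g_sum = buf[0] + buf[n - 1];
    }
}

int main() {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);  // device 0 assumed
    if (err != cudaSuccess) {
        fprintf(stderr, "query failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    const size_t wanted = 16384;  // the full 16 KB the question asks for
    printf("%s (compute %d.%d): %zu bytes of shared memory per block\n",
           prop.name, prop.major, prop.minor, prop.sharedMemPerBlock);

    if (!fitsInShared(wanted, prop.sharedMemPerBlock)) {
        printf("%zu bytes will not fit on this device\n", wanted);
        return 1;
    }
    // The reported limit does not subtract any system-reserved bytes, so the
    // launch can still fail on hardware where the limit is exactly 16384.
    fillShared<<<1, 256, wanted>>>(static_cast<int>(wanted / sizeof(float)));
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("launch with %zu bytes of dynamic shared memory succeeded\n", wanted);
    return 0;
}
```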

Read here: