I am desperate for more Shared Memory.
On the 2.x architecture, each multiprocessor has 64 KB of on-chip memory that can be split as 48 KB shared memory / 16 KB L1 cache or vice versa.
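For reference, that split is requested through the CUDA runtime API rather than a compiler flag. Below is a minimal sketch, not my real code: `myKernel` and its 40 KB tile are placeholders made up for illustration, and device 0 is assumed to be the 2.x card.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel (hypothetical): needs 40 KB of static shared memory,
// which only fits when the 48 KB shared / 16 KB L1 split is selected.
__global__ void myKernel(float *out)
{
    __shared__ float tile[10 * 1024];   // 10240 floats * 4 bytes = 40 KB
    tile[threadIdx.x] = static_cast<float>(threadIdx.x);
    __syncthreads();
    out[blockIdx.x * blockDim.x + threadIdx.x] = tile[threadIdx.x];
}

int main()
{
    // Report the per-block shared memory limit of device 0 (assumed device).
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("Shared memory per block: %zu bytes\n", prop.sharedMemPerBlock);

    // Request the larger shared memory configuration for this kernel.
    cudaError_t err = cudaFuncSetCacheConfig(myKernel, cudaFuncCachePreferShared);
    if (err != cudaSuccess) {
        printf("cudaFuncSetCacheConfig failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    float *d_out = nullptr;
    cudaMalloc(&d_out, 256 * sizeof(float));
    myKernel<<<1, 256>>>(d_out);

    // Launch and execution errors surface here.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("Kernel failed: %s\n", cudaGetErrorString(err));
    }
    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}
```

With this I can get at most the 48 KB configuration, which is why I am asking about the remaining 16 KB.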
Other posts in this forum have claimed/asked/noted that the nvcc compiler flag “-Xptxas -dlcm=cg” will “disable the L1 cache line”. The PTX ISA 3.1 reference guide says:
“.cg
Cache at global level (cache in L2 and below, not L1).
Use ld.cg to cache loads only globally, bypassing the L1 cache, and cache only in the L2 cache. As a result of this request, any existing cache lines that match the requested address in L1 will be evicted.”
Does this mean that all 64 KB will be available for shared memory? I would be very happy if this were the case!