How to get rid of local memory usage?

I used global memory instead of texture memory to store the input image data. With textures, each thread uses 34 registers and no local memory, and occupancy is 13%. With global memory, each thread uses only 17 registers, but a considerable amount of local memory is used and occupancy rises to 26%. Even so, overall performance goes down, which I guess is due to the local memory usage.
Is there a way to set constraints in the compiler so that it avoids local memory, even at the cost of using more registers? I'm working in the Visual Studio environment.
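A hedged sketch of the compiler-side knobs. `__launch_bounds__` and `cudaFuncGetAttributes` are real CUDA features; the kernel `scaleImage`, the helper `queryScaleImageResources` and the two constants are hypothetical stand-ins for the real image kernel. At build time, compiling with `nvcc -Xptxas -v` makes ptxas print per-kernel register and spill figures. I believe the register limit in the Visual Studio CUDA project properties maps to nvcc's `--maxrregcount`; if the 17-register figure comes from such a cap, spilling into local memory would be expected. If local memory remains with no cap at all, it may come from a per-thread array indexed with runtime values, which the compiler cannot keep in registers whatever the limits.

```cuda
#include <cuda_runtime.h>

// Hypothetical values: the block size actually launched with, and the number
// of resident blocks per multiprocessor being aimed for.
#define THREADS_PER_BLOCK 256
#define MIN_BLOCKS_PER_SM 2

// __launch_bounds__(maxThreadsPerBlock, minBlocksPerMultiprocessor) gives ptxas
// the launch configuration, so it budgets registers for this kernel alone
// instead of applying one --maxrregcount cap to every kernel in the file.
// Hypothetical kernel standing in for the real image-processing kernel.
__global__ void __launch_bounds__(THREADS_PER_BLOCK, MIN_BLOCKS_PER_SM)
scaleImage(const float* in, float* out, int n, float gain)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i] * gain;
}

// Hypothetical helper: reads back what the compiler actually allocated.
// numRegs is registers per thread; localSizeBytes is local memory per thread
// (register spills plus any per-thread data that could not live in registers).
cudaError_t queryScaleImageResources(int* regsPerThread, size_t* localBytesPerThread)
{
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, scaleImage);
    if (err != cudaSuccess)
        return err;
    *regsPerThread = attr.numRegs;
    *localBytesPerThread = attr.localSizeBytes;
    return cudaSuccess;
}
```

Querying the same attributes for the texture and global-memory versions of a kernel gives a direct, per-build comparison of register and local memory usage.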


I’m not sure I’ve correctly understood this, but have you looked at using “custom” caching of global memory with shared memory? (That is, have the threads of a block do a coalesced read from global into shared memory, then do the processing from shared memory.) The convolutionSeparable sample project does this IIRC.
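A minimal sketch of that pattern, not the convolutionSeparable code itself: a hypothetical horizontal box filter where each block loads a tile of one image row plus a halo into shared memory with coalesced reads, synchronizes, and then computes every output from shared memory. The names `boxFilterRowShared`, `boxFilterRows`, `RADIUS` and `TILE_W` are illustrative assumptions, and the clamp-to-edge border handling is chosen only to resemble a texture with clamp addressing.

```cuda
#include <cuda_runtime.h>

// Hypothetical parameters for illustration.
#define RADIUS 2    // filter reaches RADIUS pixels to each side
#define TILE_W 128  // output pixels per block, also threads per block

// Clamp-to-edge read, similar in spirit to a texture with clamp addressing.
__device__ __forceinline__ float loadClamped(const float* row, int x, int width)
{
    x = min(max(x, 0), width - 1);
    return row[x];
}

// Horizontal box filter: out(x, y) = mean of in(x-RADIUS .. x+RADIUS, y).
// Launch with blockDim = (TILE_W, 1), gridDim = (ceil(width / TILE_W), height).
__global__ void boxFilterRowShared(const float* in, float* out, int width, int height)
{
    __shared__ float tile[TILE_W + 2 * RADIUS];

    int y = blockIdx.y;
    if (y >= height)
        return;  // uniform for the whole block, so it cannot skip __syncthreads for only some threads
    int tileStart = blockIdx.x * TILE_W;
    const float* row = in + (size_t)y * width;

    // Cooperative load: consecutive threads read consecutive addresses, so the
    // global reads coalesce; the loop lets the block also cover the halo.
    for (int i = threadIdx.x; i < TILE_W + 2 * RADIUS; i += blockDim.x)
        tile[i] = loadClamped(row, tileStart - RADIUS + i, width);

    __syncthreads();  // the whole tile must be in shared memory before anyone reads it

    int x = tileStart + threadIdx.x;
    if (x < width) {
        float sum = 0.0f;
        for (int k = -RADIUS; k <= RADIUS; ++k)
            sum += tile[threadIdx.x + RADIUS + k];  // neighbours come from shared memory
        out[(size_t)y * width + x] = sum / (2 * RADIUS + 1);
    }
}

// Host helper: copies the image to the device, runs the kernel, copies back.
cudaError_t boxFilterRows(const float* hostIn, float* hostOut, int width, int height)
{
    size_t bytes = (size_t)width * height * sizeof(float);
    float* dIn = nullptr;
    float* dOut = nullptr;
    cudaError_t err = cudaMalloc(&dIn, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dOut, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(dIn, hostIn, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        dim3 block(TILE_W, 1);
        dim3 grid((width + TILE_W - 1) / TILE_W, height);
        boxFilterRowShared<<<grid, block>>>(dIn, dOut, width, height);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaMemcpy(hostOut, dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);   // cudaFree(nullptr) is a no-op
    cudaFree(dOut);
    return err;
}
```

Each input pixel is fetched from global memory roughly once per block instead of 2 * RADIUS + 1 times, which is the same reuse a texture cache would otherwise provide implicitly.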

I’m thinking about the implicit caching in a texture vs. the relatively slow reads from global memory. The slow global reads are “fixable” with shared memory once you figure out a good caching mechanism using shared mem. I think that would be my main focus in that situation, rather than register / occupancy issues. (It actually is, in my project, but I’m prepared to be mistaken.)

Just my two cents. Good luck!