I have read that on the Fermi architecture (GTX series in my case) the 64 KB of on-chip memory per multiprocessor is configurable: it can be split as 48 KB of shared memory and 16 KB of L1 cache, or vice versa. I could not find a compiler flag or command option to select this in OpenCL. Is this option not available in OpenCL? Kindly clarify…
I was also wondering about this. Especially as the OpenCL Programming Guide for the CUDA Architecture explicitly states this possibility in section C.4.1, sadly leaving out the detail of how to utilize it.
I am very much interested in this too. Should you come across the solution, please do post it. Thx.
I’m interested too. I would like to test how my OpenCL kernels perform with the 48 KB L1 cache configuration, but I haven’t found any solution.
Could you please give us the description you found in the OpenCL Programming Guide for the CUDA Architecture?
All I could find in section C.4.1 about the L1 cache is:
"There is an L1 cache for each multiprocessor and an L2 cache shared by all multiprocessors, both of which are used to cache accesses to local or global memory, including temporary register spills."
It does not point to any OpenCL APIs…
Sorry, I referenced the wrong section. In section 3.3 it states:
It does not explicitly mention an OpenCL API for that. But as it is the OpenCL Programming Guide, why else would it mention that capability at all?
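For comparison, the CUDA runtime API does expose this split, per kernel, through `cudaFuncSetCacheConfig`. Below is a minimal sketch of that CUDA-side call, assuming a Fermi-class device and a CUDA toolkit recent enough to provide `cudaDeviceSynchronize`. The kernel `scale` and the buffer size are hypothetical and only there to give the call something to act on. This is CUDA, not OpenCL, so it does not answer the question above; it only shows what the guide's description corresponds to on the CUDA side.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, used only to have something to configure and launch.
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

// Print the CUDA error (if any) and report whether the call succeeded.
static bool ok(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main()
{
    const int n = 1 << 20;  // assumed element count, for illustration only
    float *d_data = nullptr;

    if (!ok(cudaMalloc(&d_data, n * sizeof(float)), "cudaMalloc")) return 1;
    if (!ok(cudaMemset(d_data, 0, n * sizeof(float)), "cudaMemset")) return 1;

    // Ask for the larger-L1 split (48 KB L1 / 16 KB shared on Fermi) for this
    // kernel. This is a preference; the runtime may pick another configuration
    // if the kernel needs more shared memory than that split allows.
    if (!ok(cudaFuncSetCacheConfig(scale, cudaFuncCachePreferL1),
            "cudaFuncSetCacheConfig")) return 1;

    scale<<<(n + 255) / 256, 256>>>(d_data, 2.0f, n);
    if (!ok(cudaGetLastError(), "kernel launch")) return 1;
    if (!ok(cudaDeviceSynchronize(), "cudaDeviceSynchronize")) return 1;

    cudaFree(d_data);
    return 0;
}
```

Passing `cudaFuncCachePreferShared` instead requests the opposite split (48 KB shared memory, 16 KB L1).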