Is there a way to check the current local memory (i.e. shared memory) bank configuration in OpenCL? By this I mean whether successive 32-bit words or successive 64-bit words are assigned to successive banks. I know that in CUDA I can set the desired bank configuration using the cudaDeviceSetSharedMemConfig() function. Can I access this function through OpenCL? If so, how?
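To make the two configurations concrete, here is a minimal sketch in C of the word-to-bank mapping in question. It assumes 32 banks and a plain modulo layout; the name `bank_index` and the constant `NUM_BANKS` are made up for illustration, and this is a simplified model rather than a statement of what the hardware or any driver actually does.

```c
#include <stdint.h>

/* Assumption: 32 banks, as on Fermi and Kepler class GPUs. */
#define NUM_BANKS 32u

/*
 * Hypothetical helper: returns the bank that a byte offset into local
 * (shared) memory would fall into, assuming successive words of
 * bank_width_bytes (4 for 32-bit mode, 8 for 64-bit mode) are assigned
 * to successive banks and the assignment wraps around after NUM_BANKS.
 */
unsigned bank_index(uint32_t byte_offset, unsigned bank_width_bytes)
{
    return (byte_offset / bank_width_bytes) % NUM_BANKS;
}
```

Under this model, byte offsets 0 and 4 land in banks 0 and 1 in 32-bit mode, but both land in bank 0 in 64-bit mode.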
Another related question: the CUDA programming guide states that in Kepler GPUs each bank has a bandwidth of 64 bits per clock cycle. Is this also true when the local memory is in 32-bit mode? In my experience, the default appears to be 32-bit mode, with each bank delivering 32 bits per clock cycle. Have others had similar experiences?
ADD: To clarify, I am trying to estimate the maximum theoretical local memory bandwidth, and I am wondering what happens when, for example, two threads from two different warps simultaneously access 32-bit words from the same memory bank.
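The kind of estimate meant here can be sketched as a simple product of per-bank bandwidth, bank count, multiprocessor count, and clock rate. The function name `peak_local_bandwidth_gbs` and all parameter values in the usage note are hypothetical, chosen only to show the arithmetic; they do not describe any particular GPU, and the model ignores bank conflicts and scheduling effects.

```c
/*
 * Hypothetical helper: rough upper bound on aggregate local (shared)
 * memory bandwidth in GB/s, assuming every bank on every multiprocessor
 * delivers bytes_per_bank_per_clock bytes on every clock cycle.
 *
 * bytes per clock (whole device) = num_sms * num_banks * bytes_per_bank_per_clock
 * GB/s                           = bytes per clock * clock_ghz
 */
double peak_local_bandwidth_gbs(unsigned num_sms,
                                unsigned num_banks,
                                unsigned bytes_per_bank_per_clock,
                                double clock_ghz)
{
    return (double)num_sms * (double)num_banks
         * (double)bytes_per_bank_per_clock * clock_ghz;
}
```

For an illustrative device with 8 multiprocessors, 32 banks, and a 1.0 GHz clock, this gives 2048 GB/s at 8 bytes per bank per clock and 1024 GB/s at 4 bytes per bank per clock, so the assumed per-bank width changes the estimate by a factor of two.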
ADD: I tried calling cudaDeviceSetSharedMemConfig(cudaSharedMemBankSizeEightByte) inside the OpenCL code, and cudaDeviceGetSharedMemConfig() claimed that everything went OK. However, this had no effect on the local memory bandwidth. I am still measuring about 1250 GB/s, which is less than half of what I would expect.