I am very new to CUDA and very confused about bank conflicts in shared memory. There are not many explanations of this topic, and the ones I have found are not clear.
Maybe because I am new to the subject, I don't understand the examples given in the SDK. Could anyone explain to me, in a clearer and simpler way, how bank conflicts occur and how to prevent them in general?
The Shared Memory section of the NVIDIA CUDA Programming Guide v2.0 gives a good explanation of how to achieve peak performance using shared memory.
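About bank conflicts: shared memory is divided into banks (16 on the devices that guide covers), and successive 32-bit words are assigned to successive banks. A conflict occurs when two or more threads of the same half-warp access different words that fall in the same bank; those accesses are then serialized. So if you have n threads and each one reads (or writes) one of n consecutive words, every thread hits a different bank and there is no conflict. Take a look at page 63 of the programming guide. The figure there shows how to read from shared memory without conflicts. Your threads need to follow such a pattern, with some restrictions. Hope this helps.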
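To make the pattern concrete, here is a rough host-side sketch (plain C++, no GPU needed) that models which bank each thread of a half-warp would hit. It assumes 16 banks, 32-bit words, and a half-warp of 16 threads, as on the compute capability 1.x devices that guide describes. The names `bank_of` and `conflict_degree` are made up for illustration and are not part of CUDA, and the model only covers the simple case where thread `t` accesses word `t * stride`.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <set>

// Assumptions (not queried from any device): 16 banks, 16 threads per half-warp.
constexpr unsigned kNumBanks = 16;
constexpr unsigned kHalfWarp = 16;

// Bank holding a given 32-bit word: successive words go to successive banks.
unsigned bank_of(unsigned word_index) { return word_index % kNumBanks; }

// Largest number of distinct words that a single bank must serve when
// thread t of a half-warp accesses word t * stride.
// 1 means no conflict; k means a k-way conflict (k serialized accesses).
unsigned conflict_degree(unsigned stride) {
    std::array<std::set<unsigned>, kNumBanks> words_per_bank;
    for (unsigned t = 0; t < kHalfWarp; ++t) {
        unsigned word = t * stride;
        // Threads reading the same word are counted once (treated as a broadcast).
        words_per_bank[bank_of(word)].insert(word);
    }
    std::size_t worst = 0;
    for (const auto& words : words_per_bank) {
        worst = std::max(worst, words.size());
    }
    return static_cast<unsigned>(worst);
}
```

Under these assumptions, `conflict_degree(1)` is 1 (consecutive words, no conflict), `conflict_degree(2)` is 2, and `conflict_degree(16)` is 16, because every thread lands in bank 0. `conflict_degree(17)` is back to 1, which is why a common fix is to pad a 16-wide shared array to 17 columns so that column accesses spread across all banks.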