I have written a CUDA kernel with a carefully arranged shared memory layout, and it should be free of bank conflicts if my understanding is correct.
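To make "my understanding" concrete, here is a minimal host-side sketch of the bank model I am assuming. It is not my actual kernel. It assumes 32 banks that are each 4 bytes wide, 32 threads per warp, and that threads reading the same 32-bit word are served by a broadcast. The names conflict_degree and column_read_degree are made up for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <set>

// Assumed hardware model: 32 banks of 4-byte words, 32 threads per warp.
constexpr int kBanks = 32;
constexpr int kWarpSize = 32;

// Returns the worst-case number of distinct 32-bit words that fall into the
// same bank for one warp-wide access. A result of 1 means conflict-free.
// word_index(lane) gives the shared-memory word index accessed by that lane.
template <typename IndexFn>
int conflict_degree(IndexFn word_index) {
    std::array<std::set<int>, kBanks> words_per_bank;
    for (int lane = 0; lane < kWarpSize; ++lane) {
        const int w = word_index(lane);
        // Identical words collapse in the set (modelled as a broadcast).
        words_per_bank[w % kBanks].insert(w);
    }
    std::size_t worst = 0;
    for (const auto& words : words_per_bank) {
        worst = std::max(worst, words.size());
    }
    return static_cast<int>(worst);
}

// Hypothetical access pattern: each lane reads one element of a column in a
// row-major float tile whose rows are row_stride words apart.
int column_read_degree(int row_stride) {
    return conflict_degree([row_stride](int lane) { return lane * row_stride; });
}
```

Under this model, column_read_degree(32) gives 32 (every lane hits bank 0), while padding each row by one word, column_read_degree(33), gives 1. My real layout passes the same kind of check, which is why I expect no conflicts.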
When profiling it with Nsight Compute:
On the Source page, I checked the 'L1 Wavefronts Shared' and 'L1 Wavefronts Shared Ideal' columns, and the values in the two columns are all the same. Does this mean the code is free of bank conflicts?
However, the Details page reports bank conflicts for both shared memory loads and stores.
More interestingly, if I launch the kernel with only one block, that is, with gridDim.x, gridDim.y, and gridDim.z all set to 1, it reports NO shared load/store bank conflicts.
So, are there REALLY bank conflicts or not? Why do the two views in Nsight Compute disagree, and how can I narrow down the issue?