Problems profiling shared memory bank conflicts with Nsight Compute

I have written a CUDA kernel with a carefully arranged shared memory layout, and it should be free of bank conflicts if my understanding is correct.

When profiling it with Nsight Compute:

  1. On the Source page, I checked the ‘L1 Wavefronts Shared’ and ‘L1 Wavefronts Shared Ideal’ columns, and the values are identical in both. Does this mean the source code is free of bank conflicts?

  2. However, the Details page shows that there are bank conflicts when loading from and storing to shared memory.

  3. More interestingly, if I launch the kernel with only one block, that is, with gridDim.x, gridDim.y, and gridDim.z all set to 1, it shows NO shared load/store bank conflicts.

So, are there REALLY bank conflicts or not? Why do the two views in Nsight Compute disagree, and how can I narrow down the issue?

If Nsight Compute is showing bank conflicts in the Memory Workload Analysis tables, there are truly conflicts in your kernel. The Source page metrics you referred to can help in identifying the source of such conflicts, but they are not guaranteed to show all of them (i.e. there is no strict correlation in both directions).
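As a rough cross-check of what a layout should cost on paper, the bank mapping can be modeled on the host. The sketch below is only an idealized model, not what Nsight Compute measures: it assumes 32 banks that are each 4 bytes wide, 32-bit accesses, and that lanes reading the same word are served together. The function names `modeled_wavefronts` and `column_access_wavefronts`, and the column-access pattern itself, are hypothetical and chosen purely for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <set>

// Assumed model: 32 banks, 4 bytes wide, successive 32-bit words map to
// successive banks. This is a paper model, not a hardware measurement.
constexpr int kNumBanks = 32;
constexpr int kBankWidthBytes = 4;
constexpr int kWarpSize = 32;

// Number of passes one warp-wide 32-bit shared memory access needs under the
// model: the largest count of distinct words that fall into a single bank.
// A result of 1 means the access is conflict-free in this model.
int modeled_wavefronts(const std::array<std::uint32_t, kWarpSize>& byte_addr) {
    std::array<std::set<std::uint32_t>, kNumBanks> words_per_bank;
    for (std::uint32_t addr : byte_addr) {
        const std::uint32_t word = addr / kBankWidthBytes;
        // Lanes that hit the same word collapse into one set entry.
        words_per_bank[word % kNumBanks].insert(word);
    }
    std::size_t worst = 0;
    for (const auto& words : words_per_bank) {
        worst = std::max(worst, words.size());
    }
    return static_cast<int>(worst);
}

// Hypothetical pattern: lane i reads element [i][0] of a float tile whose
// rows are row_stride floats apart (row_stride = 33 is the usual padding).
int column_access_wavefronts(int row_stride) {
    std::array<std::uint32_t, kWarpSize> addr{};
    for (int lane = 0; lane < kWarpSize; ++lane) {
        addr[lane] = static_cast<std::uint32_t>(lane * row_stride) * kBankWidthBytes;
    }
    return modeled_wavefronts(addr);
}
```

Under these assumptions, `column_access_wavefronts(32)` models a 32-way conflict, while padding the rows to 33 floats brings the column access down to a single pass. If a pattern is conflict-free in this model but the Memory Workload Analysis tables still report conflicts, the model is missing something about the real kernel, and the profiler's numbers are the ones to trust.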

You can find more information on bank conflict analysis in Nsight Compute in this thread: Shared memory bank conflicts and nsight metric. Note that the “Memory L1 (Ideal) Transactions Shared” metrics have since been renamed to “L1 Wavefronts Shared (Ideal)” in newer versions of the tool. Also, as hinted at in this reply, we are looking to make it easier to determine the source of any bank conflicts in future versions of the tool.

I see. Thank you very much.