Hi zhaopeng_eng,
there is indeed a difference between the bank conflicts reported on the Source Page and those reported on the Details Page.
On the Source Page, the reported bank conflicts originate solely from the memory access pattern of the corresponding source line. For every executed shared memory access, we calculate the conflicts within the warp that result from the access pattern of the warp's active threads. For your updated code sample, this count is now zero.
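To illustrate the per-warp calculation, here is a minimal Python sketch of how conflicts can be derived from the addresses of a warp's active threads. It assumes a simplified model: 32 banks that are each 4 bytes wide, and one 32-bit access per thread. The function name `warp_bank_conflicts` and the convention "conflicts = passes needed minus one" are hypothetical illustrations, not Nsight Compute's actual implementation. The tool's own accounting may differ in detail (for example, for 64- and 128-bit accesses). This model also does not capture the cross-client conflicts that the Details Page adds.

```python
# Simplified model (assumption): 32 banks, each 4 bytes wide, and every lane
# issues one 32-bit access. Wider accesses and other hardware details are ignored.
NUM_BANKS = 32
BANK_WIDTH_BYTES = 4
WARP_SIZE = 32


def warp_bank_conflicts(byte_addrs, active_mask=0xFFFFFFFF):
    """Return the extra serialized passes needed for one warp-wide access.

    byte_addrs[i] is the shared memory byte address used by lane i. Lanes whose
    bit in active_mask is clear do not take part. Lanes that read the same
    4-byte word are served by a broadcast and do not conflict.
    """
    words_per_bank = [set() for _ in range(NUM_BANKS)]
    for lane, addr in enumerate(byte_addrs[:WARP_SIZE]):
        if not (active_mask >> lane) & 1:
            continue  # inactive lanes do not contribute to conflicts
        word = addr // BANK_WIDTH_BYTES
        words_per_bank[word % NUM_BANKS].add(word)
    # The busiest bank determines how many passes the access needs.
    ways = max(len(words) for words in words_per_bank)
    # A conflict-free access needs one pass; an n-way conflict needs n passes.
    return max(ways - 1, 0)


if __name__ == "__main__":
    lanes = range(WARP_SIZE)
    print(warp_bank_conflicts([4 * i for i in lanes]))         # stride 1 float: 0
    print(warp_bank_conflicts([8 * i for i in lanes]))         # stride 2 floats: 1 (2-way)
    print(warp_bank_conflicts([128 * i for i in lanes]))       # stride 32 floats: 31 (32-way)
    print(warp_bank_conflicts([132 * i for i in lanes]))       # stride 33 floats (padded): 0
    print(warp_bank_conflicts([0] * WARP_SIZE))                # same word, broadcast: 0
    print(warp_bank_conflicts([128 * i for i in lanes], 0xF))  # only 4 active lanes: 3
```

The last example shows why only the active threads matter: the same strided pattern produces fewer conflicts when fewer lanes participate. The padded stride of 33 floats is a common way to spread column accesses across all banks.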
The bank conflicts reported on the Details Page include all of these conflicts, plus additional conflicts caused by multiple clients trying to access the memory banks at the same time. For more details, please also have a look at How to Understand and Optimize Shared Memory Accesses using Nsight Compute | NVIDIA On-Demand. The difference and its root cause are briefly discussed around minute 21 of the recording.
In short, because the L1 Cache and Shared Memory are backed by the same physical memory banks, different clients accessing this physical memory can cause additional conflicts across warps. The numbers on the Details Page include these additional conflicts.