I profiled my custom GEMM kernel, and on the Details page Nsight Compute reports no bank conflicts. In the Source view, however, it attributes 50% excessive shared wavefronts to the LDGSTS instruction. Is it possible for excessive wavefronts to originate from something other than bank conflicts?
This seems similar to the post Questions about "L1 Conflicts Shared N-way" & metrics related to "Excessive", but nobody there explains the cause clearly.
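
To make the 50% figure concrete, here is a minimal sketch of the arithmetic. It assumes the percentage is the share of that instruction's shared wavefronts that exceed the ideal count, with excessive = total - ideal. If the Source view is instead showing that instruction's share of the kernel-wide excessive total, this arithmetic does not apply. The function names and the counts are hypothetical, not values taken from the profile.

```python
def excessive_wavefronts(total: int, ideal: int) -> int:
    """Wavefronts issued beyond the ideal count (assumed: total - ideal)."""
    return max(total - ideal, 0)


def excessive_fraction(total: int, ideal: int) -> float:
    """Fraction of all wavefronts that are excessive; 0.0 when nothing was issued."""
    if total == 0:
        return 0.0
    return excessive_wavefronts(total, ideal) / total


# Hypothetical counts for a single LDGSTS line (not measured data).
total, ideal = 8192, 4096
print(excessive_wavefronts(total, ideal))
print(excessive_fraction(total, ideal))
```

Under that assumption, 50% excessive means the instruction issued twice as many shared wavefronts as the ideal, which is the same ratio a 2-way conflict would produce.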