Shared memory bank conflicts and Nsight metric

Actually, since Instructions and Wavefronts are independent counters and they differ by roughly the number of reported conflicts, I guess this is probably some strange but real phenomenon that the counters are capturing accurately. But I don’t see why it would happen. Even if something odd happened when crossing the 32k to 64k boundary, each warp’s access is entirely on one side of the boundary, unless there is some TLB-like structure for the two possible partitions of shared memory and it is being thrashed. Making the buffer 4096, so that each block’s whole allocation (not just each access) sits on one side of the boundary, doesn’t help at all. Whether conflicts show up is very sensitive to the number of threads, the size of the buffer, etc., but I couldn’t pull out any additional pattern.
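For reference, here is a rough host-side sketch of the usual bank-conflict model, which is what I'd compare the counters against. It assumes 32 banks of 4 bytes each and that lanes reading the same 32-bit word are served by a single broadcast. The function names `expected_wavefronts` and `strided_warp` are made up for illustration, and the real Wavefronts counter may include effects this model does not cover.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Assumed model: 32 banks, 4 bytes wide; same-word accesses are broadcast.
constexpr std::uint32_t kBanks = 32;
constexpr std::uint32_t kBankBytes = 4;

// Expected wavefronts for one warp-wide 4-byte shared memory access:
// the largest number of distinct 32-bit words that fall into a single bank.
int expected_wavefronts(const std::vector<std::uint32_t>& byte_addrs) {
    std::array<std::set<std::uint32_t>, kBanks> words_per_bank;
    for (std::uint32_t addr : byte_addrs) {
        std::uint32_t word = addr / kBankBytes;
        words_per_bank[word % kBanks].insert(word);
    }
    std::size_t worst = 0;
    for (const auto& words : words_per_bank) {
        worst = std::max(worst, words.size());
    }
    return static_cast<int>(worst);
}

// Hypothetical helper: byte addresses for 32 lanes, where lane i reads the
// 4-byte element at base_bytes + i * stride_words * 4.
std::vector<std::uint32_t> strided_warp(std::uint32_t base_bytes,
                                        std::uint32_t stride_words) {
    std::vector<std::uint32_t> addrs;
    for (std::uint32_t lane = 0; lane < 32; ++lane) {
        addrs.push_back(base_bytes + lane * stride_words * kBankBytes);
    }
    return addrs;
}
```

Under this model, moving the base address by any multiple of 128 bytes (for example to the other side of the 32k boundary) leaves every lane in the same bank, so `expected_wavefronts(strided_warp(0, 1))` and `expected_wavefronts(strided_warp(32768, 1))` agree. That is why the offset alone should not introduce conflicts, at least as far as plain bank mapping goes.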