In Nsight Compute, first determine whether bank conflicts are a performance limiter. There are two ways to check:
- In the GPU Speed of Light section, check whether L1/TEX Cache [%] (l1tex__throughput.avg.pct_of_peak_sustained_active) is one of the highest values. If it is, look at the SOL L1: * entries in the SOL Memory Breakdown. Bank conflicts require additional data bank reads. If SOL L1: Data Bank Reads [%] (l1tex__data_bank_reads.avg.pct_of_peak_sustained_elapsed) or SOL L1: Data Bank Writes [%] (l1tex__data_bank_writes.avg.pct_of_peak_sustained_elapsed) is high, then reducing bank conflicts could help.
- The other method is to see how much inefficient shared memory accesses are stalling warps by looking at the Warp State Statistics section. Shared memory bank conflicts can affect warp state in two ways. First, the additional cycles needed to process bank conflicts can cause warps to stall when issuing instructions to the Load Store Unit in the MIO (Memory Input/Output) partition. In this case the warp reports Stall MIO Throttle (smsp__average_warps_issue_stalled_mio_throttle_per_issue_active.ratio). Second, the additional cycles increase the access latency. In this case the warp reports Stall Short Scoreboard (smsp__average_warps_issue_stalled_short_scoreboard_per_issue_active.ratio) on the instruction waiting for the shared memory data. If either of these stall reasons is high, it may be worth fixing the shared memory accesses.
The Source Page can help identify where the bank conflicts occur. Open the Source Page (top-left drop-down). In the Source Page, change the navigation column from Instructions Executed to Memory L1 Transactions Shared. This value includes bank conflicts. You can then step through the highest values using the buttons to the right of the selection. Look for the rows with the highest Memory L1 Transactions Shared and the largest difference between Memory L1 Transactions Shared and Memory Ideal L1 Transactions Shared.
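To build intuition for why the actual and ideal transaction counts diverge, here is a rough Python sketch of a simplified bank model. It is not how Nsight Compute computes its metrics. It assumes 32 banks of 4 bytes each and one 4-byte access per thread, and the helper names (shared_transactions, column_access) are made up for illustration. Threads that hit different words in the same bank are serialized, while threads that read the same word are served by a broadcast.

```python
from collections import defaultdict

NUM_BANKS = 32   # assumption: 32 shared memory banks
BANK_WIDTH = 4   # assumption: each bank is 4 bytes wide
WARP_SIZE = 32


def shared_transactions(byte_addresses):
    """Estimate the transactions needed for one warp-wide 4-byte shared access.

    Simplified model: the cost is the largest number of distinct words
    that fall into any single bank.
    """
    words_per_bank = defaultdict(set)
    for addr in byte_addresses:
        word = addr // BANK_WIDTH
        words_per_bank[word % NUM_BANKS].add(word)
    return max(len(words) for words in words_per_bank.values())


def column_access(row_stride_words):
    """Byte addresses when thread t reads element [t][0] of a float tile."""
    return [t * row_stride_words * BANK_WIDTH for t in range(WARP_SIZE)]


# Row stride of 32 words: every thread lands in bank 0.
conflicting = shared_transactions(column_access(32))
# Row stride of 33 words (one padding column): each thread gets its own bank.
padded = shared_transactions(column_access(33))

print(conflicting, padded)  # prints "32 1"
```

Under this model the ideal count for the access is 1 transaction, so the stride-32 case would be the kind of row where the actual count is far above the ideal one, and padding the row stride is one common way to close the gap.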