I’m trying to understand the relationship between l1tex__data_bank_conflicts_pipe_lsu_mem_shared_op_ld and l1tex__data_pipe_lsu_wavefronts_mem_shared_op_ld metrics for shared memory accesses.
My test case: a kernel with LDS.128 instructions (each thread loads 16 bytes, i.e. one float4). I have two versions with different shared memory layouts:
| Metric | v10 (bad layout) | v11 (optimized layout) |
|---|---|---|
| sass__inst_executed_shared_loads | 8,844,480 | 8,844,480 |
| l1tex__data_pipe_lsu_wavefronts_mem_shared_op_ld | 283,023,360 | 35,377,920 |
| l1tex__data_bank_conflicts_pipe_lsu_mem_shared_op_ld | 247,645,440 | 0 |
| Wavefronts per instruction | 32 | 4 |
My interpretation:
For LDS.128 across a full warp of 32 threads, the minimum number of wavefronts is 4:
- 32 threads × 16 bytes = 512 bytes per instruction
- Shared memory services 128 bytes per wavefront (32 banks × 4 bytes)
- 512 / 128 = 4 wavefronts minimum
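The arithmetic above amounts to a data-volume lower bound on wavefronts per warp-level load. A minimal C++17 sketch, assuming (as the post does) that one wavefront delivers at most 128 bytes and that no two lanes read the same word; `min_wavefronts` is a hypothetical helper, not part of any NVIDIA API:

```cpp
#include <cstdint>

// Assumption taken from the post, not from NVIDIA documentation:
// one shared-memory wavefront delivers at most 32 banks x 4 bytes = 128 bytes.
constexpr std::uint32_t kBytesPerWavefront = 32 * 4;

// Lower bound on wavefronts for one warp-level shared load in which
// `active_threads` lanes each read `bytes_per_thread` distinct bytes.
constexpr std::uint32_t min_wavefronts(std::uint32_t active_threads,
                                       std::uint32_t bytes_per_thread) {
    const std::uint32_t total_bytes = active_threads * bytes_per_thread;
    // Round up: a partially filled wavefront still costs a whole wavefront.
    return (total_bytes + kBytesPerWavefront - 1) / kBytesPerWavefront;
}

static_assert(min_wavefronts(32, 4) == 1, "4-byte loads, full warp");
static_assert(min_wavefronts(32, 8) == 2, "8-byte loads, full warp");
static_assert(min_wavefronts(32, 16) == 4, "LDS.128: 512 B / 128 B");
```

The bound depends only on how many bytes the warp requests, which is why it says nothing about which banks individual lanes touch.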
The bank_conflicts metric counts excess wavefronts beyond this minimum (4 × 8,844,480 instructions = 35,377,920 wavefronts):
- v10: 283,023,360 - 35,377,920 = 247,645,440 (matches bank_conflicts)
- v11: 35,377,920 - 35,377,920 = 0 (matches bank_conflicts)
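The check above can be reproduced from the table values alone. A C++17 sketch with the numbers hard-coded from the table; `excess_wavefronts` is a hypothetical helper encoding the hypothesis being asked about, so the `static_assert`s only confirm the arithmetic, not how Nsight Compute actually defines the metric:

```cpp
#include <cstdint>

// Values copied from the table in this post.
constexpr std::uint64_t kInstructions  = 8'844'480;   // sass__inst_executed_shared_loads (v10 and v11)
constexpr std::uint64_t kWavefrontsV10 = 283'023'360; // l1tex__data_pipe_lsu_wavefronts_mem_shared_op_ld
constexpr std::uint64_t kWavefrontsV11 = 35'377'920;
constexpr std::uint64_t kConflictsV10  = 247'645'440; // l1tex__data_bank_conflicts_pipe_lsu_mem_shared_op_ld
constexpr std::uint64_t kConflictsV11  = 0;
constexpr std::uint64_t kMinWavefrontsPerInst = 4;    // LDS.128, full warp: 512 B / 128 B

// Hypothesis being checked: conflicts = measured wavefronts minus the ideal
// count (instructions * minimum per instruction), clamped at zero.
constexpr std::uint64_t excess_wavefronts(std::uint64_t wavefronts,
                                          std::uint64_t instructions,
                                          std::uint64_t min_per_inst) {
    const std::uint64_t ideal = instructions * min_per_inst;
    return wavefronts > ideal ? wavefronts - ideal : 0;
}

// Wavefronts per instruction divide evenly: 32 (v10) and 4 (v11).
static_assert(kWavefrontsV10 % kInstructions == 0 && kWavefrontsV10 / kInstructions == 32);
static_assert(kWavefrontsV11 % kInstructions == 0 && kWavefrontsV11 / kInstructions == 4);
// The excess equals the reported bank-conflict count in both versions.
static_assert(excess_wavefronts(kWavefrontsV10, kInstructions, kMinWavefrontsPerInst) == kConflictsV10);
static_assert(excess_wavefronts(kWavefrontsV11, kInstructions, kMinWavefrontsPerInst) == kConflictsV11);
```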
Questions:
- Is this interpretation correct? Do the 4 wavefronts per instruction in v11 represent full bandwidth utilization (not conflicts), while the additional 28 wavefronts per instruction in v10 are caused by bank conflicts and counted in the bank_conflicts metric?
- Put another way: when threads access different banks and the instruction achieves the minimum of 4 wavefronts for LDS.128, is this NOT counted as a bank conflict because we’re fully utilizing the available bandwidth?
- If so, it would be inaccurate to say that bank conflicts are “multiple threads within a warp accessing the same bank in a single instruction”, since in v11 threads DO access the same banks within a single instruction (e.g., threads 0 and 8 both access banks 0-3), yet the metric reports 0 conflicts. Would it be more accurate to say that a bank conflict is an excess wavefront above the theoretical optimum, caused by multiple threads within a warp accessing the same bank?
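To make the threads 0 and 8 example concrete, here is a simplified bank model for LDS.128 as a C++17 sketch. It assumes, as a commonly cited explanation rather than documented SM 10.0 behavior, that a 16-byte-per-thread load is serviced in four phases of eight consecutive lanes, that bank = (byte address / 4) mod 32, and that each wavefront serves one distinct 4-byte word per bank. The helper names are hypothetical, and the 128-byte stride is just one layout that yields 32 wavefronts per instruction; it is not necessarily what v10 does.

```cpp
#include <array>
#include <cstdint>

// Simplified model; every constant here is an assumption.
constexpr int kWarpSize = 32;
constexpr int kBanks = 32;         // 32 banks, each 4 bytes wide
constexpr int kLanesPerPhase = 8;  // assumed: LDS.128 handled as 4 quarter-warp phases
constexpr int kWordsPerLane = 4;   // 16 bytes = four 4-byte words

using LaneAddrs = std::array<std::uint32_t, kWarpSize>;

// Hypothetical helper: lane t loads the 16-byte float4 at byte offset t * stride_bytes.
constexpr LaneAddrs strided_addresses(std::uint32_t stride_bytes) {
    LaneAddrs a{};
    for (int t = 0; t < kWarpSize; ++t) a[t] = static_cast<std::uint32_t>(t) * stride_bytes;
    return a;
}

// Wavefronts for one LDS.128 under the model: each phase costs as many
// wavefronts as the largest number of distinct words mapped to one bank.
// Lanes reading the same word share it (broadcast) and add no wavefront.
constexpr int lds128_wavefronts(const LaneAddrs& byte_addr) {
    int total = 0;
    for (int phase = 0; phase < kWarpSize / kLanesPerPhase; ++phase) {
        std::array<std::array<std::uint32_t, kLanesPerPhase * kWordsPerLane>, kBanks> seen{};
        std::array<int, kBanks> distinct{};
        for (int l = 0; l < kLanesPerPhase; ++l) {
            const std::uint32_t first_word = byte_addr[phase * kLanesPerPhase + l] / 4;
            for (int w = 0; w < kWordsPerLane; ++w) {
                const std::uint32_t word = first_word + static_cast<std::uint32_t>(w);
                const int bank = static_cast<int>(word % kBanks);
                bool already = false;
                for (int i = 0; i < distinct[bank]; ++i) already = already || seen[bank][i] == word;
                if (!already) seen[bank][distinct[bank]++] = word;
            }
        }
        int worst = 0;
        for (int b = 0; b < kBanks; ++b) worst = distinct[b] > worst ? distinct[b] : worst;
        total += worst;
    }
    return total;
}

// Contiguous float4s: lanes 0 and 8 both use banks 0-3, but in different phases.
static_assert(lds128_wavefronts(strided_addresses(16)) == 4);
// 128-byte stride: every lane hits banks 0-3, so 8 distinct words per bank per phase.
static_assert(lds128_wavefronts(strided_addresses(128)) == 32);
```

Under this model, sharing a bank only costs extra wavefronts when the lanes involved fall in the same phase and read different words, which is consistent with v11 reporting 0 conflicts despite threads 0 and 8 sharing banks 0-3.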
Thanks for any clarification!
GPU: Blackwell (SM 10.0)