Clarification: bank_conflicts metric vs wavefronts for shared memory LDS.128

I’m trying to understand the relationship between l1tex__data_bank_conflicts_pipe_lsu_mem_shared_op_ld and l1tex__data_pipe_lsu_wavefronts_mem_shared_op_ld metrics for shared memory accesses.

My test case: A kernel with LDS.128 instructions (each thread loads 16 bytes = float4). I have two versions with different shared memory layouts:

| Metric | v10 (bad layout) | v11 (optimized layout) |
|---|---|---|
| sass__inst_executed_shared_loads | 8,844,480 | 8,844,480 |
| l1tex__data_pipe_lsu_wavefronts_mem_shared_op_ld | 283,023,360 | 35,377,920 |
| l1tex__data_bank_conflicts_pipe_lsu_mem_shared_op_ld | 247,645,440 | 0 |
| Wavefronts per instruction | 32 | 4 |

My interpretation:

For LDS.128 with 32 threads, the minimum wavefronts required is 4:

  • 32 threads × 16 bytes = 512 bytes per instruction
  • Shared memory services 128 bytes per wavefront
  • 512 / 128 = 4 wavefronts minimum
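To make the arithmetic explicit, here is a minimal Python sketch of the same calculation. It assumes a fully active 32-thread warp and 128 bytes serviced per wavefront (32 banks × 4 bytes); the function name `min_wavefronts` is just a helper I made up for illustration, not a profiler API.

```python
import math

WARP_SIZE = 32
BYTES_PER_WAVEFRONT = 128  # assumption: 32 banks x 4 bytes per bank

def min_wavefronts(bytes_per_thread):
    """Lower bound on wavefronts for one fully active warp-wide shared load."""
    total_bytes = WARP_SIZE * bytes_per_thread
    return math.ceil(total_bytes / BYTES_PER_WAVEFRONT)

# 4-, 8- and 16-byte loads per thread (LDS.32, LDS.64, LDS.128)
for name, width in (("LDS.32", 4), ("LDS.64", 8), ("LDS.128", 16)):
    print(name, min_wavefronts(width))
```

Under these assumptions the minimum scales with access width: 1 wavefront for 4-byte loads, 2 for 8-byte loads, and 4 for LDS.128.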

The bank_conflicts metric counts excess wavefronts beyond this minimum:

  • v10: 283,023,360 - 35,377,920 = 247,645,440 (matches bank_conflicts)
  • v11: 35,377,920 - 35,377,920 = 0 (matches bank_conflicts)
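The same check can be run directly on the numbers from the table. This is only a sketch of my interpretation (excess = wavefronts minus 4 per instruction); `excess_wavefronts` is a hypothetical helper name, and the counts are copied from the profiler output above.

```python
MIN_WAVEFRONTS_PER_LDS128 = 4  # from the 512 / 128 calculation above

def excess_wavefronts(instructions, wavefronts):
    """Wavefronts beyond the assumed minimum of 4 per LDS.128 instruction."""
    return wavefronts - instructions * MIN_WAVEFRONTS_PER_LDS128

# (instructions, wavefronts, reported bank conflicts) per kernel version
runs = {
    "v10": (8_844_480, 283_023_360, 247_645_440),
    "v11": (8_844_480, 35_377_920, 0),
}

for name, (inst, waves, conflicts) in runs.items():
    print(name, waves // inst, excess_wavefronts(inst, waves), conflicts)
```

For both versions the computed excess equals the reported bank_conflicts value, which is what led me to this interpretation.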

Questions:

  1. Is this interpretation correct? The 4 wavefronts in v11 represent full bandwidth utilization (not conflicts), while the additional 28 wavefronts per instruction in v10 are caused by bank conflicts and counted in the bank_conflicts metric?

  2. Put another way: when threads access different banks and we achieve the minimum 4 wavefronts for LDS.128, this is NOT counted as a bank conflict because we’re fully utilizing available bandwidth?

  3. Therefore, it would be inaccurate to say that bank conflicts are “multiple threads within a warp accessing the same bank in a single instruction” - since in v11, threads ARE accessing the same banks within a single instruction (e.g., threads 0 and 8 both access banks 0-3), yet the metric reports 0 conflicts. Would it be more accurate to say a bank conflict is: excess wavefronts above the theoretical optimum, caused by multiple threads within a warp accessing the same bank?
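To illustrate question 3, here is a rough Python model of the bank mapping. It rests on assumptions I have not confirmed: 32 banks of 4 bytes each, and a 128-bit load being serviced in groups of 8 consecutive threads (quarter-warps), with each group costing as many wavefronts as the largest number of distinct 4-byte words that fall in one bank. The two layouts are hypothetical examples (16-byte and 128-byte stride per thread), not my actual v10/v11 kernels.

```python
from collections import defaultdict

NUM_BANKS = 32
BANK_WIDTH = 4  # bytes per bank word (assumption)

def banks_of(addr, bytes_per_thread=16):
    """Banks touched by one thread's access starting at byte address addr."""
    return [((addr + off) // BANK_WIDTH) % NUM_BANKS
            for off in range(0, bytes_per_thread, BANK_WIDTH)]

def modeled_wavefronts(addresses, bytes_per_thread=16, group_size=8):
    """Model only: sum, over thread groups, of the worst per-bank word count."""
    total = 0
    for start in range(0, len(addresses), group_size):
        words_per_bank = defaultdict(set)
        for addr in addresses[start:start + group_size]:
            for off in range(0, bytes_per_thread, BANK_WIDTH):
                word = (addr + off) // BANK_WIDTH
                words_per_bank[word % NUM_BANKS].add(word)
        total += max(len(words) for words in words_per_bank.values())
    return total

contiguous = [16 * t for t in range(32)]   # thread t reads float4 number t
strided = [128 * t for t in range(32)]     # every thread starts at bank 0

print(banks_of(contiguous[0]), banks_of(contiguous[8]))
print(modeled_wavefronts(contiguous), modeled_wavefronts(strided))
```

In this model, threads 0 and 8 of the contiguous layout both map to banks 0-3 but fall in different groups, so the load still costs only the minimum 4 wavefronts, while the 128-byte stride costs 32.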

Thanks for any clarification!

GPU: Blackwell (SM 10.0)

cc @Robert_Crovella

Your understanding is correct. Bank conflicts occur within a 128-byte transaction / wavefront.

For LDS.128, the minimum number of wavefronts is 4.
