Hello everyone,
I am currently using Nsight Compute (version 2025.3.1.0) on an H100 GPU to analyze and optimize shared memory bank conflicts within a CUDA kernel.
During my analysis, I’ve become confused about the bank conflict statistics presented on two different pages (Details vs. Source), and I’m hoping to get some clarification.
My Analysis Method:
I have been relying primarily on the L1 Wavefronts Shared Excessive metric in the Source page (SASS view). My understanding is that this metric pinpoints the specific SASS instructions (usually LDS) whose shared memory accesses suffer significant bank conflicts, meaning they need more wavefronts than an ideal access would.
Question 1: Is L1 Wavefronts Shared Excessive the correct metric for identifying the specific instructions that cause problematic conflicts?
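To make "excessive wavefronts" concrete, here is a simplified host-side C++ model. It is an assumption for illustration, not Nsight Compute's documented formula. It assumes 32 banks that are each 4 bytes wide, that each bank returns one distinct 4-byte word per wavefront, and that threads reading the same word are served by a broadcast. `estimateWavefronts` and `WavefrontEstimate` are hypothetical names. "Excessive" is taken here as actual wavefronts minus the ideal count for the same number of distinct words.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Simplified model (assumption): 32 banks, 4 bytes each; one distinct word
// per bank per wavefront; identical words are broadcast at no extra cost.
constexpr int kNumBanks = 32;
constexpr uint32_t kBankWidthBytes = 4;

struct WavefrontEstimate {
    int actual;     // max number of distinct words that map to a single bank
    int ideal;      // wavefronts needed if those words were spread evenly
    int excessive;  // actual - ideal (the "extra" wavefronts)
};

// addrs: shared-memory byte offset accessed by each active thread in the warp.
// widthBytes: per-thread access size (4, 8 or 16), assumed naturally aligned.
WavefrontEstimate estimateWavefronts(const std::vector<uint32_t>& addrs,
                                     uint32_t widthBytes) {
    assert(widthBytes >= kBankWidthBytes && widthBytes % kBankWidthBytes == 0);
    if (addrs.empty()) return {0, 0, 0};

    std::vector<std::set<uint32_t>> wordsPerBank(kNumBanks);
    std::set<uint32_t> allWords;
    for (uint32_t addr : addrs) {
        // A wide access (e.g. 128-bit) touches several consecutive words.
        for (uint32_t off = 0; off < widthBytes; off += kBankWidthBytes) {
            uint32_t word = (addr + off) / kBankWidthBytes;
            wordsPerBank[word % kNumBanks].insert(word);  // duplicates = broadcast
            allWords.insert(word);
        }
    }

    int actual = 0;
    for (const auto& bank : wordsPerBank) {
        actual = std::max(actual, static_cast<int>(bank.size()));
    }
    int distinct = static_cast<int>(allWords.size());
    int ideal = (distinct + kNumBanks - 1) / kNumBanks;  // ceil(distinct / 32)
    return {actual, ideal, actual - ideal};
}
```

In this model, a column read of a hypothetical `float tile[32][32]` (thread t at byte offset t*128) puts all 32 words in bank 0. That gives 32 wavefronts against an ideal of 1, so 31 are excessive. Padding the row to 33 floats (offset t*132) spreads the words across all 32 banks and yields 0 excessive wavefronts. A contiguous 128-bit access (offset t*16, width 16) needs 4 wavefronts, but that is also its ideal, so none are counted as excessive.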
The Discrepancy I Observed:
My main confusion comes from this observation: after optimizing my code, I successfully reduced the L1 Wavefronts Shared Excessive metric to zero for every instruction on the Source page.
However, when I switch back to the Details Page and inspect the Load/Store Instructions section, I find that the corresponding LDS (Load Shared) or STS (Store Shared) instructions still show non-zero values in the Bank Conflicts column (or Shared Memory Conflicts column).
My Specific Questions:
This leads me to question the exact definition of the statistics on the Details page:
Question 2: What other types of conflicts are included in the Bank Conflicts count on the Details page? Since L1 Wavefronts Shared Excessive on the Source page is zero, the Details page count must be measuring something in addition to "excessive" conflicts. What else does it include?
Question 3: How should I accurately interpret the metrics under Memory Analysis → L1 Cache → Shared Memory Conflicts in the Details Page? I notice categories here like Bank Conflicts and Other Conflicts.
- How is the main Bank Conflicts value calculated?
- Specifically, what does the Other Conflicts category represent? I’ve seen this Other category account for a high percentage in some of my tests.
Question 4: My ultimate goal is to eliminate bank conflicts completely. If the Source page (L1 Wavefronts Shared Excessive) already shows zero, do I still need to worry about the non-zero Bank Conflicts count shown for LDS/STS instructions on the Details page? If so, how can I pinpoint their cause?
Summary: I am trying to understand the precise difference and relationship between the Details page and the Source page when Nsight Compute reports shared memory bank conflicts on the H100 architecture. Any insights on how to use these metrics correctly for optimization would be greatly appreciated.
Thank you!