Uncoalesced Shared Accesses

paoxiaode · September 5, 2023, 7:34am

Hi all, I am optimizing a kernel's performance with Nsight Compute. I have run into a problem and am looking for advice.

I used Nsight Compute to profile the kernel, and the report warns that it has Uncoalesced Shared Accesses.

However, in the kernel implementation the thread block shape is 32 * 32. All threads in each warp access the same address in shared memory, and the addresses accessed by different warps are neighboring.

I think this access pattern is a broadcast, so I don’t understand why this kernel has Uncoalesced Shared Accesses.
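To make the broadcast expectation concrete, here is a minimal host-side C++ sketch of the commonly documented bank model. It assumes 32 banks that are each 4 bytes wide, and that threads reading the same 32-bit word are served together as a broadcast. The function name `sharedLoadPasses` is hypothetical, and this is a simplified model rather than the exact counter Nsight Compute reports.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Assumed hardware model: 32 banks, each 4 bytes wide.
constexpr int kBanks = 32;
constexpr int kBankWidthBytes = 4;

// Hypothetical helper: estimates how many passes one warp-wide 32-bit
// shared load needs, given the byte address read by each active thread.
int sharedLoadPasses(const std::vector<std::uint32_t>& byteAddrs) {
    assert(byteAddrs.size() <= 32);  // a warp has at most 32 threads

    // For every bank, collect the distinct 32-bit words requested from it.
    std::array<std::set<std::uint32_t>, kBanks> wordsPerBank;
    for (std::uint32_t addr : byteAddrs) {
        std::uint32_t word = addr / kBankWidthBytes;
        wordsPerBank[word % kBanks].insert(word);
    }

    // Threads reading the same word share one pass (broadcast); distinct
    // words in the same bank are serialized, so the busiest bank decides.
    int passes = 0;
    for (const auto& words : wordsPerBank) {
        passes = std::max(passes, static_cast<int>(words.size()));
    }
    return passes;
}
```

Under this model, 32 threads that all read the same address need a single pass, while 32 threads reading 4-byte words that are 128 bytes apart all land in one bank and need 32 passes.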

Any advice would be appreciated!

Robert_Crovella · September 5, 2023, 2:34pm

If you don’t wish to provide a code example, my suggestion would be to ask this question on the Nsight Compute forum.

Greg · September 6, 2023, 4:10am

If you check the L1 Wavefronts Shared Excessive table specified by the rule, do you find that these specific loads all access the same address?

Shared memory access patterns are only relevant within each warp instruction. The access pattern between warps is not material.
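As an illustration of the per-warp view, the following host-side C++ sketch maps the lanes of one warp to `(threadIdx.x, threadIdx.y)` for a 32 * 32 block and evaluates a shared-memory index expression for that warp alone. It relies on CUDA's linear thread ordering (`x + y * blockDim.x`), so with a block width of 32 every lane of a warp shares the same `threadIdx.y`. The names `passesForWarp` and `IndexFn` are hypothetical, the 32-bank, 4-byte-wide model is an assumption, and the index expressions are stand-ins since the original kernel was not posted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <set>

constexpr int kBlockDimX = 32;  // block width from the 32 * 32 shape
constexpr int kWarpSize = 32;
constexpr int kBanks = 32;      // assumed: 32 banks of 4-byte words

// Hypothetical stand-in for a kernel's shared-memory index expression,
// returning an index in 4-byte elements for a given (x, y) thread.
using IndexFn = std::function<std::uint32_t(int x, int y)>;

// Estimates the passes needed by ONE warp; other warps never enter the
// calculation, mirroring the point that only the per-warp pattern matters.
int passesForWarp(int warpId, const IndexFn& index) {
    assert(warpId >= 0);
    std::map<std::uint32_t, std::set<std::uint32_t>> wordsPerBank;
    for (int lane = 0; lane < kWarpSize; ++lane) {
        int linear = warpId * kWarpSize + lane;  // x + y * blockDim.x
        int x = linear % kBlockDimX;
        int y = linear / kBlockDimX;
        std::uint32_t word = index(x, y);
        wordsPerBank[word % kBanks].insert(word);
    }
    int passes = 0;
    for (const auto& entry : wordsPerBank) {
        passes = std::max(passes, static_cast<int>(entry.second.size()));
    }
    return passes;
}
```

In this model an index that depends only on `y` gives one pass per warp (a broadcast), whereas an index such as `x * 32` puts every lane of the warp in the same bank. Comparing the actual index expression against cases like these may help narrow down which instruction the rule is flagging.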

The counter listed as showing shared memory bank conflicts can increment for reasons other than a shared memory bank conflict. As such, the recommendation is to follow the rule and check the column L1 Wavefronts Shared Excessive in the Source View (NOTE: the name was slightly different in older versions of Nsight Compute). The value in this column is calculated only from the memory addresses passed to the instruction.
