Is there any way to find the location in CUDA code that causes shared memory bank conflicts?

tom.hx · January 11, 2022, 1:15pm

Hello,

I have written a CUDA kernel with a carefully arranged shared memory layout, and it should be free of bank conflicts.

If I launch the kernel with only one block (that is, gridDim.x, gridDim.y, and gridDim.z all set to 1), and the block contains 2 warps (64 threads), Nsight Compute reports NO shared load/store bank conflicts.

But things change when the grid gets larger (e.g. gridDim.x = 3, gridDim.y = 4, gridDim.z = 256): Nsight Compute then shows thousands of shared load/store bank conflicts.

So, how can I locate the offending code? Do bank conflicts occur when different warp schedulers access shared memory at the same time? Is there any way to avoid them?

rs277 · January 11, 2022, 6:22pm

Robert’s reply here may help in understanding the situation:

cecil.cj · January 12, 2022, 2:06am

In general shared memory bank conflicts can occur any time two different threads are attempting to access (from the same kernel instruction) locations within shared memory for which the lower 4 (pre-cc2.0 devices) or 5 bits (cc2.0 and newer devices) of the address are the same.

I am wondering how to avoid bank conflicts between two threads of different warps running on different processing blocks of an SM.

BTW, a short scoreboard stall will (also) occur and be reported by Nsight Compute for bank conflicts of this kind, am I right?

Robert_Crovella · January 20, 2022, 11:25pm

There is no such animal. Bank conflicts only occur in the context of a single instruction, issued to a single warp. Only threads in the same warp have the possibility to create bank conflicts.

Nsight Compute has a source page (source, SASS) that localizes reports to specific lines of code. This blog (all 3 parts) may help. Part 2, figure 5 shows an example of the source page.
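To make the per-warp rule concrete, here is a minimal host-side Python sketch. It is only a model, not CUDA code and not how the hardware or Nsight Compute works internally. It assumes 32 banks of 4-byte words and 32-thread warps (cc2.0 and newer devices), and it assumes that threads reading the same word are served together. The names `bank_of`, `conflict_degree`, and `float_addrs` are made up for illustration.

```python
from collections import defaultdict

NUM_BANKS = 32   # assumption: cc2.0+ devices
BANK_WIDTH = 4   # assumption: 4-byte bank width
WARP_SIZE = 32

def bank_of(byte_addr):
    """Bank index: the low 5 bits of the 32-bit word index."""
    return (byte_addr // BANK_WIDTH) % NUM_BANKS

def conflict_degree(warp_byte_addrs):
    """Largest number of distinct words a single bank must serve for ONE
    instruction issued by ONE warp (1 means conflict free in this model)."""
    words_per_bank = defaultdict(set)
    for addr in warp_byte_addrs:
        words_per_bank[bank_of(addr)].add(addr // BANK_WIDTH)
    return max(len(words) for words in words_per_bank.values())

def float_addrs(stride, warp=0):
    """Byte addresses when thread t of the given warp reads
    float element (warp * 32 + t) * stride of a shared array."""
    return [(warp * WARP_SIZE + t) * stride * 4 for t in range(WARP_SIZE)]

for stride in (1, 2, 32, 33):
    print(stride, conflict_degree(float_addrs(stride)))

# Each warp is evaluated on its own: two warps that touch the same
# banks do not conflict with each other in this model.
for warp in (0, 1):
    print(warp, conflict_degree(float_addrs(1, warp=warp)))
```

In this model a stride of 32 floats puts all 32 threads of a warp on bank 0, while a stride of 33 spreads them over all banks again. Accesses from a second warp never enter the calculation for the first one.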

cecil.cj · January 21, 2022, 11:07am

Got it, then the strange part of @tom.hx’s example:

Grid size == 1, no bank conflict
Grid size == 3072, ~2-3 “bank conflicts per block”

It is curious that the bank conflict count is not zero for 3072 blocks (assuming ALL warps have the same access pattern).

tom.hx · January 21, 2022, 11:11am

Thanks for your reply.

I did use Nsight Compute:

On the source page, I checked the ‘L1 Wavefronts Shared’ and ‘L1 Wavefronts Shared Ideal’ columns, and the values are identical in both. Does this mean the source code is free of bank conflicts?

However, the details page shows that bank conflicts exist when loading and storing shared memory.

Why does this gap exist, and how can I narrow down the issue?
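One way to do the line-by-line check is to compare the two source-page columns and list only the rows where the actual wavefront count exceeds the ideal one; those rows would be the candidates to inspect. The following is a minimal Python sketch of that comparison. The rows are made-up numbers typed in by hand, not output from a real report, and `excess_wavefronts` is a hypothetical helper, not part of Nsight Compute.

```python
def excess_wavefronts(rows):
    """rows: (source_line, wavefronts_shared, wavefronts_shared_ideal).
    Returns {source_line: excess} for lines whose actual count exceeds ideal."""
    return {line: actual - ideal
            for line, actual, ideal in rows
            if actual > ideal}

# Hypothetical values, for illustration only.
rows = [
    (42, 1024, 1024),
    (57, 4096, 2048),
    (63, 512, 512),
]
print(excess_wavefronts(rows))  # {57: 2048}
```

An empty result corresponds to the case described above, where both columns match on every line.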

Robert_Crovella · January 21, 2022, 2:56pm

I probably wouldn’t be able to answer further questions without having the tool in front of me and access to the code to study. You may get better help on the nsight compute forum.
