Shared Memory Bank conflicts with 64 bit data type

jomivaan · May 23, 2024, 10:49am

Hello,
I am working with 64-bit unsigned integers, and my shared memory accesses result in many more “L1 Wavefronts Shared Excessive” than I expected.
In my access pattern, thread i reads two shared memory values, at index i and at index i + blockDim.x (512 in my test case) of the shared memory array.
From what I have read, since my data type is 64 bits wide, each value spans two shared memory banks, which would serialize the shared memory accesses. Is that correct?
If so, is there a better access pattern? I have seen padding suggested, but I don’t see how it would help, since it looks to me like it would only shift the conflicts. I use a 1D array, but I could also use a 2D array if that helps.
Thank you, and sorry if my question is confusing.

Greg · May 23, 2024, 5:43pm

For LDS.64 (load 64-bit from shared memory) or STS.64 (store 64-bit to shared memory), with all threads active and predicated on and an address pattern of consecutive 64-bit values, each instruction takes 2 wavefronts. In the Source view, the L1 Wavefronts Shared Ideal column should report 2 per instruction executed, and the L1 Wavefronts Shared Excessive column should report 0 per instruction executed.

The return bandwidth of shared memory on Volta+ is 128 B/cycle, with an exception for broadcast, which can emulate higher bandwidth. In general, if all 32 threads are active and predicated on, a 64-bit access requests 32 × 8 B = 256 B, so the instruction is split into 2 separate wavefronts to fit the return bandwidth.
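
As a rough illustration of the counts above, here is a small host-side sketch in Python. It is a simplified model, not the hardware's documented arbitration logic: it assumes 32 banks of 4 bytes each and takes the wavefront count for one warp-wide access to be the largest number of distinct 32-bit words that land in any single bank. The function name `wavefronts` and the constants are hypothetical names chosen for this example.

```python
from collections import defaultdict

NUM_BANKS = 32   # assumed number of shared memory banks
BANK_WIDTH = 4   # assumed bank width in bytes
WARP_SIZE = 32


def wavefronts(element_indices, element_size=8):
    """Model wavefronts for one warp-wide shared memory access.

    element_indices holds the array index each thread accesses.
    Simplifying assumption: the count equals the maximum number of
    distinct 32-bit words mapped to any one bank. Threads that touch
    the same word are counted once, which models broadcast.
    """
    words_per_bank = defaultdict(set)
    words_per_element = element_size // BANK_WIDTH
    for idx in element_indices:
        first_word = idx * element_size // BANK_WIDTH
        for word in range(first_word, first_word + words_per_element):
            words_per_bank[word % NUM_BANKS].add(word)
    return max(len(words) for words in words_per_bank.values())


block_dim = 512  # value from the original question

# Thread i accesses index i, then index i + blockDim.x (two instructions).
first_access = [i for i in range(WARP_SIZE)]
second_access = [i + block_dim for i in range(WARP_SIZE)]

# For contrast: a stride of two 64-bit elements between threads.
strided_access = [2 * i for i in range(WARP_SIZE)]

print(wavefronts(first_access))    # 2 in this model
print(wavefronts(second_access))   # 2 in this model
print(wavefronts(strided_access))  # 4 in this model
```

In this model both accesses from the question come out at 2 wavefronts each, which matches the ideal value for a 64-bit access, while the strided pattern comes out at 4, i.e. 2 more than ideal. The numbers are outputs of the model only and should be checked against the profiler on real hardware.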

jomivaan · May 24, 2024, 2:30pm

Hello,
Yes, those are the results I got. I tried using a 2D array and the bank conflicts disappeared, even though the access pattern stayed the same. What is the difference between a 1D array and a 2D array in terms of bank conflicts?
Thank you

Greg · May 24, 2024, 4:15pm

A minimal reproducible example will be required to investigate further.

veraj · June 28, 2024, 11:01am

Hi, @jomivaan

Please provide a minimal repro if you need further support. Thanks!

jomivaan · June 29, 2024, 10:54am

Sorry for not answering, I solved the issue myself.

veraj · June 30, 2024, 12:30pm

Thanks for the reply!
