How to explain this bank conflict

ndzuser · September 20, 2013, 3:40am

I have a CUDA program that stores its data in shared memory as a two-dimensional array, float[8*8][17]. Data is read from shared memory at different intervals, processed, and then stored back to shared memory. The reported stats are as follows:

smem load transactions/request: 1.88
smem store transactions/request: 1.72
bank conflict per request: 0.09
replay overhead: 4.51%
smem achieved bandwidth: 290GB/s

I don’t think there should be any bank conflicts, though. The “useful” data is actually a 3D array of 8x8x16, which I padded to 8x8x17 to avoid bank conflicts (stored as a two-dimensional array of 64x17). There are 64 threads in the block, and they all read a 4x16 region (4 in the horizontal direction and 16 in the depth direction) out of the 3D cube at the same time. Within this 4x16 region, adjacent items in the depth direction differ in address by 1 (in 4-byte words, or 4 in bytes), and adjacent items in the horizontal direction differ by 17*8. In other words, the addresses in this 4x16 region look like this:

s+0 17*8+s 17*8*2+s 17*8*3+s
s+1 17*8+s+1 17*8*2+s+1 17*8*3+s+1
…
s+15 17*8+s+15 17*8*2+s+15 17*8*3+s+15

where s can be any number from 0 to 7. I don’t see how any address difference could be a multiple of 32. Am I missing something?
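
One way to check the address arithmetic above is to compute the bank of every word in the 4x16 region on the host. The sketch below is only an illustration and makes assumptions the post does not state: 32 banks that are each 4 bytes wide, a warp of 32 threads, and a hypothetical choice of which 32 of the 64 elements one warp touches. The names kBanks, kColStride, wordOffset and worstBankLoad are made up for this example. Note that two addresses land in the same bank whenever their word offsets are congruent modulo 32; the difference itself does not need to be a multiple of 32 in the unreduced formula, and 17*8 = 136 leaves a remainder of 8 when divided by 32.

```cpp
#include <algorithm>
#include <array>

// Assumptions (not stated in the post): 32 banks, each 4 bytes wide,
// and a warp of 32 threads. All offsets are in 4-byte words.
constexpr int kBanks = 32;
constexpr int kColStride = 17 * 8;  // word distance between horizontal neighbours

// Word offset of element (col, depth) of the 4x16 region for start offset s.
int wordOffset(int s, int col, int depth) {
    return s + col * kColStride + depth;
}

// Largest number of words that one warp requests from a single bank.
// 1 means conflict-free, 2 means a 2-way conflict, and so on.
// The warp is assumed to cover columns [col0, col0 + nCols) and
// depths [d0, d0 + nDepth); every element has a distinct address.
int worstBankLoad(int s, int col0, int nCols, int d0, int nDepth) {
    std::array<int, kBanks> load{};  // zero-initialised per-bank counters
    for (int c = col0; c < col0 + nCols; ++c) {
        for (int d = d0; d < d0 + nDepth; ++d) {
            ++load[wordOffset(s, c, d) % kBanks];
        }
    }
    return *std::max_element(load.begin(), load.end());
}
```

Under these assumptions, if one warp covers 2 columns by 16 depths (worstBankLoad(s, 0, 2, 0, 16)), the result is 2 for every s from 0 to 7, because column c at depth d and column c+1 at depth d-8 fall into the same bank. If one warp instead covers 4 columns by 8 depths (worstBankLoad(s, 0, 4, 0, 8)), the result is 1. Whether either mapping matches the real kernel depends on how thread indices are assigned to elements and on the bank layout of the GPU in question.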

allanmac · September 20, 2013, 5:26am

What GPU (or architecture) is this kernel running on?
