Bank conflicts with 2D shared mem array: Resolving bank conflicts

Squall211 · July 18, 2008, 7:46pm

Hi!

I’m having trouble figuring out what to do about the large number of bank conflicts in my code:

__shared__ float f1Patch[16][16];

f1Patch[threadIdx.x][threadIdx.y] = some value;

Each one of my blocks has 16x16 threads, so each thread should load one element from global mem into the shared memory.

I think I understand why I have bank conflicts: the first 16 writes (i.e. y = 0) should be OK, but each subsequent row will attempt to access the same bank as all of the other rows.
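The bank arithmetic behind this can be checked with a few lines of host-side code. The sketch below is not from the thread. It assumes the hardware of that era (compute capability 1.x): 16 banks of 32-bit words, with conflicts resolved per half-warp of 16 threads, and a half-warp of a 16x16 block made up of the 16 threads that share one threadIdx.y. The name worstConflictDegree is hypothetical. Under those assumptions, thread (x, y) touching f1Patch[x][y] lands at word offset x * 16 + y, so every thread of a half-warp maps to bank y.

```cuda
#include <algorithm>
#include <vector>

// Assumptions (not stated in the thread): compute capability 1.x,
// i.e. 16 banks of 32-bit words, conflicts resolved per half-warp.
constexpr int kBanks = 16;
constexpr int kHalfWarp = 16;

// Hypothetical helper: worst-case number of threads in one half-warp
// that hit the same bank when thread (x, y) accesses patch[x][y] and
// each row of the array holds rowStride floats.
int worstConflictDegree(int rowStride)
{
    int worst = 0;
    for (int y = 0; y < 16; ++y) {              // one half-warp per threadIdx.y
        std::vector<int> hits(kBanks, 0);
        for (int x = 0; x < kHalfWarp; ++x) {   // threadIdx.x within the half-warp
            int wordOffset = x * rowStride + y; // offset of patch[x][y] in floats
            ++hits[wordOffset % kBanks];
        }
        worst = std::max(worst, *std::max_element(hits.begin(), hits.end()));
    }
    return worst;
}
```

Calling worstConflictDegree(16) models the float f1Patch[16][16] declaration above. If only this one access pattern is needed, swapping the indices to f1Patch[threadIdx.y][threadIdx.x] makes the offset y * 16 + x, which puts each thread of a half-warp in its own bank under the same assumptions.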

Is there anything I can do to optimize this? I’m trying to grab a square patch of pixels from an image for processing.

Thanks in advance!

kristleifur · July 18, 2008, 9:30pm

Have a look at the Transpose example. They’re grabbing a square block, and there are bank conflicts when the rows are read. They make the shared array like this:

shared[16][16 + 1]

I think this staggers the accesses to the rows: you use 16 positions per row, but the rows repeat in cycles of 17, so each row starts one bank further along than the one before it.

Does that make sense / is it applicable?
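Applied to the original question, the padding trick might look like the following minimal sketch. It is not the Transpose sample itself. The kernel name copyViaPatch, the image size, and the round-trip check in main are all hypothetical and only there to make the example self-contained. It also assumes a row-major float image and omits CUDA error checking for brevity.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int TILE = 16;

// Hypothetical kernel: each thread loads one pixel of a TILE x TILE patch
// into shared memory, then writes the same element back out.
__global__ void copyViaPatch(const float* image, float* out, int width, int height)
{
    // One extra column per row: the row stride becomes 17 words, so
    // f1Patch[x][y] for x = 0..15 is spread across different banks.
    __shared__ float f1Patch[TILE][TILE + 1];

    int col = blockIdx.x * TILE + threadIdx.x;
    int row = blockIdx.y * TILE + threadIdx.y;
    bool inside = (col < width) && (row < height);

    if (inside)
        f1Patch[threadIdx.x][threadIdx.y] = image[row * width + col];

    __syncthreads();  // reached by every thread, so it is safe here

    if (inside)
        out[row * width + col] = f1Patch[threadIdx.x][threadIdx.y];
}

int main()
{
    const int width = 64, height = 48;  // arbitrary test size
    std::vector<float> host(width * height);
    for (int i = 0; i < width * height; ++i) host[i] = static_cast<float>(i);

    float *dIn = nullptr, *dOut = nullptr;
    size_t bytes = host.size() * sizeof(float);
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, host.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((width + TILE - 1) / TILE, (height + TILE - 1) / TILE);
    copyViaPatch<<<grid, block>>>(dIn, dOut, width, height);

    // cudaMemcpy waits for the kernel to finish before copying back.
    std::vector<float> result(host.size());
    cudaMemcpy(result.data(), dOut, bytes, cudaMemcpyDeviceToHost);

    bool ok = (result == host);
    std::printf("%s\n", ok ? "patch round-trip OK" : "mismatch");
    cudaFree(dIn);
    cudaFree(dOut);
    return ok ? 0 : 1;
}
```

The padding column costs 16 extra floats of shared memory per block and is never read or written. Whether it removes every conflict depends on the bank count and warp width of the target GPU, so a profiler run on the actual hardware is the reliable check.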
