Do not understand bank conflicts, please help

Hi,

On page 89 of the CUDA Programming Guide 2.2 there are two examples of bank conflicts:

__shared__ struct type shared[32];

struct type data = shared[BaseIndex+tid];

It says that if the data is accessed based on tid and the type is declared like this, it will result in NO bank conflict:

struct type {
	float x, y, z;
};

and if the type is declared like this, it WILL have a bank conflict:

struct type {
	float x, y;
};

What I do not understand is the difference between the two. Why do 3 consecutive floats in a struct not cause a bank conflict while 2 do? Since the width of a bank is 32 bits, each float will use a separate bank. So where is the conflict?

Thanks in advance, as always.

In the 1st case the compiler should be reading the data as:
x.x = ((float*) shared)[tid * 3 + 0];
x.y = ((float*) shared)[tid * 3 + 1];
x.z = ((float*) shared)[tid * 3 + 2];

Imagine the threads of the 1st half-warp, where tid ranges from 0 to 15. That means we address the following banks: [((0…15)*3 + 0) % 16] = [0,3,6,9,12,15,2,5,8,11,14,1,4,7,10,13]. As you can see, the threads address different banks when reading ‘x’, and the same goes for ‘y’ and ‘z’.

Now the 2nd case:
x.x = ((float*) shared)[tid * 2 + 0];
x.y = ((float*) shared)[tid * 2 + 1];
Reading ‘x’ addresses the following banks: [((0…15)*2 + 0) % 16] = [0,2,4,6,8,10,12,14,0,2,4,6,8,10,12,14]. As you can see, every other bank is referenced twice, giving you a 2-way bank conflict. The same happens when reading ‘y’.
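If it helps to see the arithmetic, here is a minimal host-side C sketch (my own, not from the guide) that reproduces those bank indices for both struct sizes, assuming 16 banks and a 16-thread half-warp:

#include <stdio.h>

#define NUM_BANKS      16   /* compute capability 1.x */
#define HALF_WARP_SIZE 16

/* Print which bank each thread of a half-warp hits when reading one
   float member of a struct that contains 'floats_per_struct' floats. */
static void print_banks(int floats_per_struct)
{
    int hits[NUM_BANKS] = {0};
    printf("%d floats per struct: ", floats_per_struct);
    for (int tid = 0; tid < HALF_WARP_SIZE; ++tid) {
        int bank = (tid * floats_per_struct) % NUM_BANKS;  /* member 'x' */
        hits[bank]++;
        printf("%d ", bank);
    }
    int worst = 0;
    for (int b = 0; b < NUM_BANKS; ++b)
        if (hits[b] > worst) worst = hits[b];
    printf("-> at most %d thread(s) per bank (%s)\n",
           worst, worst > 1 ? "bank conflict" : "no conflict");
}

int main(void)
{
    print_banks(3);  /* float x, y, z: every bank hit once, no conflict  */
    print_banks(2);  /* float x, y:    every other bank hit twice, 2-way */
    return 0;
}

Running it prints exactly the two bank sequences above, with a worst case of one thread per bank for the 3-float struct and two threads per bank for the 2-float struct.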

thanks!

I fail to understand why it is (0…15) instead of (0…31) in “[((0…15)*3 + 0) % 16]”. After all, in “__shared__ struct type shared[32];” there are 32 structs. Am I understanding this correctly?

Thanks.

Deryk.

At the time the original question was posted, all CUDA-capable GPUs accessed shared memory in half-warps, i.e. threads 0…15 used one cycle and threads 16…31 used a second cycle, so there could never be any bank conflicts between the lower and the upper half-warp.

This is still true for devices of compute capability 1.x. On devices of compute capability 2.x however a full warp accesses shared memory in one cycle, so your understanding is correct there.

Why the ‘modulo 16’ operation? Why the number 16? Is it related to the number of threads in a half-warp, which is equal to 16? Or does it have anything to do with the total number of memory banks (is that equal to 16?)? The “[((0…15)*3 + 0) % 16]” expression suggests a memory layout equivalent to a matrix with 16 columns, with each column representing a memory bank: addresses 0 to 15 belong to row 1, addresses 16 to 31 belong to row 2, and so forth (thus addresses 0, 16, 32, … belong to column 0 and hence memory bank 0, addresses 1, 17, 33, … belong to memory bank 1, and so on). Is this understanding of memory banks correct?

You are right, it relates to the number of memory banks, which is 32 on current devices (NVIDIA CUDA Programming Guide, F.4.3: “Shared memory has 32 banks…”); on the compute capability 1.x devices discussed above it was 16, which is why the example uses modulo 16.
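To make the “matrix of columns” picture from the previous post concrete, here is a tiny sketch of the word-address-to-bank mapping; NUM_BANKS is 16 for compute capability 1.x and 32 for 2.x and later:

#include <stdio.h>

/* Successive 32-bit words fall into successive banks, wrapping around,
   so word addresses 0, NUM_BANKS, 2*NUM_BANKS, ... all live in bank 0. */
#define NUM_BANKS 16   /* change to 32 for compute capability 2.x+ */

int main(void)
{
    for (int word_addr = 0; word_addr < 3 * NUM_BANKS; ++word_addr)
        printf("word %2d -> bank %2d\n", word_addr, word_addr % NUM_BANKS);
    return 0;
}

Each pass over the banks in the printout corresponds to one row of the matrix you described.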

Major thanks to sergeyn.

I searched everywhere on the net for several hours and found no good answer. Stack Overflow, university lecture materials, GPU Gems 3: not a single one of them clearly explained how exactly a bank conflict occurs. Yet in one small post you illustrated the answer perfectly. Thank you SO much.

I always wonder why it’s so difficult for highly paid professional authors and professors to convey the most simple concepts.