Bank conflicts on same address

Section 5.1.2.5 of the CUDA 2.1 Programming Guide states, “any memory read or write request made of n addresses that fall in n distinct memory banks can be serviced simultaneously.” Does a bank conflict occur if multiple threads read from two different addresses? For example, does a conflict occur if threads 0 and 1 read from foo[0] and threads 2 and 3 read from foo[1]? This is only two unique addresses (foo[0] and foo[1]) and each address falls in a different bank (assuming foo is an array of ints).

Of course a bank conflict occurs, because threads 0, 1 and threads 2, 3 belong to the same half-warp.

Threads 0 and 1 access the same bank (the one that holds foo[0]), and threads 2 and 3 access the same bank (the one that holds foo[1]).

But the four threads read a total of two addresses from two banks (n=2). Threads 0 and 1 both read foo[0] (the same address) and threads 2 and 3 both read foo[1] (the same address). Why does it cause a bank conflict if the same exact address is being read? (I understand why a conflict would occur if multiple threads read from two or more addresses in the same 32-bit word. This is the reason for the broadcast mechanism.)

See figure 5.8 in the same manual for your answer - assuming the other threads in the half-warp don’t cause conflicts then there will be no conflicts.

I’m not sure that either diagram (left or right) in figure 5.8 depicts the scenario I am describing. In the left diagram, all threads in the half-warp read from the same 32-bit word (a single word). As a result, the broadcast mechanism can service all of the requests at once. The right diagram depicts a scenario in which there is one set of threads reading from the same word, while all other threads read from words in different banks. The guide states that which word is selected as the broadcast word is unspecified and can’t be predetermined. As a result, that scenario causes a bank conflict if bank 5 is not selected as the broadcast word.

Both of these scenarios are different from what I am describing. My scenario would look like:

thread 0 → foo[0] (bank 0)
thread 1 → foo[0] (bank 0)
thread 2 → foo[1] (bank 1)
thread 3 → foo[1] (bank 1)

To make it easier to understand:

Thread 0 accesses bank 0 (foo[0]), and at the same time thread 1 wants to access bank 0 (foo[0]) too.

In this situation, the hardware lets thread 0 access bank 0 while thread 1 must wait until thread 0 has finished → warp serialization occurs (a bank conflict).

The same applies to threads 2 and 3.