bank conflict in cuda's parallel prefix scan

minidrive · February 12, 2016, 4:26am

I have a quick question regarding the “Parallel Prefix Scan with CUDA” paper. In it, you state briefly that “when multiple threads in the same warp access the same bank, a bank conflict occurs, unless all threads access the same address within the 32 bit word”.

From the website: "Shared memory banks are organized such that successive 32-bit words are assigned to successive banks and the bandwidth is 32 bits per bank per clock cycle."

I am a little confused by this statement. Isn't an address 32 bits on a 32-bit machine and 64 bits on a 64-bit machine? So, if the kernel used shared memory and each thread in the warp (a warp has 32 threads on my machine) accessed a different address, each thread would be accessing an address in its very own bank, and therefore there should be no conflict.

Am I missing something here? Unfortunately, I cannot make sense of the CUDA docs to figure this one out. Please advise.

Robert_Crovella · February 12, 2016, 4:34am

Memory addressing is done by bytes: address 0 is byte 0, address 1 is byte 1.

For 32-bit quantities, the first quantity would normally be located at address 0 (for a naturally aligned word), and the second quantity would be located at address 4.

Therefore a 32-bit word “contains” 4 addresses. For the 32-bit word at address 0, it “contains” byte addresses of 0,1,2, and 3.

This is the meaning of “the same address within a 32-bit word”.

Bank 0 includes the byte addresses of 0,1,2,3, as well as 128,129,130, and 131, etc.
Bank 1 includes the byte addresses of 4,5,6,7, as well as 132, 133, 134, and 135, etc.
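
The mapping above can be written as a small piece of arithmetic. This is only a host-side sketch in Python, not CUDA code, and it assumes 32 banks that are each 4 bytes wide, which is what the 128-byte stride in the two lines above implies. The names `word_of` and `bank_of` are made up for illustration.

```python
NUM_BANKS = 32         # assumption: 32 banks, matching the 128-byte stride above
BANK_WIDTH_BYTES = 4   # successive 32-bit words are assigned to successive banks


def word_of(byte_address: int) -> int:
    """Index of the 32-bit word that contains this byte address."""
    return byte_address // BANK_WIDTH_BYTES


def bank_of(byte_address: int) -> int:
    """Bank that holds the word containing this byte address."""
    return word_of(byte_address) % NUM_BANKS


# Print the mapping for the byte addresses mentioned in the reply.
for addr in (0, 3, 4, 7, 128, 131, 132, 135):
    print(f"byte {addr:3d} -> word {word_of(addr):2d}, bank {bank_of(addr)}")
```

Bytes 0 through 3 and bytes 128 through 131 land in bank 0, while bytes 4 through 7 and 132 through 135 land in bank 1.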

If thread 0 is accessing address 0 and thread 1 is accessing address 4, then there will be no bank conflict between those two threads, because address 0 is in bank 0 and address 4 is in bank 1.

If thread 0 is accessing address 0 and thread 1 is accessing address 132, then there will be no bank conflict either, because address 132 is also in bank 1.

If thread 0 is accessing address 0 and thread 1 is accessing address 128, there will be a bank conflict: both addresses are in bank 0, but they belong to different 32-bit words.
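
The three cases above can be checked with a simplified model of one warp-wide access. Again this is a Python sketch rather than CUDA code, it assumes 32 banks of 4 bytes each, and `conflict_degree` is a hypothetical helper, not anything from the CUDA toolkit. It counts how many distinct 32-bit words are requested from the busiest bank; threads that hit the same word are treated as conflict-free, as in the quoted statement.

```python
from collections import defaultdict

NUM_BANKS = 32         # assumption: 32 banks
BANK_WIDTH_BYTES = 4   # assumption: each bank is one 32-bit word wide


def conflict_degree(byte_addresses):
    """Largest number of distinct 32-bit words requested from any one bank.

    1 means no bank conflict; n > 1 models an n-way conflict.
    Returns 0 if no addresses are given.
    """
    words_per_bank = defaultdict(set)
    for addr in byte_addresses:
        word = addr // BANK_WIDTH_BYTES
        words_per_bank[word % NUM_BANKS].add(word)
    return max((len(words) for words in words_per_bank.values()), default=0)


print(conflict_degree([0, 4]))     # 1: banks 0 and 1
print(conflict_degree([0, 132]))   # 1: banks 0 and 1
print(conflict_degree([0, 128]))   # 2: two different words in bank 0
print(conflict_degree([0, 2]))     # 1: same 32-bit word

# 32 threads reading consecutive 32-bit words: one word per bank.
print(conflict_degree([4 * t for t in range(32)]))     # 1
# 32 threads with a 128-byte stride: every word falls in bank 0.
print(conflict_degree([128 * t for t in range(32)]))   # 32
```

The last two lines address the original question: 32 different addresses do not guarantee 32 different banks. Under this model, what matters is the word index modulo 32, so a stride of 128 bytes sends every thread to bank 0.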

I’m not sure what the “Parallel Prefix Scan with CUDA” paper is, but I would refer to the programming guide for shared memory details:

http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#shared-memory-2-x
