trading memory to negate bank conflicts

little_jimmy · June 2, 2015, 5:49am

hello,

matrix transposes are one example where memory footprint is traded to avoid bank conflicts - the shared memory array is padded, i.e. made larger than strictly needed, so that simultaneous accesses fall in different banks

if such a trade-off is indeed possible, could i trade memory footprint to prevent bank conflicts in the following case as well, and how much extra memory would it require?

a kernel calculates data of type double for a subsequent kernel to use
each thread calculates 4 doubles over the course of the kernel, and these must be packaged into groups - packets - of 4 doubles for the next kernel; hence, the kernel must write doubles from local memory to shared memory (to later write them to global memory) with a stride of 4

according to my calculations, when each thread writes a single double at a stride of 4 doubles, every 4th thread within a half warp maps to the same banks and so conflicts; offsetting the address by 1 extra double for every 4 threads would remove the bank conflict
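the calculation above can be checked on the host with a small model. this is only a sketch under assumptions that are my reading of the post, not confirmed hardware behaviour: 32 banks of 4 bytes each (32-bit mode), a double occupying two consecutive banks, and 64-bit accesses being serviced one half warp at a time. the names `unpadded_index` and `max_conflict_degree` are made up for illustration.

```cpp
#include <algorithm>
#include <array>

// Assumed model (not verified against any particular GPU):
// 32 banks, 4 bytes wide; a double spans two consecutive banks;
// a 64-bit access is serviced per half warp of 16 threads.
constexpr int kBanks = 32;
constexpr int kHalfWarp = 16;

// Unpadded layout: thread t writes element y of its packet
// at double index 4 * t + y.
int unpadded_index(int t, int y) { return 4 * t + y; }

// Largest number of threads in the first half warp whose double starts in
// the same bank. Doubles always start on an even 32-bit word, so two
// doubles either share both banks or share none; counting the first bank
// is therefore enough.
int max_conflict_degree(int (*index)(int, int), int y) {
    std::array<int, kBanks> hits{};
    for (int t = 0; t < kHalfWarp; ++t) {
        int word = 2 * index(t, y);  // first 32-bit word of the double
        ++hits[word % kBanks];
    }
    return *std::max_element(hits.begin(), hits.end());
}
```

under this model `max_conflict_degree(unpadded_index, y)` comes out as 4 for every y from 0 to 3: threads t, t + 4, t + 8 and t + 12 all land on the same pair of banks.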

x = (4 * threadIdx.x) + (threadIdx.x / 4) + y; y = [0, …, 3]
shared[x] = local_double;
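a host-side sketch of the padded index and its cost, again assuming 32 banks of 4 bytes, a double spanning two banks, and servicing per half warp (all assumptions, to be confirmed with a profiler on real hardware). `padded_index`, `conflict_free` and `padded_doubles` are hypothetical helper names.

```cpp
#include <set>

// Assumed model: 32 banks of 4 bytes, 64-bit accesses handled per half warp.
constexpr int kBanks = 32;
constexpr int kHalfWarp = 16;

// Padded layout from the post: one extra double for every 4 threads.
int padded_index(int t, int y) { return 4 * t + t / 4 + y; }

// True if the 16 threads of the half warp starting at thread `first`
// all start their double in a different bank for packet element y.
bool conflict_free(int first, int y) {
    std::set<int> banks;
    for (int t = first; t < first + kHalfWarp; ++t) {
        banks.insert((2 * padded_index(t, y)) % kBanks);
    }
    return static_cast<int>(banks.size()) == kHalfWarp;
}

// Number of doubles of shared memory needed for a block of n threads:
// the highest index any thread touches, plus one.
int padded_doubles(int n) { return padded_index(n - 1, 3) + 1; }
```

in this model every half warp is conflict free for y = 0 to 3, and the cost is 4n + (n - 1) / 4 doubles for n threads (integer division). for a 256-thread block that is 1087 doubles instead of 1024, so 63 extra doubles (504 bytes), roughly one extra double per 4 threads or about 6% more shared memory.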

not so?

assume 32-bit mode, not 64-bit mode
