Efficient use of shared data

BehzadX · March 31, 2012, 6:46pm

Dear all,

I am using the finite element method to simulate fluid flows. I have defined a vector of field variables named Q. For the integration procedures, which are carried out by threads, this vector has to be shared among the threads of a block on the GPU. What is the best way to optimize the memory transactions? In the case of shared memory, how can bank conflicts be avoided when the number of threads in a block is much greater than the size of the shared data, e.g. Q[16] with 128 threads per block?

Many thanks,
BehZad

pasoleatis · April 1, 2012, 7:26am

Bank conflicts occur per warp, so you only have to worry about conflicts among the threads of the same warp. If the threads in a warp read consecutive elements of an array, there are no conflicts. Note that the Fermi architecture has 32 banks, not 16.
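
The per-warp rule can be checked on the host with a small model. What follows is a minimal C++ sketch, not CUDA code, and it rests on assumptions that are not stated in the thread: elements are 32-bit words, a word's bank is its index modulo 32, and threads of a warp that read the same word are served by a single broadcast (as documented for compute capability 2.x). The names `conflict_degree` and `warp_access` are hypothetical helpers introduced only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Worst-case number of serialized passes for one warp's shared-memory read.
// word_index[t] is the 32-bit word that thread t of the warp reads.
// Assumed model: bank = word % banks, and threads reading the SAME word
// are served by one broadcast, so only DISTINCT words per bank count.
int conflict_degree(const std::vector<int>& word_index, int banks = 32) {
    assert(banks > 0);
    std::map<int, std::set<int>> words_per_bank;
    for (int w : word_index) {
        words_per_bank[w % banks].insert(w);
    }
    int worst = 1;  // 1 means conflict-free
    for (const auto& entry : words_per_bank) {
        worst = std::max(worst, static_cast<int>(entry.second.size()));
    }
    return worst;
}

// Word indices for one warp in which thread t reads
// word (t * stride) % array_size, e.g. Q[(t * stride) % array_size].
std::vector<int> warp_access(int stride, int array_size, int warp_size = 32) {
    std::vector<int> idx(warp_size);
    for (int t = 0; t < warp_size; ++t) {
        idx[t] = (t * stride) % array_size;
    }
    return idx;
}
```

Under this model, `conflict_degree(warp_access(1, 16))` evaluates to 1: with Q[16] and thread t reading Q[t % 16], each bank is asked for only one distinct word, so the read is conflict-free even though two threads of the warp share every element. A stride of 32 words, `warp_access(32, 1024)`, maps all 32 threads to bank 0 and evaluates to 32.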
