About avoiding shared memory bank conflicts

fatalme · October 6, 2013, 4:51pm

http://http.developer.nvidia.com/GPUGems3/gpugems3_ch39.html
http://developer.download.nvidia.com/compute/cuda/1.1-Beta/x86_website/projects/scan/doc/scan.pdf

Which one is correct?

In the first link (Example 39-3), the padding is computed as follows:

```c
#define NUM_BANKS 16
#define LOG_NUM_BANKS 4
#define CONFLICT_FREE_OFFSET(n) \
    ((n) >> NUM_BANKS + (n) >> (2 * LOG_NUM_BANKS))  // I don't understand this one at all.
```

In the second link (Listing 2), the padding is computed as:

```c
#define NUM_BANKS 16
#define LOG_NUM_BANKS 4
#ifndef ZERO_BANK_CONFLICTS
#define CONFLICT_FREE_OFFSET(n) \
    ((n) >> NUM_BANKS + (n) >> (2 * LOG_NUM_BANKS))
#else
#define CONFLICT_FREE_OFFSET(n) \
    ((n) >> LOG_NUM_BANKS)    // I think only this one is correct. Is that right?
#endif
```