http://http.developer.nvidia.com/GPUGems3/gpugems3_ch39.html
http://developer.download.nvidia.com/compute/cuda/1.1-Beta/x86_website/projects/scan/doc/scan.pdf
Which one is correct? The two documents compute the shared-memory padding offset (CONFLICT_FREE_OFFSET) differently.
In the first link (Example 39-3), the padding is computed as follows:
#define NUM_BANKS 16
#define LOG_NUM_BANKS 4
#define CONFLICT_FREE_OFFSET(n) \
    ((n) >> NUM_BANKS + (n) >> (2 * LOG_NUM_BANKS))  // I don't understand this one at all.
In the second link (Listing 2), the padding is computed as follows:
#define NUM_BANKS 16
#define LOG_NUM_BANKS 4
#ifdef ZERO_BANK_CONFLICTS
#define CONFLICT_FREE_OFFSET(n) \
    ((n) >> NUM_BANKS + (n) >> (2 * LOG_NUM_BANKS))
#else
#define CONFLICT_FREE_OFFSET(n) \
    ((n) >> LOG_NUM_BANKS)  // I think only this one is correct, right?
#endif