Shared memory banks and warp size: new warp size in the future?

The way that I understand shared memory bank conflicts is that each thread in a half warp should be accessing a different bank. There are 16 memory banks and 32 threads in a Warp.

The part that is bothering me is what happens in the future if the Warp size changes.

1 - Is it possible for the Warp size to go larger or smaller for future processors?

2 - If smaller, say a new warp size of 16, will the rule still be that threads within a half warp must not conflict (8 threads in this case), or will the group stay at 16 threads?

I remember reading that future hardware might have 32 banks, so it is safest to avoid conflicts across a full warp.

The best approach would be to have a #define WARP_SIZE and a #define BANK_SIZE and to write your code in terms of these defines, both to avoid bank conflicts and to optimize.
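To make the idea concrete, here is a minimal host-side C++ sketch (no GPU needed) of keeping the warp size and bank count in one place and checking an access pattern against them. The names NUM_BANKS (standing in for BANK_SIZE above), CONFLICT_GROUP, conflict_degree and padded_pitch are made up for this example. It assumes 32-bit wide banks, that successive 32-bit words map to successive banks, and that the bank count is a power of two.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Assumed values for current hardware, as quoted in this thread:
// 32 threads per warp, 16 banks, conflicts resolved per half warp.
constexpr int WARP_SIZE = 32;
constexpr int NUM_BANKS = 16;                  // "BANK_SIZE" in the post above
constexpr int CONFLICT_GROUP = WARP_SIZE / 2;  // threads serviced together

// Worst-case number of threads in one group that hit the same bank when
// thread t accesses 32-bit word (t * stride_words). 1 means conflict-free.
// Assumption: word w lives in bank (w % banks).
inline int conflict_degree(int stride_words,
                           int group = CONFLICT_GROUP,
                           int banks = NUM_BANKS) {
    assert(stride_words >= 1 && group >= 1 && banks >= 1);
    std::vector<int> hits(banks, 0);
    for (int t = 0; t < group; ++t) {
        long long word = static_cast<long long>(t) * stride_words;
        hits[static_cast<int>(word % banks)]++;
    }
    return *std::max_element(hits.begin(), hits.end());
}

// Row pitch (in words) for a 2D shared-memory tile with `cols` columns,
// padded to an odd number so that walking down a column is conflict-free.
// Assumption: the bank count is a power of two, so any odd pitch is
// coprime with it.
inline int padded_pitch(int cols) {
    return (cols % 2 == 0) ? cols + 1 : cols;
}
```

With these values, conflict_degree(2) gives 2 (a 2-way conflict), conflict_degree(16) gives 16, and conflict_degree(padded_pitch(16)) gives 1. Passing the group and bank count as arguments, e.g. conflict_degree(1, 32, 32), lets you check the same pattern against a hypothetical future configuration.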

In the future the warp size may change, as may the number of banks. Unfortunately that’s all I can say at this time.


Mark and Denis, thanks for the responses.

Mark, if and when you change things like the warp size or the number of banks in the GPU, are you also planning to change the device version number, e.g. to 1.2?

That is, how would you recommend that developers design code that performs well on G80 and Tesla but can scale to future Gxx processors? Would you recommend reading the description of the installed card from the properties provided by CUDA, the warp size and other info in the properties structure, or the version of the CUDA runtime?

Use cudaGetDeviceProperties (as in the deviceQuery example):

Device 0: “Tesla C870”
Major revision number: 1
Minor revision number: 0
Total amount of global memory: 1610350592 bytes
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1350000 kilohertz

Unfortunately that doesn’t (currently) tell you the number of banks.

I’m pretty confident that any change in the bank configuration will correspond to a change in compute capability / SM version.
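If the bank configuration does track compute capability, one way to act on that is a small lookup keyed on the major/minor revision. Below is a hedged host-side C++ sketch (C++17 for std::optional). DeviceInfo, BankConfig and lookup_bank_config are hypothetical names, not part of CUDA. In a real program the three fields would be copied from the major, minor and warpSize members of the cudaDeviceProp structure filled in by cudaGetDeviceProperties. The only table entry is the one stated in this thread (16 banks, conflicts per half warp, on 1.x devices); anything else is reported as unknown rather than guessed.

```cpp
#include <optional>

// Hypothetical plain-C++ copy of the cudaDeviceProp fields used here
// (major, minor, warpSize), so the sketch compiles without the CUDA toolkit.
struct DeviceInfo {
    int cc_major;   // "Major revision number" in deviceQuery
    int cc_minor;   // "Minor revision number"
    int warp_size;  // "Warp size"
};

// Hypothetical result type: how many banks there are, and how many
// consecutive threads must be mutually conflict-free.
struct BankConfig {
    int banks;
    int group;
};

// Returns the bank configuration for devices we know about, or an empty
// optional for anything else so the caller can pick a conservative fallback.
inline std::optional<BankConfig> lookup_bank_config(const DeviceInfo& d) {
    if (d.cc_major == 1) {
        // From this thread: 16 banks, conflicts resolved per half warp.
        return BankConfig{16, d.warp_size / 2};
    }
    return std::nullopt;  // unknown hardware: do not guess
}
```

For the Tesla C870 above (revision 1.0, warp size 32) this yields 16 banks and a group of 16 threads. For an unrecognised revision the caller gets an empty result and could, for example, fall back to a layout that is conflict-free across a full warp.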