Shared memory layout: question about how arrays are organised

Hello,

Just a question to be sure I’ve correctly understood the manual (the “each consecutive 32-bit word lies in a neighbouring bank” part).

If I declare a “shared” array of 128 unsigned chars, each group of 4 consecutive chars (4 x 8 bits = one 32-bit word) sits in one bank, and the 5th char starts the next bank.

And after the 16th bank, the next char wraps around to the first bank again, at the next address (the next 32-bit word, thus the 5th byte in that bank).

Its layout in shared memory will be:

__shared__ unsigned char array[0x80];

Address|Bank0|Bank1|Bank2|Bank3|Bank4|Bank5|Bank6|Bank7|Bank8|Bank9|BankA|BankB|BankC|BankD|BankE|BankF|
-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
0000000| [0] | [4] | [8] | [C] | [10]| [14]| [18]| [1C]| [20]| [24]| [28]| [2C]| [30]| [34]| [38]| [3C]|
0000001| [1] | [5] | [9] | [D] | [11]| [15]| [19]| [1D]| [21]| [25]| [29]| [2D]| [31]| [35]| [39]| [3D]|
0000002| [2] | [6] | [A] | [E] | [12]| [16]| [1A]| [1E]| [22]| [26]| [2A]| [2E]| [32]| [36]| [3A]| [3E]|
0000003| [3] | [7] | [B] | [F] | [13]| [17]| [1B]| [1F]| [23]| [27]| [2B]| [2F]| [33]| [37]| [3B]| [3F]|
-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
0000004| [40]| [44]| [48]| [4C]| [50]| [54]| [58]| [5C]| [60]| [64]| [68]| [6C]| [70]| [74]| [78]| [7C]|
0000005| [41]| [45]| [49]| [4D]| [51]| [55]| [59]| [5D]| [61]| [65]| [69]| [6D]| [71]| [75]| [79]| [7D]|
0000006| [42]| [46]| [4A]| [4E]| [52]| [56]| [5A]| [5E]| [62]| [66]| [6A]| [6E]| [72]| [76]| [7A]| [7E]|
0000007| [43]| [47]| [4B]| [4F]| [53]| [57]| [5B]| [5F]| [63]| [67]| [6B]| [6F]| [73]| [77]| [7B]| [7F]|
-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+

(All addresses, bank names and array offsets are in hex, of course. Addresses are byte (8-bit) offsets within a bank, not 32-bit word offsets.)
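The mapping drawn in the table can be written as two small index formulas. The sketch below is plain host-side C++ (it should also compile under nvcc). The helper names `bank_of_byte` and `row_of_byte` are made up for illustration, and it assumes what the table assumes: 16 banks, each one 32-bit word wide, with the array starting on a 32-bit boundary at the first word of Bank0.

```cpp
#include <cstddef>

// Assumptions (from the table, not from any CUDA header):
// 16 banks, 4 bytes per bank word, array[0] sits at byte 0 of Bank0.
constexpr std::size_t kNumBanks  = 16;
constexpr std::size_t kBankWidth = 4;  // bytes in one 32-bit word

// Bank that holds array[i] for an array of unsigned char.
constexpr std::size_t bank_of_byte(std::size_t i) {
    return (i / kBankWidth) % kNumBanks;
}

// "Address" column of the table: byte offset of array[i] inside its bank.
// i / 64 selects the 32-bit word within the bank, i % 4 the byte within that word.
constexpr std::size_t row_of_byte(std::size_t i) {
    return (i / (kBankWidth * kNumBanks)) * kBankWidth + i % kBankWidth;
}

// Spot checks against cells of the table.
static_assert(bank_of_byte(0x04) == 0x1 && row_of_byte(0x04) == 0, "[4] opens Bank1");
static_assert(bank_of_byte(0x3D) == 0xF && row_of_byte(0x3D) == 1, "[3D] is in BankF, row 1");
static_assert(bank_of_byte(0x40) == 0x0 && row_of_byte(0x40) == 4, "[40] wraps back to Bank0");
```

For example, `bank_of_byte(0x7F)` gives 15 (BankF) and `row_of_byte(0x7F)` gives 7, which is the bottom-right cell of the table.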

And if I declare an array of 4-component vectors, each component being a 32-bit element, each vector will span 4 banks, and we move to the next 32-bit word of each bank after 4 vectors.

With the following layout:

__shared__ float4 array[8];

Address|Bank0|Bank1|Bank2|Bank3|Bank4|Bank5|Bank6|Bank7|Bank8|Bank9|BankA|BankB|BankC|BankD|BankE|BankF|
-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
0000000| ^   | ^   | ^   | ^   | ^   | ^   | ^   | ^   | ^   | ^   | ^   | ^   | ^   | ^   | ^   | ^   |
0000001|[0].x|[0].y|[0].z|[0].w|[1].x|[1].y|[1].z|[1].w|[2].x|[2].y|[2].z|[2].w|[3].x|[3].y|[3].z|[3].w|
0000002| |   | |   | |   | |   | |   | |   | |   | |   | |   | |   | |   | |   | |   | |   | |   | |   |
0000003| v   | v   | v   | v   | v   | v   | v   | v   | v   | v   | v   | v   | v   | v   | v   | v   |
-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
0000004| ^   | ^   | ^   | ^   | ^   | ^   | ^   | ^   | ^   | ^   | ^   | ^   | ^   | ^   | ^   | ^   |
0000005|[4].x|[4].y|[4].z|[4].w|[5].x|[5].y|[5].z|[5].w|[6].x|[6].y|[6].z|[6].w|[7].x|[7].y|[7].z|[7].w|
0000006| |   | |   | |   | |   | |   | |   | |   | |   | |   | |   | |   | |   | |   | |   | |   | |   |
0000007| v   | v   | v   | v   | v   | v   | v   | v   | v   | v   | v   | v   | v   | v   | v   | v   |
-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
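The float4 table follows the same rule applied to 32-bit words: component c of element e is word number 4*e + c of the array. A minimal host-side C++ sketch of that indexing, with made-up helper names (`bank_of_component`, `word_in_bank`) and the same assumptions as the table (16 banks, array starting at the first word of Bank0):

```cpp
#include <cstddef>

// Assumptions taken from the table: 16 banks, one 32-bit word per bank per row,
// array[0].x stored in the first word of Bank0.
constexpr std::size_t kNumBanks       = 16;
constexpr std::size_t kWordsPerFloat4 = 4;  // x, y, z, w: one 32-bit word each

// component: 0 = x, 1 = y, 2 = z, 3 = w
constexpr std::size_t bank_of_component(std::size_t elem, std::size_t component) {
    return (elem * kWordsPerFloat4 + component) % kNumBanks;
}

// Which 32-bit word of that bank is used (0 = first row block, 1 = second, ...).
constexpr std::size_t word_in_bank(std::size_t elem, std::size_t component) {
    return (elem * kWordsPerFloat4 + component) / kNumBanks;
}

// Spot checks against cells of the table.
static_assert(bank_of_component(3, 3) == 15 && word_in_bank(3, 3) == 0, "[3].w ends the first row");
static_assert(bank_of_component(4, 0) == 0  && word_in_bank(4, 0) == 1, "[4].x wraps to Bank0");
```

One consequence visible in both the table and the formula: elements 0 and 4 start in the same bank, as do any two elements whose indices differ by a multiple of 4.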

Am I right? Did I understand the manual correctly?

Or does the layout described in the manual only apply to arrays of 32-bit words (ints, floats), with a slightly different layout for array elements of other sizes?

(For example: successive array elements are always in neighbouring banks, except that instead of changing banks every 32-bit word, we change banks every 8 bits or every 128 bits, depending on the size of the array elements?)

Thank you for your help.

Successive 32bit words are in successive banks. So, as you pointed out, 4 consecutive chars are in the same bank (assuming they’re aligned on a 32bit boundary).
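To make the alignment caveat concrete, here is a small host-side C++ sketch in which the bank depends only on the byte offset, not on the element type. The names `bank_of_offset` and `first_bank_of_element` are hypothetical, the 16-bank figure is taken from the tables in the question, and the misaligned base offset of 2 is purely illustrative (not a claim about where a compiler would actually place an array).

```cpp
#include <cstddef>

// Assumptions: 16 banks, each one 32-bit (4-byte) word wide.
constexpr std::size_t kNumBanks  = 16;
constexpr std::size_t kBankWidth = 4;

// Bank holding the byte at `byte_offset` from the start of shared memory.
constexpr std::size_t bank_of_offset(std::size_t byte_offset) {
    return (byte_offset / kBankWidth) % kNumBanks;
}

// Bank holding the first byte of element `i` of an array whose elements are
// `elem_size` bytes each and whose first element sits at byte offset `base`.
constexpr std::size_t first_bank_of_element(std::size_t base,
                                            std::size_t elem_size,
                                            std::size_t i) {
    return bank_of_offset(base + i * elem_size);
}

// Aligned char array: elements 0..3 share Bank0, element 4 starts Bank1.
static_assert(first_bank_of_element(0, 1, 3) == 0, "aligned: char 3 in Bank0");
static_assert(first_bank_of_element(0, 1, 4) == 1, "aligned: char 4 in Bank1");

// Hypothetical char array starting 2 bytes into a word: only elements 0 and 1
// share Bank0, and element 2 already falls in Bank1.
static_assert(first_bank_of_element(2, 1, 1) == 0, "misaligned: char 1 in Bank0");
static_assert(first_bank_of_element(2, 1, 2) == 1, "misaligned: char 2 in Bank1");
```

With `elem_size` set to 4 each element gets its own bank, and with 16 (a float4) each element starts 4 banks after the previous one, which matches the two tables in the question.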

Paulius