The bank mapping for shared memory on (I believe) compute capability 3.0 and up is

(address of the look up / 4) % 32

The divide by four is there because each shared memory bank is 32 bits (4 bytes) wide, while addresses count individual bytes. The mod 32 is there because there are 32 banks.
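As a minimal sketch of that formula in plain host-side C++ (the names `bank_of`, `kBankWidthBytes` and `kNumBanks` are made up for illustration, and the 4-byte bank width and 32 banks are the assumptions stated above):

```cpp
#include <cstdint>

// Assumed model: 32 banks, each 4 bytes (32 bits) wide.
constexpr std::uint32_t kBankWidthBytes = 4;
constexpr std::uint32_t kNumBanks = 32;

// Maps a byte address within shared memory to the bank that serves it.
constexpr std::uint32_t bank_of(std::uint32_t byte_address) {
    // Dividing by 4 converts a byte address into a 32-bit word index;
    // taking the remainder mod 32 wraps the word index around the banks.
    return (byte_address / kBankWidthBytes) % kNumBanks;
}
```

For instance, `bank_of(112)` is 28 and `bank_of(128)` is 0, since 128 is word 32 and wraps back to the first bank.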

Now, in a 1D array of ints, each element has an address divisible by 4 (since the values are aligned), and each element sits at an address 4 higher than the previous one (because the array is contiguous in memory). So a 1D array with 32 values might have addresses like this:

```
112 116 120 124 128 132 136 ... 228 232 236
```

Then if you look at the banks of those addresses you get

```
28 29 30 31 0 1 2 ... 25 26 27
```

Now if we add an extra row to make a 2D array, the first element of row 1 comes right after the last element of row 0, since all the elements are contiguous in memory.

```
112 116 120 124 128 132 136 ... 228 232 236
240 244 248 252 256 260 264 ... 356 360 364
```

And the banks of all the columns match.

```
28 29 30 31 0 1 2 ... 25 26 27
28 29 30 31 0 1 2 ... 25 26 27
```

Note that if the number of columns is any multiple of 32, every element of a given column lands in the same bank, whatever the row. If the number of columns evenly divides 32, each column instead cycles through a small fixed set of banks (with 16 columns, each column alternates between 2 banks from one row to the next).
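To check this for a particular width, here is a rough host-side C++ sketch that collects the distinct banks touched when walking down one column of a row-major int array. The helper name `banks_in_column` and its parameters are hypothetical, and it assumes the same 4-byte, 32-bank model as above.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Returns the distinct banks hit by one column of a row-major array of
// 4-byte ints that starts at byte address base_address.
// Assumes 32 banks, each 4 bytes wide.
std::set<std::uint32_t> banks_in_column(std::uint32_t base_address,
                                        std::uint32_t columns,
                                        std::uint32_t rows,
                                        std::uint32_t column) {
    assert(column < columns);
    std::set<std::uint32_t> banks;
    for (std::uint32_t row = 0; row < rows; ++row) {
        // Row-major layout: element (row, column) is row * columns + column
        // ints past the start of the array.
        std::uint32_t address = base_address + 4 * (row * columns + column);
        banks.insert((address / 4) % 32);
    }
    return banks;
}
```

With the base address 112 used in the example, `banks_in_column(112, 32, 32, 0)` contains only bank 28, while `banks_in_column(112, 16, 32, 0)` contains banks 12 and 28.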

Now suppose we add one more column, raising the count to 33.

```
112 116 120 124 128 132 136 ... 228 232 236 240
244 248 252 256 260 264 268 ... 360 364 368 372
```

Then the banks look like

```
28 29 30 31 0 1 2 ... 25 26 27 28
29 30 31 0 1 2 3 ... 26 27 28 29
```

So within any column, each row lands in a bank one higher than the previous row (wrapping from 31 back to 0). The pattern repeats every 32 rows: if you had 33 rows, rows 0 and 32 would have the same banks.
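One way to see why that matters is to count how many of 32 simultaneous accesses to one column would collide on a bank. The following is only a host-side C++ model of the arithmetic, not real device code; `column_conflict_degree` is a hypothetical helper, and it assumes thread `r` of a warp reads the element at row `r` of the chosen column, with 32 banks of 4 bytes each.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

// Largest number of the 32 modeled threads that hit the same bank when
// thread r reads element (r, column) of a row-major int array.
// A result of 1 means no bank is shared; 32 means every access collides.
int column_conflict_degree(std::uint32_t base_address,
                           std::uint32_t columns,
                           std::uint32_t column) {
    assert(column < columns);
    std::array<int, 32> hits{};  // accesses per bank, zero-initialized
    for (std::uint32_t row = 0; row < 32; ++row) {
        std::uint32_t address = base_address + 4 * (row * columns + column);
        ++hits[(address / 4) % 32];
    }
    return *std::max_element(hits.begin(), hits.end());
}
```

Under these assumptions, `column_conflict_degree(112, 32, 0)` is 32 and `column_conflict_degree(112, 33, 0)` is 1, which matches the tables above: 32 columns put a whole column in one bank, while 33 columns spread it across all 32 banks.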