To avoid bank conflicts

I have 128 points, each with 16 channels of fp16 data, stored in shared memory (smem). For the mma operation, each thread needs to load 4 channel values of one point at a time. On each load, a warp reads all 16 channels of 8 different points, but the point indices are random. Is there any way to avoid bank conflicts here?

The layout of the point values in smem is not fixed. I can rearrange the points to fit whatever scheme avoids bank conflicts.

I have about 8 idle registers, so using more registers is fine if it reduces the cost of loading data from smem.

“I haven’t told you the actual load pattern, can you guarantee to remove bank conflicts on the load?”

No, it can’t be guaranteed.

“What if I rearrange the data?”

Rearranging does not help if you don’t know the load pattern.

“What if…”

There aren’t any other conditions that guarantee removing bank conflicts from a shared load if the pattern is unknown or unspecified.

You can break a single shared load into multiple loads, and thereby eliminate the possibility of shared bank conflicts. But this effectively creates the same serialization that a bank-conflicted load creates. It’s not a win.
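To see why splitting is not a win, here is a rough host-side sketch in plain C++ (not device code). It assumes the usual model of 32 banks of 4-byte words, and `split_into_passes` is a hypothetical helper name, not a CUDA API. It greedily splits the words requested by one load into conflict-free groups. The number of groups you end up with is the same as the degree of the original conflict, so the total number of passes does not change.

```cpp
#include <cstdint>
#include <vector>

// Assumed model: 32 banks, each 4 bytes wide.
constexpr std::uint32_t kBanks = 32;

// Hypothetical helper: split the 4-byte word indices requested by one load
// into groups ("passes") that are each free of bank conflicts.
std::vector<std::vector<std::uint32_t>> split_into_passes(
        const std::vector<std::uint32_t>& words) {
    std::vector<std::vector<std::uint32_t>> passes;
    for (std::uint32_t word : words) {
        bool placed = false;
        for (auto& pass : passes) {
            bool bank_busy = false;
            bool duplicate = false;
            for (std::uint32_t other : pass) {
                if (other == word) { duplicate = true; break; }
                if (other % kBanks == word % kBanks) bank_busy = true;
            }
            // The same word requested twice is a broadcast, not a conflict.
            if (duplicate) { placed = true; break; }
            if (!bank_busy) { pass.push_back(word); placed = true; break; }
        }
        // Every existing pass already uses this bank: a new pass is needed.
        if (!placed) passes.push_back(std::vector<std::uint32_t>{word});
    }
    return passes;
}
```

For example, words 0, 32, 64 and 96 all map to bank 0, so the split needs four separate loads, exactly the four passes the conflicted load would have taken anyway.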

To fix this with data rearrangement, lay out the points in shared memory so that each thread in the warp loads from a separate bank. If you’re not familiar with what that means, I recommend unit 4 of this online training series.
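As a rough illustration of what "a separate bank" means for the layout in the question, here is a small host-side model in plain C++ (not device code). Everything in it is an assumption for the sake of the sketch: 32 banks of 4 bytes each, points packed densely at 32 bytes per point, 8 bytes read per thread, and a 64-bit-per-thread load being served one half-warp (16 threads, so 4 points) at a time. `conflict_degree` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Assumed model: 32 banks, each 4 bytes wide.
constexpr int kNumBanks = 32;
constexpr int kBankWidthBytes = 4;
// Assumed dense layout: 16 fp16 channels per point = 32 bytes per point.
constexpr int kBytesPerPoint = 16 * 2;
// Each thread reads 4 fp16 channels = 8 bytes.
constexpr int kBytesPerThread = 4 * 2;

// Hypothetical helper: given the point indices read by one group of threads
// (4 threads per point, one per 4-channel chunk), return the largest number
// of distinct 4-byte words that land in the same bank. 1 means conflict free.
int conflict_degree(const std::vector<int>& points) {
    std::array<std::set<std::uint32_t>, kNumBanks> words_per_bank;
    for (int p : points) {
        assert(p >= 0);
        for (int chunk = 0; chunk < 4; ++chunk) {
            std::uint32_t byte_addr = static_cast<std::uint32_t>(
                p * kBytesPerPoint + chunk * kBytesPerThread);
            for (int w = 0; w < kBytesPerThread / kBankWidthBytes; ++w) {
                std::uint32_t word = byte_addr / kBankWidthBytes + w;
                // A set keeps distinct words only, so several threads
                // reading the same word count once (broadcast).
                words_per_bank[word % kNumBanks].insert(word);
            }
        }
    }
    std::size_t worst = 0;
    for (const auto& s : words_per_bank) worst = std::max(worst, s.size());
    return static_cast<int>(worst);
}
```

Under these assumptions each point covers 8 consecutive banks, so points whose indices are equal modulo 4 share banks. `conflict_degree({0, 1, 2, 3})` gives 1 (conflict free), while `conflict_degree({0, 4, 8, 12})` gives 4. With random indices you cannot rule out the second case, which is why the load pattern has to be known before a layout can be chosen.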
