I noticed that arrays which are local to a thread are placed in local memory instead of registers. This is more or less expected, since registers generally cannot be indexed dynamically. The obvious solution is to unroll the loops, so that every array index becomes a compile-time constant.
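The difference between a loop-indexed and an unrolled array can be sketched as follows. This is only a minimal illustration, not the actual simulation code: the names `NDIR`, `sum_looped`, `sum_unrolled` and `sum_kernel`, as well as the trivial "multiply by two" loop body, are made up for the example. Whether the array really ends up in registers depends on the compiler version, so it is worth checking the reported local memory usage (for instance with `nvcc --ptxas-options=-v`).

```cpp
// HD marks functions as callable from both host and device when built with
// nvcc, and expands to nothing under a plain C++ compiler.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

#define NDIR 9  // hypothetical: number of loop iterations in the 2D case

// Loop version: f[i] is addressed through the loop variable. Unless the
// compiler unrolls the loops itself, the array has to be indexable and is
// therefore a candidate for local memory.
HD inline float sum_looped(const float *in)
{
    float f[NDIR];
    for (int i = 0; i < NDIR; ++i)
        f[i] = 2.0f * in[i];

    float s = 0.0f;
    for (int i = 0; i < NDIR; ++i)
        s += f[i];
    return s;
}

// Hand-unrolled version: every index is a literal constant, so each element
// can be treated like an ordinary scalar variable.
HD inline float sum_unrolled(const float *in)
{
    float f[NDIR];
    f[0] = 2.0f * in[0];
    f[1] = 2.0f * in[1];
    f[2] = 2.0f * in[2];
    f[3] = 2.0f * in[3];
    f[4] = 2.0f * in[4];
    f[5] = 2.0f * in[5];
    f[6] = 2.0f * in[6];
    f[7] = 2.0f * in[7];
    f[8] = 2.0f * in[8];
    return f[0] + f[1] + f[2] + f[3] + f[4] + f[5] + f[6] + f[7] + f[8];
}

#ifdef __CUDACC__
// Each thread processes its own block of NDIR consecutive input values.
// Assumes the launch covers exactly as many threads as there are blocks.
__global__ void sum_kernel(const float *in, float *out)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    out[t] = sum_unrolled(in + t * NDIR);
}
#endif
```

Both functions compute the same result; only the way the array is indexed differs. With a body this small the unrolled form is still readable, but it is easy to see how it gets out of hand once each step is more than one line.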
The problem is that each iteration of my loop is quite complex, and for a fluid simulation in 2D I need 9 iterations. If I want to extend it to 3D, I will need 19 iterations. To keep my code maintainable, I would have to leave this optimization until the last stage, when I’m certain that everything else works perfectly. The best solution, however, would be for the compiler to do the job for me.
So I’m wondering if there are any plans to add automatic loop unrolling to the compiler anytime soon, so that local arrays can be placed in registers?