I have a small array: “int data;”.
Without this array, the kernel uses 0 bytes of lmem. With it, I get 48 bytes of lmem.
The problem is that the program is very complex, and it took me a lot of time to get the “divergent branches” count down to a very low range (0-8), and so on.
So when I try to get around lmem, my speed falls by around 25-40%.
Using shared memory is also a pain (128 threads => data…data).
Therefore, is there any way to put arrays into registers?
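To make the question concrete, here is a minimal sketch of the pattern I mean, not my real code. Everything in it is hypothetical: the names `pick_dynamic` and `pick_unrolled`, the `HD` and `UNROLL` macros, and the array size of 12 (a guess from 48 bytes / 4 bytes per int). It assumes the array is indexed with a value only known at run time, which as far as I understand is what usually forces an array into local memory. The second variant fully unrolls the loops so that every index is a compile-time constant, which should at least allow the compiler to keep the elements in registers.

```cpp
#include <cassert>

// Under nvcc these expand to CUDA qualifiers and an unroll hint;
// in a plain C++ compiler they expand to nothing, so the file still builds.
#ifdef __CUDACC__
#define HD __host__ __device__
#define UNROLL _Pragma("unroll")
#else
#define HD
#define UNROLL
#endif

// Hypothetical size: 48 bytes of lmem / 4 bytes per int = 12 (an assumption).
constexpr int N = 12;

// Variant A: data[] is read with a run-time index, so the array has to be
// addressable. On the GPU this typically means it is placed in local memory.
HD inline int pick_dynamic(int seed, unsigned idx) {
    int data[N];
    for (int i = 0; i < N; ++i) data[i] = seed + i;
    return data[idx % N];
}

// Variant B: same result, but after full unrolling every access to data[]
// uses a constant index. The run-time index is only compared, never used
// to address the array, at the cost of N compare/select steps.
HD inline int pick_unrolled(int seed, unsigned idx) {
    int data[N];
    UNROLL
    for (int i = 0; i < N; ++i) data[i] = seed + i;

    const int want = static_cast<int>(idx % N);
    int result = 0;
    UNROLL
    for (int i = 0; i < N; ++i) {
        if (i == want) result = data[i];
    }
    return result;
}
```

Whether variant B really ends up with 0 bytes of lmem has to be checked per kernel, for example by compiling with `nvcc --ptxas-options=-v`. The extra compare/select work is the kind of workaround that costs me speed.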