My program needs an array of 18 values per thread to work.
With 128 threads this is fine: all the values fit in shared memory.
With 192 threads, however, only 15 of the values fit in shared memory, so I have to put the remaining 3 in local memory, and so on for larger thread counts.
I am trying to make this transparent to my program: how can I use v[i] with 0 <= i < 18 regardless of the number of threads?
So far I have tried this:
#define v(_IDX) (((_IDX) < limit) ? shared[(_IDX)] : local[(_IDX) - limit])
It does exactly what I want, but it slows my program down a lot.
Since shared memory is quite limited, I’m sure I’m not the only one with this problem, so it would be useful to compare the methods people use.
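To make the index mapping concrete, here is a minimal host-side C sketch of the macro, not the actual kernel. Plain arrays stand in for the shared and local memory spaces, and `NUM_VALUES` and `read_slot` are hypothetical names introduced only for illustration. The macro argument is parenthesized so that an expression such as `v(i & 3)` expands as intended.

```c
#define NUM_VALUES 18  /* values needed per thread, taken from the post */

/* Same mapping as the macro in the post, with the argument parenthesized.
   The names `shared`, `local`, and `limit` must be in scope wherever
   v() is used. */
#define v(_IDX) (((_IDX) < limit) ? shared[(_IDX)] : local[(_IDX) - limit])

/* Hypothetical helper: stores 100 + k in logical slot k, then reads slot
   `idx` back through v(). Plain arrays stand in for the two memory spaces. */
int read_slot(int limit, int idx)
{
    int shared[NUM_VALUES] = {0};
    int local[NUM_VALUES] = {0};
    int k;

    for (k = 0; k < NUM_VALUES; ++k) {
        if (k < limit)
            shared[k] = 100 + k;         /* first `limit` slots */
        else
            local[k - limit] = 100 + k;  /* remaining slots */
    }
    return v(idx);
}
```

For example, `read_slot(15, 16)` reads `local[1]` and `read_slot(18, 16)` reads `shared[16]`; both return 116, so the caller sees the same logical slot either way. Note that every access still evaluates the comparison at run time unless the compiler can fold it away, for instance when `limit` and the index are both compile-time constants.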
Thanks for your help