How to switch between shared and local memory

Hi,

Basically, my program needs an array of 18 values per thread to work.

When I use 128 threads it's fine: I can put all the values in shared memory.

But when I use 192 threads, I can put only 15 values in shared memory and have to use local memory for the rest, and so on for larger thread counts.

I'm trying to make this transparent to my program: I want to be able to write v[i] with 0 <= i < 18, whatever the number of threads.

So far I have tried this:

#define v(_IDX)	(((_IDX) < limit) ? shared[(_IDX)] : local[(_IDX) - limit])

It does exactly what I want, but the problem is that it slows down my program a lot.
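One possible way to reduce the cost of that per-access test is to make the limit a compile-time constant and fully unroll the loops over i, so the compiler can decide for each access whether it goes to shared memory or to the per-thread array. The sketch below is untested for speed and makes several assumptions: the names N_VALUES, ThreadValues, sum_values and run_demo are hypothetical, each thread's shared values are stored with a stride of blockDim.x, and every index is known at compile time after unrolling. Whether the per-thread part ends up in registers or in local memory depends on the compiler; `nvcc -Xptxas -v` reports local memory use.

```cuda
#include <cassert>
#include <cuda_runtime.h>

constexpr int N_VALUES = 18;  // values needed per thread (from the post)

// LIMIT = how many of the per-thread values live in shared memory.
// The remaining N_VALUES - LIMIT values live in a small per-thread array.
template <int LIMIT>
struct ThreadValues {
    static_assert(LIMIT >= 0 && LIMIT <= N_VALUES, "LIMIT out of range");

    float* shared_base;  // this thread's first slot in shared memory
    float  spill[(N_VALUES - LIMIT > 0) ? (N_VALUES - LIMIT) : 1];

    // Same test as the macro, but LIMIT is a compile-time constant, so
    // the branch can be resolved at compile time when i is also constant.
    __device__ float& at(int i) {
        return (i < LIMIT) ? shared_base[i * blockDim.x] : spill[i - LIMIT];
    }
};

// Toy kernel: each thread stores 0..17 in its array and sums them back.
template <int LIMIT>
__global__ void sum_values(float* out) {
    extern __shared__ float shared[];
    ThreadValues<LIMIT> v;
    v.shared_base = shared + threadIdx.x;

    #pragma unroll
    for (int i = 0; i < N_VALUES; ++i) v.at(i) = float(i);

    float sum = 0.0f;
    #pragma unroll
    for (int i = 0; i < N_VALUES; ++i) sum += v.at(i);

    out[blockIdx.x * blockDim.x + threadIdx.x] = sum;
}

// Hypothetical host helper: launches one block with the given number of
// threads and returns the sum computed by thread 0.
template <int LIMIT>
float run_demo(int threads) {
    float* d_out = nullptr;
    cudaError_t err = cudaMalloc(&d_out, threads * sizeof(float));
    assert(err == cudaSuccess);

    // Only LIMIT values per thread are placed in shared memory.
    size_t shared_bytes = size_t(LIMIT) * threads * sizeof(float);
    sum_values<LIMIT><<<1, threads, shared_bytes>>>(d_out);

    float result = -1.0f;
    err = cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);
    cudaFree(d_out);
    return result;
}
```

With this layout the host picks the instantiation from the thread count, for example run_demo<18>(128) for 128 threads and run_demo<15>(192) for 192 threads, while the kernel body stays the same.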

Since shared memory is quite limited, I'm sure I'm not the only one with this problem, so it would be useful to compare the methods people use.

Thanks for your help.

In what way have you configured your device to execute the kernel?