MisterAnderson42 has kindly provided the following code snippet for a stack in shared memory:
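```cuda
__global__ void kernel(int max_stack_depth, …)
{
    extern __shared__ float stack_sdata[];  // max_stack_depth floats per thread
    int stack_depth = 0;
    // push: write at the current depth, then grow the stack
    stack_sdata[threadIdx.x * max_stack_depth + stack_depth++] = value;
    // pop: shrink the stack, then read the element that was on top
    float top = stack_sdata[threadIdx.x * max_stack_depth + --stack_depth];
}
```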
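For reference, here is a self-contained sketch of the same per-thread layout with push and pop factored into helpers. The names `push`, `pop` and `stack_demo` are hypothetical, there is no overflow or underflow checking, and the dynamic shared memory size (the third launch parameter) is assumed to cover `blockDim.x * max_stack_depth` floats.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helpers: thread `tid` owns the contiguous slice
// [tid * max_stack_depth, (tid + 1) * max_stack_depth) of the buffer.
// No bounds checking: the caller keeps 0 <= depth <= max_stack_depth.
__host__ __device__ inline void push(float* stack, int tid, int max_stack_depth,
                                     int& depth, float value)
{
    stack[tid * max_stack_depth + depth] = value;
    ++depth;
}

__host__ __device__ inline float pop(const float* stack, int tid, int max_stack_depth,
                                     int& depth)
{
    --depth;
    return stack[tid * max_stack_depth + depth];
}

// Each thread pushes three values, then pops the top one.
__global__ void stack_demo(int max_stack_depth, float* out)
{
    extern __shared__ float stack_sdata[];
    int tid = threadIdx.x;
    int depth = 0;
    for (int i = 0; i < 3; ++i)
        push(stack_sdata, tid, max_stack_depth, depth, tid * 10.0f + i);
    // LIFO: the last value pushed (i == 2) comes out first.
    out[tid] = pop(stack_sdata, tid, max_stack_depth, depth);
}

int main()
{
    const int threads = 32, max_stack_depth = 4;
    float* d_out;
    cudaMalloc(&d_out, threads * sizeof(float));
    // Dynamic shared memory in bytes: one slice of floats per thread.
    size_t smem = threads * max_stack_depth * sizeof(float);
    stack_demo<<<1, threads, smem>>>(max_stack_depth, d_out);
    float h_out[threads];
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    // On a CUDA-capable device this prints 52.0 (thread 5 pushed 50, 51, 52).
    printf("thread 5 popped %.1f\n", h_out[5]);
    cudaFree(d_out);
    return 0;
}
```

One consequence of this layout, assuming 32 shared memory banks that are each 4 bytes wide: at a fixed depth, neighbouring threads are `max_stack_depth` words apart. This gives a warp a gcd(`max_stack_depth`, 32)-way bank conflict on the float stack. An odd `max_stack_depth` avoids it.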
Say that, alongside the stack of floats, I need a stack of numbers whose maximum value is 255, so one byte per stack item is enough. I can declare another stack of bytes like this: `extern __shared__ unsigned char stack_sdata_char[];` However, the question arises: will this slow things down?
In other words, is it harmless to use one-byte values instead of 4- or 8-byte values in shared memory?
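One detail is worth checking before timing anything. In CUDA, every unsized `extern __shared__` array starts at the same address of the block's dynamic shared memory. A second extern declaration for the byte stack would therefore overlap the float stack instead of following it. The sketch below carves both stacks out of a single buffer instead. The names `byte_index` and `two_stacks` are hypothetical. The interleaved byte layout (`depth * blockDim.x + threadIdx.x`) is an assumption of this sketch, chosen so that at any given depth the threads of a warp touch consecutive bytes. Four neighbouring threads then share one 32-bit word, which, per the CUDA Programming Guide's description of compute capability 5.x and later, does not count as a bank conflict.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: interleaved byte stack, so at a fixed depth
// consecutive threads use consecutive bytes.
__host__ __device__ inline int byte_index(int tid, int block_threads, int depth)
{
    return depth * block_threads + tid;
}

__global__ void two_stacks(int max_stack_depth, unsigned char* out)
{
    // All unsized extern __shared__ arrays alias the same address, so declare
    // one buffer and place the byte stack right after the per-thread floats.
    extern __shared__ float stack_sdata[];
    unsigned char* stack_sdata_char =
        reinterpret_cast<unsigned char*>(stack_sdata + blockDim.x * max_stack_depth);

    int tid = threadIdx.x;
    int depth = 0;
    // Push one byte (value <= 255) per level.
    for (int i = 0; i < max_stack_depth; ++i)
        stack_sdata_char[byte_index(tid, blockDim.x, depth++)] =
            static_cast<unsigned char>(tid + i);
    // Pop the top byte.
    out[tid] = stack_sdata_char[byte_index(tid, blockDim.x, --depth)];
}

int main()
{
    const int threads = 64, max_stack_depth = 4;
    // Floats first, then one byte per stack slot.
    size_t smem = threads * max_stack_depth * (sizeof(float) + sizeof(unsigned char));
    unsigned char* d_out;
    cudaMalloc(&d_out, threads);
    two_stacks<<<1, threads, smem>>>(max_stack_depth, d_out);
    unsigned char h_out[threads];
    cudaMemcpy(h_out, d_out, threads, cudaMemcpyDeviceToHost);
    printf("thread 10 popped %d\n", h_out[10]);
    cudaFree(d_out);
    return 0;
}
```

Placing the floats first is deliberate. Bytes need no alignment, whereas floats placed after an odd-sized byte region could end up misaligned.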