Global memory reads/writes OR bitwise operations (shifts) at runtime (more than once… in a loop)? The operations will be done on 64-bit numbers (or even "simulated" 128-bit numbers, via a struct).
So far I am using global memory, but performance is really poor. My stack is stored in global memory, but it only holds 4-bit numbers (well… each one is stored in the stack as a char). So I thought I could store the whole stack in a single 64-bit number and use bitwise operations instead of array accesses to global memory. A 64-bit number used as a stack of 4-bit numbers gives a capacity of 16 (64 / 4), which is enough for me in most cases.
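To make the idea concrete, the packed stack could look roughly like the sketch below. It is plain C, so it says nothing about which variant is actually faster on the device; it only shows the shift-and-mask logic. The names `NibbleStack`, `ns_push`, `ns_pop` and `ns_self_check` are made up for illustration, and keeping a separate element count next to the 64-bit word is an assumption, not something from my current code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical packed stack: up to 16 four-bit values in one 64-bit word. */
typedef struct {
    uint64_t bits;   /* top of the stack lives in the lowest 4 bits */
    unsigned count;  /* number of stored 4-bit values, 0..16 */
} NibbleStack;

/* Push the low 4 bits of v. Returns 1 on success, 0 if the stack is full. */
static inline int ns_push(NibbleStack *s, unsigned v)
{
    if (s->count >= 16)
        return 0;
    s->bits = (s->bits << 4) | (uint64_t)(v & 0xFu);
    s->count++;
    return 1;
}

/* Pop the top value into *out. Returns 1 on success, 0 if the stack is empty. */
static inline int ns_pop(NibbleStack *s, unsigned *out)
{
    if (s->count == 0)
        return 0;
    *out = (unsigned)(s->bits & 0xFu);
    s->bits >>= 4;
    s->count--;
    return 1;
}

/* Small self-check: values come back in last-in, first-out order. */
static inline void ns_self_check(void)
{
    NibbleStack s = {0, 0};
    unsigned v = 0;
    ns_push(&s, 3);
    ns_push(&s, 7);
    assert(ns_pop(&s, &v) && v == 7);
    assert(ns_pop(&s, &v) && v == 3);
    assert(!ns_pop(&s, &v));
}
```

With this layout the whole stack fits in one 64-bit variable plus a small counter, so a push or pop is one shift and one mask or OR, with no array indexing involved.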