I have a sort of specific task that requires a stack-like structure in memory.
My kernel has two inputs: an array of ‘instructions’ and an array of source data.
The array of instructions is a list in reverse polish notation that represents a mathematical expression, for the sake of brevity let’s state that this expression has the only variable X and a set of simple math operations (+ - * /) that should be applied to X.
For example: X * X + (X / X)
The same in reverse polish notation: X X * X X / +
The second input - an array of source data - contains X values. The goal is to parse the mathematical expression for a huge number of X values as fast as possible. Each thread should parse the same expression for different X values (the index in the source data array is calculated on the basis of thread number, expression is the same for all running threads).
Originally, tha task was to parse the mathematical expression that is represented as a binary tree, however, this requires recursion. Reverse polish notation (tree is transformed to a linear list) allows to avoid recursive calls and other unpleasant things, however, stack is still required. When parsing the sample expression (X X * X X / +) it is necessary to store two X values somewhere before multiplication and the store the result of multiplication as well.
I would appreciate any suggestions very much … still can hardly imagine how to use shared memory (or some other approach ?) for this purpose.
Thanks in advance.