I’m currently trying to write a processor software stack (uArch simulator, compiler, binary tools, etc.) in CUDA, and while most things are pretty straightforward, the compiler frontend is turning out to be an area where I’m finding a lack of related work and existing implementations, specifically the lexer/parser/AST.
So I’m hoping that some people here will find this interesting enough to walk through a potential design with me.
I’m much more interested in an implementation that is simple, easy to understand, and scalable on future processors than in one that trades implementation complexity for performance. Let’s take one problem at a time, starting with lexical analysis. The requirements:
- Consume an arbitrarily long sequence of characters and produce the corresponding sequence of tokens (possibly with attached metadata) defined by a set of rules (regular expressions).
- Get good occupancy for large inputs on a 2015-2018 era GPU (extrapolate current architectures out a few die shrinks).
- Have linear or near-linear complexity for simple languages (don’t try to beat a sequential implementation based on FAs by brute forcing it).
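For reference, by “a sequential implementation based on FAs” I mean something like the following toy table-driven lexer. This is a sketch of mine, not code from the stack being discussed, and the token set (identifiers, integers, whitespace) is an assumption chosen just to keep it small; the point is the single O(n) pass that a parallel lexer should not lose badly to.

```python
# Minimal sequential table-driven DFA lexer for a toy language:
# identifiers, integers, and whitespace separators. One pass, O(n).

def classify(c):
    if c.isalpha() or c == '_': return 'letter'
    if c.isdigit():             return 'digit'
    return 'other'

# transition[state][char_class] -> next state; None marks a token boundary
TRANSITION = {
    'start':  {'letter': 'ident', 'digit': 'number', 'other': 'start'},
    'ident':  {'letter': 'ident', 'digit': 'ident',  'other': None},
    'number': {'letter': None,    'digit': 'number', 'other': None},
}
ACCEPT = {'ident': 'TOKEN_IDENTIFIER', 'number': 'TOKEN_NUMBER'}

def lex(text):
    tokens, state, start = [], 'start', 0
    for i, c in enumerate(text + ' '):        # sentinel flushes the last token
        nxt = TRANSITION[state][classify(c)]
        if nxt is None:                       # maximal munch: emit and restart
            tokens.append((ACCEPT[state], text[start:i]))
            state = 'start'
            nxt = TRANSITION[state][classify(c)]
        if state == 'start' and nxt != 'start':
            start = i                         # a new token begins here
        state = nxt
    return tokens
```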
The open questions:
- Settle on a simple high-level algorithm. My current favorite is bottom-up DFA merging.
- Handling payload data, e.g. “some_variable_name” -> TOKEN_IDENTIFIER (the token kind can be encoded as an int, but how do you store the value “some_variable_name” semi-efficiently? How does this generalize to complex payload data?)
- How to automatically synthesize a CUDA engine from a language specification?
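One cheap scheme for the payload question, an assumption on my part rather than anything settled here: a token is an integer kind plus an (offset, length) span into the original character buffer, so the text is never copied and “some_variable_name” is recovered on demand by slicing. A later interning pass can map spans to dense symbol ids, which is one answer to the “complex payload” generalization. The structure-of-arrays layout is deliberate, since it matches what a GPU gather step would naturally produce.

```python
# Tokens as parallel arrays (SoA): an int kind plus an (offset, length)
# span into the source buffer. Payload text is recovered lazily.

TOKEN_IDENTIFIER, TOKEN_NUMBER = 0, 1

class TokenStream:
    def __init__(self, source):
        self.source = source
        self.kinds = []      # one int per token
        self.offsets = []    # span start in the source buffer
        self.lengths = []    # span length

    def append(self, kind, offset, length):
        self.kinds.append(kind)
        self.offsets.append(offset)
        self.lengths.append(length)

    def text(self, i):
        """Recover the payload by slicing the original buffer."""
        o = self.offsets[i]
        return self.source[o:o + self.lengths[i]]

    def intern(self):
        """Map identifier spans to dense symbol ids in a later pass."""
        table, ids = {}, []
        for i in range(len(self.kinds)):
            ids.append(table.setdefault(self.text(i), len(table)))
        return ids

src = "x = some_variable_name + x"
ts = TokenStream(src)
ts.append(TOKEN_IDENTIFIER, 0, 1)    # "x"
ts.append(TOKEN_IDENTIFIER, 4, 18)   # "some_variable_name"
ts.append(TOKEN_IDENTIFIER, 25, 1)   # "x"
```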
Algorithm Overview (bottom-up DFA merging):
- Sweep over a window of the input stream, feeding each character as the input to an uninitialized state machine.
- Recursively merge neighboring state machines, possibly generating tokens along the way, until a single state machine in a new state remains.
- Gather the generated tokens together into a sequence.
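The merge step works because each character induces a total function from states to states, and function composition is associative, so neighbors can be merged pairwise in a log-depth reduction tree, exactly the shape a CUDA block-level reduce wants. Here is a host-side Python sketch of my reading of the steps above (token emission at merge boundaries is elided; the toy three-state DFA is my own example):

```python
# Bottom-up DFA merging: each character becomes a transition function
# state -> state ("uninitialized" = defined for every possible start
# state), and neighboring machines are merged by composition.

STATES = ('start', 'ident', 'number')

def char_fn(c):
    """The transition function induced by one character."""
    if c.isalpha() or c == '_':
        return {'start': 'ident', 'ident': 'ident', 'number': 'number'}
    if c.isdigit():
        return {'start': 'number', 'ident': 'ident', 'number': 'number'}
    return {s: 'start' for s in STATES}       # toy rule: separators reset

def merge(f, g):
    """Compose two neighboring machines: run f, then g."""
    return {s: g[f[s]] for s in STATES}

def reduce_window(text):
    """Recursive pairwise merging: a tree of log-depth rounds."""
    fns = [char_fn(c) for c in text]
    while len(fns) > 1:
        if len(fns) % 2:
            fns.append({s: s for s in STATES})   # identity padding
        fns = [merge(fns[i], fns[i + 1]) for i in range(0, len(fns), 2)]
    return fns[0]
```

Once the single merged function is known, applying it to the true start state yields the same final state a sequential scan would reach.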
Thanks a lot for the links. I’m still reading through “Survey of parallel context-free parsing techniques” and the related references. There are definitely some similarities with what I am proposing, possibly with what the paper calls connectionist parsing.
Some of the other links had some interesting observations, for example that sequential parsers started from random locations would typically converge quickly. However, I’m generally not in favor of speculative methods, because I think they would fall apart when parsing worst-case grammars, for example when a dependence in the DFA spans multiple partitions. In such cases, I’m not sure the result would even be correct.
Also, I don’t think you can easily evolve a partitioning-based algorithm into a conservative one because, unlike other partitioning-based parallel algorithms such as intersect or merge, the input cannot easily be sorted. So you may have to scan linearly backwards after an initial partitioning stage to find the beginning of the first token in the partition. I can imagine cases where the partition boundary falls inside a very large comment, string, array, etc., and the runtime is dominated by the linear scan.
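To make that failure mode concrete, here is a toy construction of mine (not from the links): a two-state scanner that only tracks whether we are inside a block comment, delimited here by '{' and '}' for brevity. A speculative lexer that starts a partition with the guessed state 'code' stays wrong for every character up to the closing brace when the cut lands inside a long comment, which is exactly the dependence spanning partitions.

```python
# Two-state scanner: 'code' vs. 'comment' (delimited by '{' ... '}').

def step(state, c):
    if state == 'code':
        return 'comment' if c == '{' else 'code'
    return 'code' if c == '}' else 'comment'

def run(text, state):
    """Return the state after each character."""
    states = []
    for c in text:
        state = step(state, c)
        states.append(state)
    return states

source = "a = b; {" + " long comment " * 4 + "} c = d;"
cut = len(source) // 2                     # partition boundary mid-comment
true_states = run(source, 'code')          # ground truth from the start
spec_states = run(source[cut:], 'code')    # speculative guess: 'code'
# Count positions where the speculation disagrees with the truth:
wrong = sum(t != s for t, s in zip(true_states[cut:], spec_states))
```

Every character between the cut and the closing '}' is misclassified, so recovering correctness means scanning back to the real state, and that scan can dominate the runtime when the comment is large.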
I’m much more in favor of conservative algorithms, where performance may degrade as the grammar becomes more complicated, but where the partitioning is done with a divide-and-conquer algorithm to place a bound on the worst-case runtime.