Lexer

The lexer uses a tokenizer, which here is a concrete implementation of the deterministic finite-state automaton defined in conf/lexer_rules.json.

Each state of the automaton is called a node and has a set of paths.

A path, or pattern, is a set of transitions that all lead from the origin node to one and the same target node.

Each path is represented by a regular expression that is only ever applied to a string of length 1, that is, to a single character x, where x belongs to X.
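As a rough illustration of the node and path model described above, here is a minimal Python sketch. The table layout, the node names other than q0, and the patterns are assumptions made for the example; they are not taken from conf/lexer_rules.json, whose actual schema may differ.

```python
import re

# Hypothetical rule table: node -> list of (single-character pattern, target node).
# Node names and patterns are illustrative only.
RULES = {
    "q0": [(r"[0-9]", "q_int"), (r"[A-Za-z_]", "q_ident")],
    "q_int": [(r"[0-9]", "q_int")],
    "q_ident": [(r"[A-Za-z0-9_]", "q_ident")],
}


def step(node, char):
    """Return the node reached from `node` on `char`, or None if no path matches."""
    if len(char) != 1:
        raise ValueError("paths are evaluated against exactly one character")
    for pattern, target in RULES.get(node, []):
        # Each path is a regular expression applied to a single character.
        if re.fullmatch(pattern, char):
            return target
    return None
```

With this table, step("q0", "7") leads to the hypothetical node q_int, while a character with no matching path yields None.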

Know your automata

The tokenizer is the combination of all the following automata, using q0 as the start state.
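To make the idea of starting at q0 and following transitions concrete, the walk can be sketched roughly as follows. This is an assumption-laden example: the transition table, the nodes q1 and q2, and the stop-on-first-mismatch behaviour are hypothetical and only stand in for whatever the real rules define.

```python
import re

# Hypothetical transition table; only q0 as the start node comes from the text.
TRANSITIONS = {
    "q0": [(r"[0-9]", "q1"), (r"[a-z]", "q2")],
    "q1": [(r"[0-9]", "q1")],
    "q2": [(r"[a-z0-9]", "q2")],
}


def run(text, start="q0"):
    """Walk the automaton from `start`, consuming one character per transition.

    Returns (final_node, consumed_count). The walk stops at the first
    character for which the current node has no matching path.
    """
    node = start
    consumed = 0
    for char in text:
        for pattern, target in TRANSITIONS.get(node, []):
            if re.fullmatch(pattern, char):
                node = target
                consumed += 1
                break
        else:
            break  # no path matched this character: stop the walk
    return node, consumed
```

For example, run("123+") ends in the hypothetical node q1 after consuming three characters, because "+" has no path out of q1 in this table.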

Notes

* Each time a transition takes place, the next character is used as
* Transitions only evaluate one character, so ^ and $ have been omitted for clarity
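The point about omitted anchors can be checked with a small Python sketch. The pattern below is a made-up example, not one of the project's rules; the sketch only suggests that, for single-character inputs like these, matching the whole character without anchors gives the same result as the explicitly anchored form.

```python
import re

PATTERN = r"[0-9]"  # hypothetical path pattern, written without anchors


def matches_unanchored(char):
    """Match the whole single character; fullmatch supplies the anchoring."""
    return re.fullmatch(PATTERN, char) is not None


def matches_anchored(char):
    """Same check with ^ and $ written out explicitly."""
    return re.search("^" + PATTERN + "$", char) is not None
```

Both functions agree on inputs such as "0", "9", "a" and "+", which is why the anchors can be left out when a transition only ever sees one character.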