Lexer
The lexer uses a tokenizer, which in this case is the practical
representation of a deterministic finite-state automaton defined in
conf/lexer_rules.json.
Each state of the automaton is called a node and has a set of paths.
A path, or pattern, is the set of transitions leading from an origin node to one and the same destination node.
Each path is represented by a regular expression that can only be applied to
a string of length 1.
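As a rough illustration of the node and path model described above, the sketch below represents a node as a mapping from single-character regular expressions to destination nodes. The layout is an assumption made for this example (the actual schema of conf/lexer_rules.json is not shown on this page), and the names NODES, q_number, q_identifier and step are hypothetical.

```python
import re

# Hypothetical layout: each node maps a single-character pattern to the node
# that the path leads to. Several characters can share one path, because the
# pattern describes the whole set of transitions between the two nodes.
NODES = {
    "q0": {r"[0-9]": "q_number", r"[a-zA-Z]": "q_identifier"},
    "q_number": {r"[0-9]": "q_number"},
    "q_identifier": {r"[a-zA-Z0-9_]": "q_identifier"},
}


def step(node, char):
    """Return the node reached from `node` on `char`, or None if no path fits."""
    if len(char) != 1:
        raise ValueError("a path is only ever applied to a string of length 1")
    for pattern, target in NODES[node].items():
        if re.fullmatch(pattern, char):
            return target
    return None
```

Because the patterns attached to one node do not overlap in this sketch, at most one path can match a given character, which is what keeps the automaton deterministic.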
Know your automata
The tokenizer is the combination of all the following automata, using q0
as the start state.
Notes
* Each time a transition takes place, the next character is used as
* Transitions only evaluate one character, so ^ and $ have been omitted for clarity
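To illustrate these notes, here is a minimal sketch of how the Comparator pattern listed further down, ([<>][=]?|[!=]=), could be unrolled into single-character transitions starting at q0. The state names other than q0, the ACCEPTING set and the accepts helper are assumptions for this example and are not taken from conf/lexer_rules.json.

```python
import re

# Hypothetical unrolling of the Comparator pattern into one-character paths.
# No ^ or $ is needed: every pattern is only ever tested against one character.
TRANSITIONS = {
    "q0": [(r"[<>]", "q1"), (r"[!=]", "q2")],
    "q1": [(r"=", "q3")],  # optional "=" after "<" or ">"
    "q2": [(r"=", "q3")],  # mandatory "=" after "!" or "="
    "q3": [],
}
ACCEPTING = {"q1", "q3"}  # "<", ">", "<=", ">=", "==", "!="


def accepts(text):
    """Feed `text` through the automaton one character at a time from q0."""
    state = "q0"
    for char in text:
        for pattern, target in TRANSITIONS[state]:
            if re.fullmatch(pattern, char):
                state = target
                break
        else:
            return False  # no path matches this character
    return state in ACCEPTING
```

This sketch covers Comparator in isolation. In the combined tokenizer the same first characters are shared with Assignation (=) and Negation (!), so the corresponding states would need to be handled there as well.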
- Access:
  - regular expression: ^\.$
  - automaton:
- Assignation:
  - regular expression: ^=$
  - automaton:
- Binary Operator:
  - regular expression: ^(\||&|<<|>>|~|\^)$
  - automaton:
- Block:
  - Open:
    - regular expression: ^{$
    - automaton:
  - Close:
    - regular expression: ^}$
    - automaton:
- Brace:
  - Open:
    - regular expression: ^\($
    - automaton:
  - Close:
    - regular expression: ^\)$
    - automaton:
- Comparator:
  - regular expression: ^([<>][=]?|[!=]=)$
  - automaton:
- End of Statement:
  - regular expression: ^[\n;]$
  - automaton:
- Identifier:
  - regular expression: ^[_]*[a-zA-Z][a-zA-Z0-9_]*$
  - automaton:
- Index:
  - Open:
    - regular expression: ^\[$
    - automaton:
  - Close:
    - regular expression: ^\]$
    - automaton:
- Hexadecimal:
  - regular expression: ^0x[a-fA-F0-9]+$
  - automaton:
- Negation:
  - regular expression: ^!$
  - automaton:
- Number:
  - regular expression: ^[0-9]+(\.[0-9]+)?$
  - automaton:
- Operator:
  - regular expression: ^(\+|\-|(\*[\*]?)|\/|%)$
  - automaton:
- Separator:
  - regular expression: ^,$
  - automaton:
- String:
  - regular expression: ^(")(?:(?=(\\?))\2.)*?"$
  - automaton:
- Whitespace:
  - regular expression: ^ $
  - automaton:
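The regular expressions listed above can be sanity-checked against sample lexemes. The sketch below uses Python's re module for this. The RULES dictionary, the classify helper and the sample strings are assumptions chosen for illustration, only a subset of the token types is included, and the project itself may use a different regex engine.

```python
import re

# Patterns copied verbatim from the list above (subset only).
RULES = {
    "Access": r"^\.$",
    "Binary Operator": r"^(\||&|<<|>>|~|\^)$",
    "Comparator": r"^([<>][=]?|[!=]=)$",
    "Identifier": r"^[_]*[a-zA-Z][a-zA-Z0-9_]*$",
    "Hexadecimal": r"^0x[a-fA-F0-9]+$",
    "Number": r"^[0-9]+(\.[0-9]+)?$",
    "Operator": r"^(\+|\-|(\*[\*]?)|\/|%)$",
    "String": r'^(")(?:(?=(\\?))\2.)*?"$',
}


def classify(lexeme):
    """Return the names of all rules whose pattern matches `lexeme`."""
    return [name for name, pattern in RULES.items() if re.match(pattern, lexeme)]


if __name__ == "__main__":
    # Illustrative lexemes; print which rule(s) each one satisfies.
    for sample in [".", "<<", "<=", "__x1", "0x1F", "3.14", "**", '"hi"', "_"]:
        print(repr(sample), classify(sample))
```

With these patterns, classify("0x1F") reports only the Hexadecimal rule, whereas "_" matches nothing because an Identifier needs at least one letter.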