Your JsLexer was too slow to use for real projects, because of the text = text[eaten:] slicing!
We ran into problems when using your lexer to tokenize the minified jQuery UI. The library is about 210,000 characters; I decoded it to Unicode, and it contains 154,806 tokens. That means text = text[eaten:] copies, on average, half of those 210k characters (at 2-4 bytes each for a unicode string) on every loop iteration, which amounts to a whopping 30-60 GIGABYTES of stuff moved needlessly in memory! :D
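To make the arithmetic concrete, here is a rough back-of-envelope sketch using the numbers above; the 2-4 bytes per character is an assumption about the interpreter's internal unicode storage (e.g. UCS-2 vs. UCS-4 builds):

    # Back-of-envelope estimate of bytes copied by `text = text[eaten:]`.
    # Figures are the ones quoted above; bytes_per_char is an assumption
    # about how the interpreter stores unicode text internally.
    tokens = 154806                   # tokens in minified jQuery UI
    total_chars = 210000              # characters in the source
    avg_copied = total_chars / 2.0    # on average half the text remains per slice

    for bytes_per_char in (2, 4):
        copied = tokens * avg_copied * bytes_per_char
        print("%d bytes/char: ~%.0f GB copied" % (bytes_per_char, copied / 1e9))

That prints roughly 33 GB for the 2-byte case and 65 GB for the 4-byte case, which is where the 30-60 GB figure comes from.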
Here is a version that passes a start offset to finditer instead of slicing the text:

def lex(self, text, start=0):
    """Lexically analyze `text`.

    Yields pairs (`name`, `tokentext`).
    """
    max = len(text)
    state = self.state
    regexes = self.regexes
    toks = self.toks

    while start < max:
        for match in regexes[state].finditer(text, start):
            name = match.lastgroup
            tok = toks[name]
            toktext = match.group(name)
            # Advance the offset; the text itself is never copied.
            start += len(toktext)
            yield (tok.name, toktext)

            if tok.next:
                # Switch state and restart finditer with the new regex.
                state = tok.next
                break

    self.state = state
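For anyone who wants to see the trick in isolation, here is a minimal, self-contained toy sketch of the same finditer-with-offset idea; the token rules are made up purely for illustration and are not the real JsLexer grammar:

    import re

    # Toy single-state lexer illustrating finditer(text, pos): the offset
    # moves forward, the source string is never sliced or copied.
    TOKEN_RE = re.compile(r"""
          (?P<ws>\s+)
        | (?P<number>\d+)
        | (?P<name>[A-Za-z_]\w*)
        | (?P<punct>[^\s\w])
    """, re.VERBOSE)

    def toy_lex(text, start=0):
        """Yield (token_name, token_text) pairs without slicing `text`."""
        for match in TOKEN_RE.finditer(text, start):
            yield (match.lastgroup, match.group())

    for name, tok in toy_lex("var answer = 42;"):
        print(name, repr(tok))

Within one finditer call the scanner keeps going from the end of the previous match on its own; the explicit start offset only matters when a new finditer call has to be started, which in the real lexer happens after a state change.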
Antti Haapala firstname.lastname@example.org