Issue #1 resolved

JsLex is slooow on big javascript files

Anonymous created an issue

Your JsLexer turned out to be too slow to use for real projects, because of the text = text[eaten:] line!

We ran into problems when using your lexer to tokenize minified jQuery UI. The library is about 210k characters; I decoded it to Unicode, and it contains 154,806 tokens. Since text = text[eaten:] copies on average 210k/2 characters (2-4 bytes each) on every iteration of the loop, that amounts to a whopping 30-60 GIGABYTES of stuff moved needlessly in memory! :D
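To see why this hurts: the slow pattern looks roughly like this (a minimal sketch of the pattern, not the actual jslex source; match_token is a hypothetical helper):

# Minimal sketch of the quadratic pattern (illustrative only; match_token is
# a hypothetical helper, not part of jslex). Each text = text[eaten:]
# allocates a new string and copies the whole remaining tail, so tokenizing
# n characters copies O(n**2) bytes in total.
def lex_slow(text, match_token):
    while text:
        name, toktext = match_token(text)  # match one token at position 0
        yield (name, toktext)
        text = text[len(toktext):]         # copies the entire remaining text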

Now, I am a bit lazy and do not want to register for Hg, but here is a better loop for you, which AFAIK makes this the fastest JavaScript lexer around; my function is about 22 times faster on 200k characters of minified JavaScript (230 ms vs. 5 seconds).



def lex(self, text, start=0):
    """Lexically analyze `text`.

    Yields pairs (`name`, `tokentext`).
    """
    max = len(text)
    state = self.state
    regexes = self.regexes
    toks = self.toks

    while start < max:
        for match in regexes[state].finditer(text, start):
            # lastgroup is the name of the group that matched.
            name = match.lastgroup
            tok = toks[name]
            toktext = match.group(name)
            # Advance by index instead of re-slicing text: no copying.
            start += len(toktext)
            yield (tok.name, toktext)

            # If this token switches lexer states, restart matching
            # from the current position with the new state's regex.
            if tok.next:
                state = tok.next
                break

    self.state = state
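For anyone who wants to reproduce the timing, a rough sketch along these lines should work (assumptions: JsLexer is the class carrying the lex method above, the module is importable as jslex, and the file name is only an example):

import time
from jslex import JsLexer  # assumed import; adjust to wherever JsLexer lives

# Rough benchmark sketch: reads a minified file (example name) and times
# a full tokenization pass.
with open("jquery-ui.min.js", encoding="utf-8") as f:
    source = f.read()

lexer = JsLexer()
t0 = time.perf_counter()
tokens = list(lexer.lex(source))
elapsed_ms = (time.perf_counter() - t0) * 1000
print("%d tokens in %.0f ms" % (len(tokens), elapsed_ms))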


Antti Haapala
