JsLex is slooow on big javascript files

Issue #1 resolved
Former user created an issue

Your JsLexer is too slow to use on real projects, because of the {{{text = text[eaten:]}}} line!

We ran into problems when using your lexer to tokenize minified jQuery UI. The library is about 210k characters (I decoded it to Unicode) and contains 154,806 tokens, so {{{text = text[eaten:]}}} copies on average 210k/2 characters at 2-4 bytes each on every loop iteration. That amounts to a whopping 30-60 GIGABYTES of data moved needlessly in memory! :D
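To illustrate the cost (a minimal hypothetical sketch, not JsLex's actual token grammar): slicing off the consumed prefix copies the entire remaining tail on every iteration, so total copying grows quadratically with input size, whereas advancing an integer offset copies nothing.

{{{
#!python
import re

# Stand-in token pattern for illustration only.
TOKEN = re.compile(r"\S+|\s+")

def lex_by_slicing(text):
    # O(n^2) total copying: each slice duplicates the remaining tail.
    tokens = []
    while text:
        eaten = TOKEN.match(text).end()
        tokens.append(text[:eaten])
        text = text[eaten:]   # copies len(text) - eaten characters
    return tokens

def lex_by_offset(text):
    # O(n): the string is never copied, only an index advances.
    tokens = []
    start = 0
    while start < len(text):
        match = TOKEN.match(text, start)
        tokens.append(match.group())
        start = match.end()
    return tokens
}}}

Both produce identical token streams; only the amount of memory traffic differs.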

Now I am a bit lazy and do not want to register for an Hg account, but here is a better loop for you, which as far as I know makes this the fastest JavaScript lexer around: my function is about 22 times faster on 200k characters of minified JavaScript (230 ms vs. 5 seconds).

{{{
#!python
def lex(self, text, start=0):
    """Lexically analyze `text`.

    Yields pairs (`name`, `toktext`).
    """
    end = len(text)
    state = self.state
    regexes = self.regexes
    toks = self.toks

    while start < end:
        for match in regexes[state].finditer(text, start):
            name = match.lastgroup
            tok = toks[name]
            toktext = match.group(name)
            start += len(toktext)
            yield (tok.name, toktext)

            if tok.next:
                state = tok.next
                # Re-enter the while loop so matching restarts
                # with the new state's regex.
                break

    self.state = state
}}}
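A self-contained toy showing the pattern that loop relies on (named groups, {{{match.lastgroup}}}, and restarting {{{finditer}}} from the current offset after a state change). The two-state grammar here is invented for illustration and assumes every position in the input matches some alternative; it is not JsLex's real grammar.

{{{
#!python
import re

# Toy two-state tokenizer: a quote switches from "code" to "string"
# and back again.
REGEXES = {
    "code": re.compile(r"(?P<word>\w+)|(?P<ws>\s+)|(?P<quote>')"),
    "string": re.compile(r"(?P<chars>[^']+)|(?P<endquote>')"),
}
# Hypothetical transition table: token name -> next state.
NEXT_STATE = {"quote": "string", "endquote": "code"}

def lex(text, state="code"):
    start, end = 0, len(text)
    while start < end:
        for match in REGEXES[state].finditer(text, start):
            name = match.lastgroup
            toktext = match.group(name)
            start += len(toktext)
            yield (name, toktext)
            if name in NEXT_STATE:
                state = NEXT_STATE[name]
                # Restart matching with the new state's regex.
                break
}}}

For example, {{{list(lex("hi 'a b' yo"))}}} tokenizes the quoted span with the "string" state's regex and everything else with the "code" state's regex.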


Antti Haapala antti@industrialwebandmagic.com
