JsLex is slooow on big javascript files

Anonymous created an issue

Your JsLexer is too slow to use for real projects, because of the text = text[eaten:] statement in the lexing loop!

We ran into problems when using your lexer to tokenize the minified jQuery UI. The lib is about 210k characters (I decoded it to Unicode) and contains 154806 tokens. On every pass through the loop, text = text[eaten:] has to copy on average 210k/2 characters at 2-4 bytes each, which amounts to a whopping 30-60 GIGABYTES of stuff moved needlessly in memory! :D
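
The estimate above can be checked with a few lines of arithmetic. This is only a rough sketch: the function name `bytes_copied` is made up for illustration, and it assumes, as the post does, that each slice copies about half of the text at 2 or 4 bytes per character.

```python
def bytes_copied(total_chars, n_tokens, bytes_per_char):
    """Rough estimate of the bytes copied by repeated `text = text[eaten:]`.

    Assumes one slice per token, each copying on average half of the text.
    """
    avg_remaining = total_chars / 2
    return n_tokens * avg_remaining * bytes_per_char


# Figures taken from the post: ~210k characters, 154806 tokens.
low = bytes_copied(210_000, 154_806, 2)
high = bytes_copied(210_000, 154_806, 4)
print(f"{low / 1e9:.1f} GB to {high / 1e9:.1f} GB")  # prints "32.5 GB to 65.0 GB"
```

That lands in the same ballpark as the 30-60 GB quoted above.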

Now I am a bit lazy and do not want to register into Hg, but here's a better loop for you. As far as I know, it makes this the fastest JavaScript lexer around: my function is about 22 times faster on 200k characters of minified JavaScript (230 ms vs. 5 seconds).

    def lex(self, text, start=0):
        """Lexically analyze `text`.

        Yields pairs (`name`, `tokentext`).
        """
        end = len(text)  # named `end` to avoid shadowing the built-in max()
        state = self.state
        regexes = self.regexes
        toks = self.toks

        while start < end:
            # Scan from the current offset instead of re-slicing the text.
            for match in regexes[state].finditer(text, start):
                name = match.lastgroup
                tok = toks[name]
                toktext = match.group(name)
                start += len(toktext)
                yield (tok.name, toktext)

                if tok.next:
                    # Switch state and restart finditer with the new state's regex.
                    state = tok.next
                    break

        self.state = state
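
To see the difference in isolation, here is a minimal self-contained sketch that compares the two approaches on a toy tokenizer. `TOKEN_RE`, `lex_by_slicing` and `lex_by_offset` are hypothetical names for this illustration, and the pattern is nowhere near JsLex's real rules; the only point is that passing an offset to the compiled regex gives the same tokens without copying the text.

```python
import re

# Hypothetical toy token pattern, NOT the real JsLex rules.
# The final `.` alternative (with re.S) guarantees every position matches.
TOKEN_RE = re.compile(
    r"(?P<ws>\s+)|(?P<id>[A-Za-z_]\w*)|(?P<num>\d+)|(?P<punct>.)", re.S
)


def lex_by_slicing(text):
    """Re-slice the remaining text after every token (copies it each time)."""
    while text:
        match = TOKEN_RE.match(text)
        toktext = match.group()
        yield (match.lastgroup, toktext)
        text = text[len(toktext):]


def lex_by_offset(text):
    """Advance an integer offset instead; the text itself is never copied."""
    pos, end = 0, len(text)
    while pos < end:
        match = TOKEN_RE.match(text, pos)
        yield (match.lastgroup, match.group())
        pos = match.end()


sample = "var x = 42; foo(x);"
assert list(lex_by_slicing(sample)) == list(lex_by_offset(sample))
```

Both generators yield the same (name, text) pairs; only the amount of memory traffic differs, and that grows with the square of the input length in the slicing version.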

Antti Haapala antti@industrialwebandmagic.com
