Incorrect RegExp detection in JavaScript

Issue #403 resolved
Anonymous created an issue

In JavaScript, a single forward slash following one of {{{ [({=,:;!%^&*|?~+- }}} starts a regular expression literal, and no space is required between the two. For example, {{{ if (!/regexp/.test(foo)) }}} is valid syntax, but Pygments does not recognize the regular expression in it.

See more examples at http://pygments.org/demo/1605/
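
To make the rule above concrete, here is a minimal sketch of the check in Python. The helper name `slash_starts_regex` and the constant `REGEX_PRECEDERS` are hypothetical and not part of Pygments. Treating a slash at the very start of the input as a regex is an assumption, and keywords such as `return` are deliberately ignored.

```python
# Characters after which a "/" begins a regex literal (set taken from the report).
REGEX_PRECEDERS = set("[({=,:;!%^&*|?~+-")


def slash_starts_regex(source, index):
    """Return True if the "/" at source[index] would start a regex literal.

    Hypothetical helper: it only looks at the previous non-whitespace
    character, so whitespace between that character and the slash is optional.
    """
    if source[index] != "/":
        raise ValueError("source[index] must be a slash")
    before = source[:index].rstrip()
    # Assumption: with nothing to the left there is no operand to divide,
    # so the slash is treated as the start of a regex.
    return not before or before[-1] in REGEX_PRECEDERS
```

With this sketch, the slash in `if (!/regexp/.test(foo))` is classified as the start of a regex, while the slash in `a / b` is classified as division.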

Reported by Pumbaa80

Comments (12)

  1. Anonymous

    I had a look at [http://dev.pocoo.org/projects/pygments/browser/pygments/lexers/web.py web.py] and I'm afraid there are [http://pygments.org/demo/1621/ some more issues] with JavascriptLexer.

    First, according to [http://interglacial.com/javascript_spec/a-7.html#a-7.7 the JavaScript spec], this should be the right way to match operators and punctuators:

    (r'=|\+\+|--|~|&&|\?|:|\|\||\\|(<<|>>>?|==?|!=?|[<>+\-*%&\|\^/])=?', Operator)
    (r'[{}()\[\].;,]', Punctuation)
    

    Second, character classes need special treatment... (use several states?)

    Once this is fixed, the detection of a regexp literal's end (cf. lines 47 to 49 in web.py) can be simplified/fixed. This should be enough to replace the three lines:

    ...   /([gim]+\b|\B)', String.Regex)
    

    ActionScriptLexer and ActionScript3Lexer might need a fix, too (?).
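
As an illustration of the character-class point and the end-of-literal rule in this comment, here is a rough single-pattern sketch using only the standard `re` module. It is not the code in web.py. The name `JS_REGEX_LITERAL` is hypothetical, and handling the class inside one pattern instead of several lexer states is an assumption made to keep the example short.

```python
import re

# Sketch of a JavaScript regex literal matcher. Inside [...] an unescaped
# slash does not terminate the literal, so classes are consumed as a unit.
JS_REGEX_LITERAL = re.compile(
    r"""
    /                                   # opening slash
    (?: \\.                             # escaped character such as \/
      | \[ (?: \\. | [^\]\\\n] )* \]    # character class, may contain a slash
      | [^/\\\[\n]                      # any other single character
    )+
    /                                   # closing slash
    (?: [gim]+\b | \B )                 # optional flags, end rule quoted above
    """,
    re.VERBOSE,
)
```

For example, matching against `/[/]/.test(x)` consumes `/[/]/` as one literal, because the slash inside the brackets belongs to the character class.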

  2. Martin Bodin

    Hi,

    I’m reopening this bug (although Pygments has changed a lot in the meantime, so maybe I should have opened a new one). The following code is highlighted incorrectly: http://pygments.org/demo/2016749/ It is in fact parsed as if it were the following: http://pygments.org/demo/2016751/

    In JavaScript, a semicolon is only inserted automatically if its absence would lead to a syntax error. The correct parsing for http://pygments.org/demo/2016749/ should be {{{ n = < < 1 / 1 > * "<string>" > // <comment> }}} instead of {{{ < n = 1 > < /<regexp>/.test (n + '<string>') }}} as it is right now.

    I ran into this problem when switching from Pygments 1.6 to 2.0.2: did something important change in the meantime?

    Sincerely, Martin.

    Edit: I’ve posted this issue in the meantime, as it is probably just another bug: https://bitbucket.org/birkenfeld/pygments-main/issues/1122/semicolons-in-javascript

  3. Tim Hatch

    I'm going to leave this as resolved; both of those demos highlight the regex the same, so I think it's fixed. I will follow up on the other issue as well.
