R source code is recognized as REBOL code in the auto-detection.

Issue #988 new
Oldes created an issue

Sadly, for years REBOL code was wrongly recognized as R code; since the last update it is the other way around.

Like here:
https://github.com/systematicinvestor/SIT/blob/master/R/aa.bl.r

Comments (10)

  1. Oldes reporter

    In some recent changes I added this code to the REBOL lexer:

        def analyse_text(text):
            """
            Check if the code contains a REBOL header, so it is probably not R code
            """
            if re.match(r'^\s*REBOL\s*\[', text, re.IGNORECASE):
                # The code starts with REBOL header
                return 1.0
            elif re.search(r'\s*REBOL\s*\[', text, re.IGNORECASE):
                # The code contains REBOL header but also some text before it
                return 0.5
    

    It looks like something else is needed to get this right.
    @Tim Hatch, do you know what it could be? Maybe add analyse_text for R as well?
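
    The header heuristic above can be tried in isolation. The following is only a sketch: `rebol_header_score` is a hypothetical standalone function (not part of Pygments), with both brackets escaped and an explicit 0.0 fallback added as an assumption.

```python
import re


def rebol_header_score(text):
    """Hypothetical standalone copy of the analyse_text heuristic."""
    if re.match(r'^\s*REBOL\s*\[', text, re.IGNORECASE):
        # The text starts with a REBOL header
        return 1.0
    elif re.search(r'\s*REBOL\s*\[', text, re.IGNORECASE):
        # A REBOL header appears, but some text precedes it
        return 0.5
    # Assumption: no header found means no evidence for REBOL
    return 0.0


if __name__ == "__main__":
    for sample in ('REBOL [Title: "demo"]',
                   '; comment\nrebol [ ]',
                   'x <- c(1, 2, 3)\n'):
        print(repr(sample), rebol_header_score(sample))
```

    Plain R code without the word REBOL followed by a bracket gets no score from this check, so the heuristic alone should not pull R files toward the REBOL lexer.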

  2. Oldes reporter

    When I removed the above analyse_text from the REBOL lexer, R was still not recognized properly, so it must be something else. I bet it was working when I wrote it (which was some time ago).

  3. Oldes reporter

    I played with it a little, and the problem seems to be that SLexer (the one used for the R language) defines its extension in uppercase here:

    https://bitbucket.org/birkenfeld/pygments-main/src/121c75491e0d3caa6bd76ff3d3e46ee62edc6c93/pygments/lexers/_mapping.py?at=default#cl-300

    When I add the lowercase variant there, SLexer is recognized in get_lexer_for_filename and analyse_text is then used properly to pick the correct lexer.

    As it's not directly related to REBOL's lexer, I will leave it to someone else to decide how to fix this (maybe ignore upper/lower case in file extensions?).
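
    To illustrate the effect described in this comment, here is a rough sketch of case-sensitive pattern matching. The `PATTERNS` table and `candidate_lexers` are hypothetical and heavily trimmed; they only mirror the two patterns mentioned in this thread and are not the actual lookup code in Pygments.

```python
from fnmatch import fnmatchcase

# Hypothetical, trimmed-down pattern table (the real one lives in _mapping.py)
PATTERNS = {
    "RebolLexer": ("*.r",),
    "SLexer": ("*.R",),
}


def candidate_lexers(filename, patterns=PATTERNS):
    """Return the lexer names whose patterns match filename, case-sensitively."""
    return sorted(
        name
        for name, globs in patterns.items()
        if any(fnmatchcase(filename, glob) for glob in globs)
    )


if __name__ == "__main__":
    print(candidate_lexers("test.r"))
    # With a lowercase variant added for SLexer, both lexers become candidates
    patched = dict(PATTERNS, SLexer=("*.R", "*.r"))
    print(candidate_lexers("test.r", patched))
```

    With only the uppercase pattern, SLexer is never a candidate for test.r, so a content check such as analyse_text has nothing to choose between.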

  4. Tim Hatch

    It might be hash randomization. Are you seeing the problem locally with pygmentize or something else? Or only on Bitbucket (which actually uses another system to pick languages)?

  5. Oldes reporter

    When I run locally this:

    python pygmentize -f html x:\test.r
    

    where test.r is an R test file, the REBOL lexer is used. The pull request above fixes that.

  6. Tim Hatch

    @Georg Brandl executive summary:

    • RebolLexer specifies *.r
    • SLexer (for R) specifies *.R

    Apparently on Windows (and in that SIT repo) people use lowercase .r for R code. (I don't know for sure whether Windows case insensitivity is involved here, but let's assume it's not, and people are actually using lowercase for the extension.) Fundamentally, do we want to make all filename patterns case insensitive (see PR998), or suggest in the docs that people list lowercase in addition to mixed case (and update a few lexers to do this [as MakefileLexer does, because people on Windows use lowercase there too {but not for the more esoteric patterns, just the main one}])?

    I am not aware of why uppercase R was used when SLexer was added originally (if it really goes back to 0.10, that's a long time ago).
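
    The two options in this summary can be compared with a small sketch. Both helper names are hypothetical and neither is taken from Pygments or from the pull request; the sketch assumes glob-style patterns like the ones listed above.

```python
import fnmatch
import re


def matches_ignoring_case(filename, pattern):
    """Option 1: translate the glob to a regex and match case-insensitively."""
    regex = fnmatch.translate(pattern)
    return re.match(regex, filename, re.IGNORECASE) is not None


def matches_any(filename, patterns):
    """Option 2: stay case-sensitive and list every case variant explicitly."""
    return any(fnmatch.fnmatchcase(filename, p) for p in patterns)


if __name__ == "__main__":
    print(matches_ignoring_case("test.r", "*.R"))
    print(matches_any("test.r", ("*.R",)))
    print(matches_any("test.r", ("*.R", "*.r")))
```

    Option 1 changes matching for every lexer at once, while option 2 only affects lexers whose pattern list is edited.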
