Code highlighting for Objective-C .m files doesn't work (sometimes)

Issue #1532 wontfix

Comments (13)

  1. dmitry reporter

    Maybe it now tries to use the MATLAB lexer for *.m files, since MATLAB uses the same file extension? Probably MATLAB is not as popular as Objective-C on Bitbucket ;)

  2. Dylan Etkin

    I have looked into this, and Pygments tries to guess which lexer to use from the file extension and from the first few bytes of the file.

    I believe that in some cases it is guessing incorrectly.

    I think we need to open a bug with Pygments providing your two files as a test case.

  3. Dylan Etkin

    The language chosen for a repository is meta information; we don't use it to force Pygments to use a certain lexer. Most projects contain a mixture of code in some language plus text files, HTML, CSS, JS, etc. If we forced Pygments to use the repository's primary language, it would more often than not get the highlighting wrong.

    The best practice for getting Pygments to highlight source is to let it guess the lexer from the file extension and, if it cannot figure it out that way, to pass it the first few KB of the file so it can guess the lexer from the content.

    I really believe this needs to be addressed as a Pygments bug.
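    A rough sketch of the two-stage strategy described above: narrow the candidates by file extension, and only when that is ambiguous, score a sample of the content. The `Lexer` class, the `guess_lexer` function, and the 4096-character sample size are hypothetical illustrations, not the Pygments API (Pygments provides its own `guess_lexer_for_filename` for this job).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Lexer:
    # Hypothetical stand-in for a Pygments lexer class.
    name: str
    extensions: List[str]
    # Returns a confidence score between 0.0 and 1.0 for the given text.
    analyse_text: Callable[[str], float]


def guess_lexer(filename: str, text: str, lexers: List[Lexer]) -> Lexer:
    ext = filename.rsplit('.', 1)[-1].lower() if '.' in filename else ''
    # Stage 1: narrow the candidates by file extension.
    candidates = [lx for lx in lexers if ext in lx.extensions]
    if len(candidates) == 1:
        return candidates[0]
    # Stage 2: the extension is unknown or ambiguous (e.g. ".m"),
    # so score the first few KB of content and take the best match.
    if not candidates:
        candidates = lexers
    sample = text[:4096]  # assumed sample size ("first few k")
    # max() keeps the first candidate on a tie.
    return max(candidates, key=lambda lx: lx.analyse_text(sample))
```

    With an Objective-C lexer and a MATLAB lexer both claiming `m`, the outcome of stage 2 depends entirely on how well each `analyse_text` scores the sample, which is exactly where this issue comes from.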



  4. Louis Feng

    If you look at the actual code for ObjectiveCLexer:

    def analyse_text(text):
        if '@"' in text:  # strings
            return True
        if re.match(r'\[[a-zA-Z0-9.]:', text):  # message
            return True
        return False

    It's simply inadequate for analysing Objective-C files correctly. If anything, it should try to detect keywords such as @interface and @implementation. Another issue is in the MatlabLexer, where analyse_text returns 0.1 by default:

    def analyse_text(text):
        if re.match(r'^\s*%', text, re.M):  # comment
            return 0.9
        elif re.match(r'^!\w+', text, re.M):  # system cmd
            return 0.9
        return 0.1

    So when these two compete, MatlabLexer simply wins by default. These issues are easy to fix, but I'm not sure how active the Pygments developers are, as many of the pull requests are weeks old.
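    To make the competition concrete, here is a small self-contained reproduction of the two heuristics quoted above (booleans written as 1.0/0.0 scores), run against a typical .m file that has no `@"` string literal. `objc_analyse_improved`, `OBJC_DIRECTIVE`, and `pick` are hypothetical and are not Pygments code; the directive list is an assumption based on the keywords suggested above. Note also that `re.match` only tries position 0 of the string even with `re.M`, so the quoted checks never look past the first line.

```python
import re


# The two heuristics as quoted above, scoring 1.0/0.0 instead of True/False.
def objc_analyse_original(text: str) -> float:
    if '@"' in text:  # string literal
        return 1.0
    if re.match(r'\[[a-zA-Z0-9.]:', text):  # message send, only at position 0
        return 1.0
    return 0.0


def matlab_analyse_original(text: str) -> float:
    # re.match anchors at the start of the string even with re.M.
    if re.match(r'^\s*%', text, re.M):  # comment
        return 0.9
    if re.match(r'^!\w+', text, re.M):  # system command
        return 0.9
    return 0.1  # default score that wins the tie-break against 0.0


# Hypothetical improvement: look for Objective-C compiler directives
# at the start of any line (re.search scans the whole sample).
OBJC_DIRECTIVE = re.compile(r'^\s*@(interface|implementation|protocol|end)\b', re.M)


def objc_analyse_improved(text: str) -> float:
    if OBJC_DIRECTIVE.search(text):
        return 1.0
    return objc_analyse_original(text)


SAMPLE = """#import "Foo.h"

@implementation Foo
- (void)bar {
    [self baz];
}
@end
"""


def pick(objc_score: float, matlab_score: float) -> str:
    # Highest score wins; a tie goes to Objective-C here.
    return 'Objective-C' if objc_score >= matlab_score else 'MATLAB'


print(pick(objc_analyse_original(SAMPLE), matlab_analyse_original(SAMPLE)))  # MATLAB
print(pick(objc_analyse_improved(SAMPLE), matlab_analyse_original(SAMPLE)))  # Objective-C
```

    The original Objective-C heuristic scores the sample 0.0, so MATLAB's default 0.1 wins; checking for directives anywhere in the sample is enough to flip the result for this file.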

    I think an option allowing repository owners to specify a file's language explicitly would also be a good workaround.
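    A sketch of how such an owner-specified override could sit in front of content guessing: patterns set by the repository owner are checked first, and guessing is only the fallback. The `overrides` mapping, its glob format, and `choose_lexer_name` are hypothetical; lexer names are plain strings here, and `guess` stands in for whatever guessing routine is already used.

```python
from fnmatch import fnmatchcase
from typing import Callable, Dict


def choose_lexer_name(path: str, text: str, overrides: Dict[str, str],
                      guess: Callable[[str, str], str]) -> str:
    """Return an owner-specified lexer name if a pattern matches, else guess."""
    # Dicts preserve insertion order, so the first matching pattern wins.
    for pattern, lexer_name in overrides.items():
        # fnmatchcase's '*' also matches '/', so '*.m' covers nested paths.
        if fnmatchcase(path, pattern):
            return lexer_name
    # No override applies: fall back to extension/content guessing.
    return guess(path, text)
```

    For example, `{'*.m': 'objective-c'}` would pin every .m file in an iOS project to Objective-C while leaving other files to the usual guessing.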
