Issue #474 new

Ruby: Non-ASCII Method Names Not Recognised

Anonymous created an issue

Ruby 1.9 allows method names to include non-ASCII characters with the following caveats:

  • The characters must be valid in the file's source encoding.

  • A legal method name that does not end with '!', '?', or '=' may have one of these characters appended.

  • The ASCII punctuation characters that make up operator methods (e.g. {{{[*%&^`~+-/\[<>=]}}}) must not appear anywhere else in a method name, apart from the suffix case above.
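
The caveats above can be approximated with a regular expression. The following is only a minimal sketch in Python, not the pattern used by Pygments' RubyLexer: `METHOD_NAME` and `is_method_name` are hypothetical names, operator methods are left out, and no check is made that the characters are valid in the file's source encoding.

```python
import re

# Hypothetical pattern, written from the rules listed in this report.
# First character: ASCII letter, underscore, or any non-ASCII character.
# Following characters: ASCII alphanumerics, underscore, or non-ASCII.
# Optional suffix: at most one of '!', '?' or '='.
METHOD_NAME = re.compile(
    r'(?:[A-Za-z_]|[^\x00-\x7f])'
    r'(?:[A-Za-z0-9_]|[^\x00-\x7f])*'
    r'[!?=]?'
)


def is_method_name(name):
    """Return True if the whole string matches the sketch pattern."""
    return METHOD_NAME.fullmatch(name) is not None
```

Under these assumptions, a name containing non-ASCII characters with a single `?` suffix is accepted, while a name with operator punctuation in the middle, or with two suffix characters, is rejected.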

Pygments does not recognise such method names, lexing the first non-ASCII character as an error. Examples of unrecognised method names are given in http://pygments.org/demo/3147/ .

Reported by guest

Comments (4)

  1. thatch

    Do you have any reference to those rules, or perhaps the grammar itself? I checked the existing RubyLexer's rules and they're super-complicated:

                 bygroups(Name.Class, Operator, Name.Function), '#pop'),
  2. thatch

    I did some digging. I still can't find a formal announcement, but local rubyers confirm that such support was "rumored."

    Checking the source (Ruby 1.9 snapshot, `parse.y`), I see some code for this:

    #define is_identchar(p,e,enc) (rb_enc_isalnum(*p,enc) || (*p) == '_' || !ISASCII(*p))
    #define parser_is_identchar() (!parser->eofp && is_identchar((lex_p-1),lex_pend,parser->enc))
        mb = ENC_CODERANGE_7BIT;
        do {
            if (!ISASCII(c)) mb = ENC_CODERANGE_UNKNOWN;
            if (tokadd_mbchar(c) == -1) return 0;
            c = nextc();
        } while (parser_is_identchar());
        switch (tok()[0]) {
          case '@': case '$':
            if ((c == '!' || c == '?') && !peek('=')) {
            else {
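
The `is_identchar` macro quoted above treats a byte as part of an identifier if it is alphanumeric, an underscore, or any non-ASCII byte. A rough Python rendering of that predicate follows. It is a sketch under assumptions: it works on single byte values, it stands in for `rb_enc_isalnum` with an ASCII-only check, and `scan_identifier` is a hypothetical helper that does not appear in `parse.y`.

```python
def is_identchar(byte):
    """Mirror the quoted macro for one byte value (0-255)."""
    is_ascii = byte < 0x80
    # Assumption: for ASCII bytes, rb_enc_isalnum reduces to ASCII alnum.
    is_ascii_alnum = is_ascii and chr(byte).isalnum()
    return is_ascii_alnum or byte == ord('_') or not is_ascii


def scan_identifier(source, start=0):
    """Hypothetical helper: collect identifier bytes starting at `start`."""
    end = start
    while end < len(source) and is_identchar(source[end]):
        end += 1
    return source[start:end]
```

Because every non-ASCII byte counts as an identifier byte, a multi-byte UTF-8 character is consumed whole without the scanner needing to decode it.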
  3. Tim Hatch
    • changed milestone to Sprint
    • removed assignee
    • edited description

    This is still an issue; at tip we see:

    def func(x) end # pass, but for the wrong reasons
    def sn(x) end  # fail