C++ raw string literal

Issue #1103 closed
Corentin Schreiber created an issue

The current C++ lexer does not properly recognize the raw string literals introduced in C++11. Briefly, a string that starts with R"( and ends with )" is a raw string literal: within these boundaries, the usual escape sequences do not apply, since the content is not interpreted (the string is used in its "raw" form). Raw string literals can also span multiple lines.

For example,

R"(hello world\n)"

is equivalent to

"hello world\\n".

I managed to write a patch that attempts to fix this issue. By adding:

    (r'R"\(', String, 'rawstring'),

to the "root" state of the C++ lexer, and then adding a new state:

    'rawstring': [
        (r'\)"', String, '#pop'),
        (r'(?:[^)"]|\)(?!")|(?<!\))")+', String), # all other characters
    ],

I could get all my use cases to work. I don't know how that interacts with a larger code base, though.
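The patterns can also be checked outside Pygments with Python's `re` module. This is only a minimal sketch: `scan_raw_string` is a hypothetical helper that imitates entering the 'rawstring' state and popping out of it, and it is not part of Pygments.

```python
import re

# The three patterns from the patch, compiled on their own.
OPEN = re.compile(r'R"\(')
CLOSE = re.compile(r'\)"')
BODY = re.compile(r'(?:[^)"]|\)(?!")|(?<!\))")+')


def scan_raw_string(text, pos=0):
    """Return the end offset of the raw string starting at pos, or None.

    Hypothetical helper: imitates the push to 'rawstring' and the '#pop'.
    """
    m = OPEN.match(text, pos)
    if m is None:
        return None
    pos = m.end()
    while pos < len(text):
        m = CLOSE.match(text, pos)
        if m:
            return m.end()  # found )" so the literal ends here
        m = BODY.match(text, pos)
        if m is None:
            return None
        pos = m.end()
    return None  # unterminated literal


# Inner quotes, inner parentheses and a line break are all part of the body.
src = 'R"(a "quoted" (word)\nsecond line)" + x'
end = scan_raw_string(src)
print(repr(src[:end]))
```

The scan stops at the first )" it meets, which is the limitation of the default delimiter.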

However, this is only part of the issue. The C++11 specification is somewhat more involved than what I wrote above. In particular, one can define "custom" delimiters for raw string literals when the default R"( ... )" is too limiting (the hidden limitation being that, with this syntax, a raw string literal cannot contain the characters )" side by side). The rule is that one can write a sequence of up to 16 characters (excluding spaces, parentheses, and backslashes) between the opening quote and the opening parenthesis to create a custom delimiter. In this case, the closing delimiter must be a closing parenthesis followed by the same characters and a closing quote, e.g., R"FOO( ... )FOO". The above patch does not cover this feature.
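One possible way to handle custom delimiters is a single regular expression with a backreference, so that the closing delimiter has to repeat whatever was captured at the opening. The sketch below uses only Python's `re` module. The names `RAW_STRING` and `find_raw_strings` are hypothetical, and the delimiter character class is a simplifying assumption (it also rejects double quotes); I have not tested this inside the lexer itself.

```python
import re

# Assumption: the delimiter is at most 16 characters long and contains no
# whitespace, parentheses, backslashes or double quotes.
RAW_STRING = re.compile(
    r'R"(?P<delim>[^\s()\\"]{0,16})\('  # opening: R"delim(
    r'(?P<body>.*?)'                     # body, as short as possible
    r'\)(?P=delim)"',                    # closing: )delim"
    re.DOTALL,                           # the body may span several lines
)


def find_raw_strings(source):
    """Return (delimiter, body) pairs for each raw string found in source."""
    return [(m.group('delim'), m.group('body'))
            for m in RAW_STRING.finditer(source)]


sample = 'auto a = R"(plain)"; auto b = R"FOO(has )" inside)FOO";'
for delim, body in find_raw_strings(sample):
    print(repr(delim), repr(body))
```

With the FOO delimiter, the )" inside the second literal no longer ends it. Since a lexer rule is itself a regular expression, a pattern along these lines could presumably be used as a single rule in "root" instead of a separate state, but that is untested.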
