Matiec fails to parse Russian strings

Issue #60 (status: new)
Павел Бельтюков created an issue

This code


gives an error:

Parsing failed because of too many consecutive syntax errors. Bailing out!

If the code is changed to


matiec works fine...

So there seem to be some Unicode-related issues.

UPDATE (Feb 18th 2017):

OK, I've rebuilt matiec with YYDEBUG, added an empty ieclib.txt to the temp dir, and added this line:

    tst_string : STRING := 'Привет';

The log file contains this fragment:

    Next token is token ASSIGN (: )
    Shifting token ASSIGN (: )
    Entering state 612
    Reading a token: Next token is token $undefined (: )
    Shifting token error (: )
    Entering state 1117
    Reducing stack by rule 286 (line 2685):
       $1 = nterm elementary_type_name (: )
       $2 = token ASSIGN (: )
       $3 = token error (: )
    /home/anon/YAPLC/ error: invalid initial value in specification with initialization.

In iec_flex.ll I see:

    common_character_representation     [\x20\x21\x23\x25\x26\x28-\x7E]|{esc_char}
    double_byte_character_representation    $\"|'|{double_byte_char}|{common_character_representation}
    single_byte_character_representation    $'|\"|{single_byte_char}|{common_character_representation}

I think that's why matiec can't parse Unicode strings: the tokenizer does not recognize bytes \x80-\xFF.
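
To make this concrete, here is a small Python sketch. It mirrors the `common_character_representation` class as a byte regex and lists which UTF-8 bytes of a string the class rejects. The `{esc_char}` alternative is left out, and the function name is hypothetical. Each Cyrillic letter is encoded as two bytes in UTF-8: a lead byte 0xD0 or 0xD1 followed by a continuation byte in 0x80-0xBF. So every byte of 'Привет' should fall outside the class.

```python
import re

# Byte-level mirror of common_character_representation from iec_flex.ll,
# without the {esc_char} alternative (omitted here for brevity).
COMMON_CHAR = re.compile(rb"[\x20\x21\x23\x25\x26\x28-\x7E]")


def rejected_bytes(text: str) -> list[int]:
    """Return the UTF-8 bytes of `text` that the character class does not match."""
    data = text.encode("utf-8")
    return [b for b in data if not COMMON_CHAR.fullmatch(bytes([b]))]


if __name__ == "__main__":
    print(rejected_bytes("Hello"))  # empty list: every byte is accepted
    print([hex(b) for b in rejected_bytes("Привет")])  # all 12 bytes are rejected
```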

How about adding UTF-8 string support to matiec?

What would the impact on matiec be if one simply replaced

    common_character_representation     [\x20\x21\x23\x25\x26\x28-\x7E]|{esc_char}

with the following?

    utf_8_start_char    [\xC0-\xDF]|[\xE0-\xEF]|[\xF0-\xF7]
    utf_8_end_char     [\x80-\xBF]
    utf_8_char             {utf_8_start_char}|{utf_8_end_char}
    common_character_representation     [\x20\x21\x23\x25\x26\x28-\x7E]|{utf_8_char}|{esc_char}
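
A quick way to see what the proposed definitions would and would not accept is to mirror them as a Python bytes regex. This is only a sketch:

- the names below are hypothetical;
- `{esc_char}` is left out;
- it models the character classes only, not how flex combines them into the full string-literal rule.

Because each class matches a single byte, 'Привет' gets through. So does malformed input such as a lone continuation byte, which means any validation would have to happen later.

```python
import re

# Hypothetical byte-level mirror of the proposed flex definitions ({esc_char} omitted).
UTF8_START_CHAR = rb"[\xC0-\xDF]|[\xE0-\xEF]|[\xF0-\xF7]"
UTF8_END_CHAR = rb"[\x80-\xBF]"
UTF8_CHAR = rb"(?:" + UTF8_START_CHAR + rb"|" + UTF8_END_CHAR + rb")"
COMMON_CHAR_EXT = re.compile(rb"(?:[\x20\x21\x23\x25\x26\x28-\x7E]|" + UTF8_CHAR + rb")")


def accepted_by_extended_class(data: bytes) -> bool:
    """True if every single byte of `data` matches the extended character class."""
    return all(COMMON_CHAR_EXT.fullmatch(bytes([b])) for b in data)


if __name__ == "__main__":
    print(accepted_by_extended_class("Привет".encode("utf-8")))  # True
    print(accepted_by_extended_class(b"\x80"))  # True, although a lone continuation byte is malformed
    print(accepted_by_extended_class(b"\xFF"))  # False: 0xF8-0xFF are still excluded
```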


If matiec can be made to accept Unicode strings this way, we could add a hook in stage 3 to check them. Because UTF-8 is self-synchronizing, we could even drop invalid byte sequences and warn the user.
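
Here is one possible shape for such a stage-3 check, written in Python for brevity. matiec itself is C++, so this only illustrates the logic; the function names and the warning text are hypothetical. Lead bytes and continuation bytes (0x80-0xBF) are distinguishable in UTF-8. A checker can therefore drop one bad byte and resume at the next possible character start without losing the valid characters that follow.

```python
import warnings


def _sequence_length(lead: int) -> int:
    """Expected length of a UTF-8 sequence starting with `lead`, or 0 if it cannot start one."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0  # continuation byte (0x80-0xBF) or a byte never valid in UTF-8


def _well_formed(seq: bytes) -> bool:
    """Let Python's strict decoder reject overlongs, surrogates and bad continuations."""
    try:
        seq.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def drop_invalid_utf8(raw: bytes) -> tuple[bytes, int]:
    """Keep well-formed UTF-8 sequences and drop everything else.

    Returns the cleaned bytes and the number of bytes dropped."""
    out = bytearray()
    dropped = 0
    i = 0
    while i < len(raw):
        n = _sequence_length(raw[i])
        seq = raw[i:i + n]
        if n and len(seq) == n and _well_formed(seq):
            out += seq
            i += n
        else:
            # Self-synchronization: drop a single byte and try again from the
            # next byte, so decoding realigns at the next lead byte.
            dropped += 1
            i += 1
    return bytes(out), dropped


def check_string_literal(raw: bytes) -> bytes:
    """Hypothetical stage-3 hook: clean a string literal and warn the user if needed."""
    cleaned, dropped = drop_invalid_utf8(raw)
    if dropped:
        warnings.warn(f"invalid UTF-8 in string literal: {dropped} byte(s) dropped")
    return cleaned
```

For example, `check_string_literal(b"\x80\xD0\x9F")` drops the stray 0x80 byte, keeps 'П' and emits one warning.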

Actually, we don't need to check string correctness at all, since the C compiler or even the PLC runtime could be responsible for this.
