Writing a unicode string to file is incorrect

Issue #235 (status: invalid)
Created by Newbuyer

Writing a Unicode string to a text file in DWScript with the FileWrite() function produces incorrect bytes.

// uses dwsFileFunctions

var s: string;

s := 'Test测试'; // Unicode(Chinese) string
Println(s);
var f := FileCreate('.\test.txt');
FileWrite(f, s);
FileClose(f);

The bytes written to test.txt are: $54 $65 $73 $74 $4B $D5
This is wrong. The file should contain one of the following:
ANSI: $54 $65 $73 $74 $B2 $E2 $CA $D4
UTF8: $54 $65 $73 $74 $E6 $B5 $8B $E8 $AF $95
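
The three byte sequences above can be reproduced outside DWScript. The following Python sketch is only a model, not DWScript code: it assumes the observed output comes from keeping the low 8 bits of each character, and that "ANSI" here means Windows code page 936 (GBK), which is an assumption about the reporter's system. The helper names `low_bytes` and `hex_dump` are hypothetical.

```python
def low_bytes(text: str) -> bytes:
    """Keep only the low 8 bits of each character (model of the observed output)."""
    # Every character in the example is in the BMP, so code points
    # and UTF-16 code units coincide.
    return bytes(ord(ch) & 0xFF for ch in text)


def hex_dump(data: bytes) -> str:
    """Format bytes in the $XX notation used in this report."""
    return " ".join(f"${b:02X}" for b in data)


s = "Test\u6d4b\u8bd5"  # 'Test测试'

print(hex_dump(low_bytes(s)))       # $54 $65 $73 $74 $4B $D5
print(hex_dump(s.encode("gbk")))    # $54 $65 $73 $74 $B2 $E2 $CA $D4
print(hex_dump(s.encode("utf-8")))  # $54 $65 $73 $74 $E6 $B5 $8B $E8 $AF $95
```

The first line matches the file content reported above: U+6D4B and U+8BD5 are reduced to their low bytes $4B and $D5.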

Workaround:

//dwsUtils.pas

procedure RawByteStringToScriptString(const s : RawByteString; var result : UnicodeString); overload;
begin
   if s = '' then
      result := ''
   else
//      BytesToScriptString(Pointer(s), Length(s), result);
      result := Utf8Decode(s); // use UTF8 function instead
end;

procedure ScriptStringToRawByteString(const s : UnicodeString; var result : RawByteString); overload;
var
   n : Integer;
begin
   if s = '' then
      result := ''
   else begin
//      n := Length(s);
//      SetLength(Result, n);
//      WordsToBytes(Pointer(s), Pointer(Result), n);
      result := Utf8Encode(s); // use UTF8 function instead
   end;
end;

Comments (2)

  1. Eric Grange (repo owner)

    This is as designed. File functions treat strings as containers of byte data, so only the lower 8 bits of each character are used.

    If you want to write with UTF-8 encoding, you have to use

    FileWrite(f, UTF8Encoder.Encode(s));
    

    or one of the other encoders for other formats (such as UTF-16).
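
As a rough model of why encoding first avoids the data loss (this is not DWScript's actual implementation): if an encoder returns a string in which every character holds a single byte value, then a write that keeps only the low 8 bits of each character loses nothing. The Python sketch below assumes that behaviour. `encode_to_byte_string` and `file_write_model` are hypothetical stand-ins for `UTF8Encoder.Encode` and `FileWrite`, not real DWScript APIs.

```python
def encode_to_byte_string(text: str, encoding: str = "utf-8") -> str:
    """Return a string whose characters each carry one encoded byte (0..255)."""
    return "".join(chr(b) for b in text.encode(encoding))


def file_write_model(text: str) -> bytes:
    """Model of a byte-oriented write: keep the low 8 bits of each character."""
    return bytes(ord(ch) & 0xFF for ch in text)


s = "Test\u6d4b\u8bd5"  # 'Test测试'

raw = file_write_model(s)                          # lossy: high bits are dropped
utf8 = file_write_model(encode_to_byte_string(s))  # lossless: every char fits in a byte

print(raw.hex())                  # 546573744bd5
print(utf8.hex())                 # 54657374e6b58be8af95
print(utf8.decode("utf-8") == s)  # True
```

Passing a different codec name to `encode_to_byte_string` models the other encoders in the same way; the byte order and BOM handling of DWScript's own encoders are not covered by this sketch.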
