- changed title to Encode/decode JS entities works on one byte at a time and is not reversible
- assigned issue to
- marked as enhancement
The decoding algorithm can only handle single-byte sequences. So, this works:
\u00E4 \u00F6 \u00FC \u00DF (decode =>) ä ö ü ß
But this is broken:
ä ö ü ß (encode =>) \u00C3\u00A4 \u00C3\u00B6 \u00C3\u00BC \u00C3\u0178
In a UTF-8 file, each of these characters occupies 2 bytes, and the algorithm encodes each byte separately instead of encoding the character as a whole.
That's a limitation of the original author's design (based on pre-Unicode Notepad++). It affects both the 32-bit and 64-bit versions.
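To make the difference concrete, here is a rough Python sketch of the two behaviours. It is not the plugin's code. The function names are made up for illustration, and the use of Windows-1252 is an assumption on my part: it is the reading of the bytes that would explain the `\u0178` in the broken output above (byte 0x9F maps to U+0178 in that code page).

```python
def escape_units(s: str) -> str:
    """Escape every non-ASCII character as \\uXXXX and leave ASCII untouched.

    Only valid for characters up to U+FFFF; anything above that would need
    a surrogate pair, which this helper does not produce.
    """
    return "".join(c if ord(c) < 0x80 else f"\\u{ord(c):04X}" for c in s)


def encode_bytewise(text: str) -> str:
    # Assumed reconstruction of the bug: the UTF-8 bytes of the selection are
    # misread as Windows-1252 characters, so every byte gets its own escape.
    misread = text.encode("utf-8").decode("cp1252", errors="replace")
    return escape_units(misread)


def encode_by_character(text: str) -> str:
    # Expected behaviour for these characters: one escape per character.
    return escape_units(text)


if __name__ == "__main__":
    sample = "ä ö ü ß"
    print(encode_bytewise(sample))
    print(encode_by_character(sample))
```

The first call shows the broken two-escapes-per-character pattern, the second the form that decodes back to the original text.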
As I said in 82f9b0e,
More work still needed before utf8mb4 can be encoded *correctly*
Fixing this will be part of that overall task.
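For reference, a sketch of what a reversible encode/decode pair could look like, assuming "utf8mb4" here means characters that need four UTF-8 bytes (code points above U+FFFF). JavaScript `\uXXXX` escapes carry one UTF-16 code unit each, so such characters have to be written as a surrogate pair. The names `encode_js_escapes` and `decode_js_escapes` are hypothetical and this is Python, not the plugin's own code.

```python
import re

_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def encode_js_escapes(text: str) -> str:
    """Escape non-ASCII characters, using surrogate pairs above U+FFFF."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")
        else:
            # Split the code point into a UTF-16 high/low surrogate pair.
            v = cp - 0x10000
            high = 0xD800 + (v >> 10)
            low = 0xDC00 + (v & 0x3FF)
            out.append(f"\\u{high:04X}\\u{low:04X}")
    return "".join(out)


def decode_js_escapes(text: str) -> str:
    """Reverse encode_js_escapes, recombining surrogate pairs."""
    units = _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
    # Round-trip through UTF-16 so paired surrogates become one character.
    # An unpaired surrogate in the input raises UnicodeDecodeError here.
    return units.encode("utf-16-le", "surrogatepass").decode("utf-16-le")
```

With this shape, `decode_js_escapes(encode_js_escapes(s))` returns `s` for any well-formed text that does not itself contain a literal `\uXXXX` sequence, which is the reversibility the issue title asks for.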
Convert selected text to UTF-8 before encoding
Fixes #1 → <<cset d2189a1d3f85>>