BOM Conversion Issues
Original [issue 35](https://code.google.com/p/okapi/issues/detail?id=35) created by @fliden on 2009-03-24T00:16:02.000Z:
- When removing the BOM from a UTF-32 big-endian file, "00 00 FE FF" is removed, but when removing the BOM from a UTF-32 little-endian file, only the "FF FE" part of "FF FE 00 00" is removed. I'm guessing the remaining "00 00" should be removed as well.
- FYI, auto-detection of UTF-32 does not work when adding a BOM.
- Auto-detection of UTF-16 works only when the first character occupies just 8 of the 16 bits: "Hello" is auto-detected as UTF-16, but "앙영" is not.
- When adding a BOM and UTF-16 is auto-detected, FE FF gets added to LE files and FF FE gets added to BE files. It should be the opposite.
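The first point above is consistent with the UTF-16 LE signature (FF FE) being matched before the longer UTF-32 LE signature (FF FE 00 00), although the thread does not confirm that this is the cause. A minimal sketch of BOM removal that tests the longest signatures first is shown below. It is written in Python for illustration only; `strip_bom` and `BOMS` are hypothetical names, not part of Okapi.

```python
import codecs

# BOM signatures ordered longest first, so that UTF-32 LE (FF FE 00 00)
# is tested before UTF-16 LE (FF FE), which is a prefix of it.
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),  # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "utf-32-le"),  # FF FE 00 00
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
]


def strip_bom(data: bytes):
    """Return (data without its BOM, encoding implied by the BOM).

    Hypothetical helper: the encoding is None when no BOM is present.
    """
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return data[len(bom):], encoding
    return data, None
```

One caveat with this ordering: a UTF-16 LE file whose first character is U+0000 also starts with FF FE 00 00 and would be reported as UTF-32 LE, so the signature alone cannot fully separate the two.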
Comments (2)
Account Deleted - changed status to resolved
Comment [2.](https://code.google.com/p/okapi/issues/detail?id=35#c2) originally posted by @ysavourel on 2009-03-24T02:56:36.000Z:
- All references to UTF-32 have been removed. UTF-32 is not supported.
- Because auto-detection is too unreliable, the user must specify the encoding of the file for the Add BOM feature.
- Regarding "When adding bom and UTF-16 is auto detected FE FF gets added to LE files and FF FE gets added to BE files. It should be the opposite.": this bug has been fixed.
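To illustrate why BOM-less auto-detection is unreliable and what an explicit-encoding Add BOM step could look like, here is a minimal Python sketch. The NUL-byte heuristic is an assumption about how such detection might work (it is not taken from the Okapi source), and `guess_utf16_by_nul`, `add_bom`, and `BOM_FOR` are hypothetical names.

```python
import codecs

# Correct pairing of byte order and BOM bytes.
BOM_FOR = {
    "utf-8": codecs.BOM_UTF8,          # EF BB BF
    "utf-16-le": codecs.BOM_UTF16_LE,  # FF FE
    "utf-16-be": codecs.BOM_UTF16_BE,  # FE FF
}


def guess_utf16_by_nul(data: bytes):
    """Assumed heuristic: infer UTF-16 byte order from a NUL byte in the
    first code unit. It only succeeds when the first character is below U+0100."""
    if len(data) < 2:
        return None
    if data[0] != 0 and data[1] == 0:
        return "utf-16-le"
    if data[0] == 0 and data[1] != 0:
        return "utf-16-be"
    return None  # e.g. Hangul text: neither byte of the first unit is NUL


def add_bom(data: bytes, encoding: str) -> bytes:
    """Prefix data with the BOM for an encoding the caller states explicitly."""
    bom = BOM_FOR.get(encoding.lower())
    if bom is None:
        raise ValueError(f"no BOM defined here for encoding {encoding!r}")
    if data.startswith(bom):
        return data  # already has the right BOM; do not add a second one
    return bom + data
```

With this heuristic, `"Hello".encode("utf-16-le")` is recognised, while `"앙영".encode("utf-16-le")` (bytes 59 C5 01 C6) is not, which matches the behaviour reported in the original issue.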
Comment [1.](https://code.google.com/p/okapi/issues/detail?id=35#c1) originally posted by @fliden on 2009-03-24T00:24:25.000Z:
Btw, is it possible to have some warning if the encoding cannot be auto-detected? Or should we fall back on a user-specified output encoding? When adding the UTF-8 BOM to a UTF-16 file, the content looks corrupted regardless of which encoding you choose to view it in, and the BOM would need to be removed to "recover" the file.
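The corruption described here can be reproduced in a few lines. The Python sketch below prefixes UTF-16 LE content with the UTF-8 BOM, shows what each decoding yields, and recovers the file by dropping the three BOM bytes. The `resolve_encoding` and `recover` helpers are hypothetical and only illustrate the warn-and-fall-back behaviour being asked about.

```python
import codecs
import warnings

original = "Hello".encode("utf-16-le")
damaged = codecs.BOM_UTF8 + original  # wrong BOM for this content

# Viewed as UTF-8, the BOM is consumed but every letter is followed by a NUL.
as_utf8 = damaged.decode("utf-8-sig")

# Viewed as UTF-16, the three BOM bytes shift every code unit by one byte,
# so the text decodes to unrelated characters.
as_utf16 = damaged.decode("utf-16-le", errors="replace")


def recover(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM that was added by mistake."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def resolve_encoding(detected, fallback: str) -> str:
    """Warn and use the user-specified encoding when detection fails."""
    if detected is None:
        warnings.warn(f"encoding not detected; falling back to {fallback}")
        return fallback
    return detected
```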