BOM Conversion Issues

Issue #35 resolved
Former user created an issue

Original [issue 35](https://code.google.com/p/okapi/issues/detail?id=35) created by @fliden on 2009-03-24T00:16:02.000Z:

- When removing the BOM from a UTF-32 big-endian file, the full "00 00 FE FF" is removed, but when removing the BOM from a UTF-32 little-endian file, only the "FF FE" part of "FF FE 00 00" is removed. I'm guessing the remaining "00 00" should be removed as well.

- FYI, auto-detection of UTF-32 does not work when adding a BOM.

- Auto-detection of UTF-16 works only when the first character occupies just 8 of the 16 bits: "Hello" is auto-detected as UTF-16, but "앙영" is not.

- When adding a BOM and UTF-16 is auto-detected, FE FF gets added to LE files and FF FE gets added to BE files. It should be the opposite.
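
The first point above looks like a prefix-matching problem: the UTF-32 LE BOM (FF FE 00 00) starts with the same two bytes as the UTF-16 LE BOM (FF FE), so checking the shorter BOM first leaves "00 00" behind. Below is a minimal Python sketch of BOM removal that tests the longer BOMs first. It is only an illustration, not Okapi's actual code; `strip_bom` and `BOMS` are hypothetical names.

```python
import codecs

# Longest BOMs first: FF FE 00 00 (UTF-32 LE) begins with FF FE (UTF-16 LE),
# so the UTF-32 entries must be tested before the UTF-16 ones.
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),  # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "utf-32-le"),  # FF FE 00 00
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
]


def strip_bom(data: bytes):
    """Return (data without its BOM, encoding implied by the BOM or None)."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return data[len(bom):], name
    return data, None


# UTF-32 LE: all four BOM bytes are removed, not just FF FE.
body, enc = strip_bom(b"\xff\xfe\x00\x00" + "A".encode("utf-32-le"))
print(enc, body)
```

One caveat with this ordering: a UTF-16 LE file whose first character is U+0000 would be taken for UTF-32 LE. That case is rare, but it is a trade-off of this approach.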

Comments (2)

  1. Former user Account Deleted

    Comment [1.](https://code.google.com/p/okapi/issues/detail?id=35#c1) originally posted by @fliden on 2009-03-24T00:24:25.000Z:

    Btw, is it possible to have some warning if the encoding cannot be auto-detected? Or should we fall back on a user-specified output encoding? When adding the UTF-8 BOM to a UTF-16 file, the content looks corrupted regardless of which encoding you choose to view it in, and the BOM would need to be removed to "recover" the file.

  2. Former user Account Deleted

    Comment [2.](https://code.google.com/p/okapi/issues/detail?id=35#c2) originally posted by @ysavourel on 2009-03-24T02:56:36.000Z:

    - All references to UTF-32 have been removed. UTF-32 is not supported.

    - Because auto-detection is too unreliable, the user must specify the encoding of the file for the Add BOM feature.

    > When adding a BOM and UTF-16 is auto-detected, FE FF gets added to LE files and FF FE gets added to BE files. It should be the opposite.

    This bug has been fixed.
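
For reference, the behavior described in this comment (encoding supplied by the user, FF FE for little-endian, FE FF for big-endian, no UTF-32) could be sketched in Python roughly as follows. This is an assumption-laden illustration, not the Okapi implementation; `add_bom` and `BOM_FOR` are hypothetical names.

```python
import codecs

# BOM for each supported, explicitly named encoding (no auto-detection).
BOM_FOR = {
    "utf-8": codecs.BOM_UTF8,          # EF BB BF
    "utf-16-le": codecs.BOM_UTF16_LE,  # FF FE
    "utf-16-be": codecs.BOM_UTF16_BE,  # FE FF
}


def add_bom(data: bytes, encoding: str) -> bytes:
    """Prepend the BOM for the caller-specified encoding, if not already there."""
    key = encoding.lower()
    if key not in BOM_FOR:
        # Fail loudly rather than guess and corrupt the file.
        raise ValueError(f"unsupported encoding for BOM: {encoding}")
    bom = BOM_FOR[key]
    if data.startswith(bom):
        return data
    return bom + data


# "앙영" has no zero bytes in UTF-16, so a zero-byte guess would miss it;
# naming the encoding explicitly sidesteps that.
korean = "앙영".encode("utf-16-le")
print(add_bom(korean, "utf-16-le")[:2] == codecs.BOM_UTF16_LE)
```

Raising an error for an unknown encoding, instead of silently falling back to UTF-8, is one way to avoid the corrupted-file case raised in the first comment.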
