BOM Conversion Issues

Issue #35 resolved

Former user created an issue 2009-03-24

Original [issue 35](https://code.google.com/p/okapi/issues/detail?id=35) created by @fliden on 2009-03-24T00:16:02.000Z:

- When removing the BOM from UTF-32 big endian, "00 00 FE FF" is removed, but when removing the BOM from UTF-32 little endian, only the "FF FE" part of "FF FE 00 00" is removed. I'm guessing the remaining "00 00" should be removed as well.

- FYI, auto-detection of UTF-32 does not work when adding a BOM.

- Auto-detection of UTF-16 works only when the first character occupies just 8 of the 16 bits: "Hello" is auto-detected as UTF-16, but "앙영" is not.

- When adding a BOM and UTF-16 is auto-detected, FE FF gets added to LE files and FF FE gets added to BE files. It should be the opposite.
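The UTF-32 little-endian case in the first bullet can be illustrated with a minimal Java sketch. This is not Okapi's code: the class `BomStripSketch` and its methods are hypothetical names chosen for illustration. It only shows the standard Unicode BOM byte sequences and why the four-byte candidates need to be tested before the two-byte ones. Note that "FF FE 00 00" is inherently ambiguous: it could also be a UTF-16 LE BOM followed by a U+0000 character, and this sketch simply assumes the longer match.

```java
import java.util.Arrays;

final class BomStripSketch {
    // Standard Unicode BOM byte sequences, longest first, so that
    // FF FE 00 00 (UTF-32 LE) is tested before FF FE (UTF-16 LE).
    private static final byte[][] BOMS = {
        {0x00, 0x00, (byte) 0xFE, (byte) 0xFF},  // UTF-32 BE
        {(byte) 0xFF, (byte) 0xFE, 0x00, 0x00},  // UTF-32 LE
        {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF}, // UTF-8
        {(byte) 0xFE, (byte) 0xFF},              // UTF-16 BE
        {(byte) 0xFF, (byte) 0xFE},              // UTF-16 LE
    };

    // Returns the length of the BOM at the start of data, or 0 if none matches.
    static int bomLength(byte[] data) {
        for (byte[] bom : BOMS) {
            if (data.length >= bom.length
                    && Arrays.equals(Arrays.copyOf(data, bom.length), bom)) {
                return bom.length;
            }
        }
        return 0;
    }

    // Returns a copy of data with the whole leading BOM removed.
    static byte[] stripBom(byte[] data) {
        return Arrays.copyOfRange(data, bomLength(data), data.length);
    }
}
```

With this ordering, `stripBom` removes all four bytes of "FF FE 00 00" instead of leaving the trailing "00 00" behind.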

Comments (2)

Comment [1.](https://code.google.com/p/okapi/issues/detail?id=35#c1) originally posted by @fliden on 2009-03-24T00:24:25.000Z:

Btw, is it possible to have some warning if the encoding cannot be auto-detected? Or should we fall back on a user-specified output encoding? When adding the UTF-8 BOM to a UTF-16 file, the content looks corrupted regardless of which encoding you choose to view it as, and the BOM would need to be removed to "recover" the file.
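The warn-and-fall-back behavior asked about here could look roughly like the Java sketch below. Everything in it is an assumption made for illustration: the class `EncodingFallbackSketch`, the zero-byte heuristic in `guessUtf16`, and the `resolve` method are hypothetical and are not taken from Okapi. The heuristic is only one plausible way a detector could behave as reported in the issue, recognizing "Hello" (every character has one zero byte) but not "앙영" (no zero bytes at all).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.logging.Logger;

final class EncodingFallbackSketch {
    private static final Logger LOG =
            Logger.getLogger(EncodingFallbackSketch.class.getName());

    // Hypothetical heuristic: in the first 64 bytes, look at each 2-byte unit.
    // A zero high byte at even offsets suggests BE, at odd offsets suggests LE.
    static Optional<Charset> guessUtf16(byte[] data) {
        int evenZeros = 0;
        int oddZeros = 0;
        int limit = Math.min(data.length - (data.length % 2), 64);
        for (int i = 0; i < limit; i += 2) {
            if (data[i] == 0 && data[i + 1] != 0) evenZeros++;
            if (data[i + 1] == 0 && data[i] != 0) oddZeros++;
        }
        if (evenZeros > 0 && oddZeros == 0) return Optional.of(StandardCharsets.UTF_16BE);
        if (oddZeros > 0 && evenZeros == 0) return Optional.of(StandardCharsets.UTF_16LE);
        return Optional.empty(); // not enough evidence either way
    }

    // Uses the guess when there is one; otherwise warns and falls back
    // to the encoding supplied by the user.
    static Charset resolve(byte[] data, Charset userEncoding) {
        Optional<Charset> guess = guessUtf16(data);
        if (guess.isPresent()) {
            return guess.get();
        }
        LOG.warning("Encoding could not be auto-detected; using " + userEncoding.name());
        return userEncoding;
    }
}
```

For example, `resolve("\uC559\uC601".getBytes(StandardCharsets.UTF_16LE), StandardCharsets.UTF_8)` logs a warning and returns the user's choice, because neither byte of either character is zero.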
- changed status to resolved
Comment [2.](https://code.google.com/p/okapi/issues/detail?id=35#c2) originally posted by @ysavourel on 2009-03-24T02:56:36.000Z:

- All references to UTF-32 have been removed. UTF-32 is not supported.

- Because auto-detection is too unreliable, the user must specify the encoding of the file for the Add BOM feature.

> When adding bom and UTF-16 is auto detected FE FF gets added to LE files and FF FE gets added to BE files. It should be the opposite.

This bug has been fixed.
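As an illustration of the corrected behavior with a user-specified encoding, here is a minimal Java sketch. It is not the actual Okapi fix: `AddBomSketch`, `bomFor`, and `addBom` are hypothetical names, and the sketch assumes only UTF-8 and the two UTF-16 byte orders need to be handled. The point is the mapping itself: FF FE for little endian, FE FF for big endian.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class AddBomSketch {
    // Maps the encoding chosen by the user to its standard BOM bytes.
    static byte[] bomFor(Charset encoding) {
        if (encoding.equals(StandardCharsets.UTF_8)) {
            return new byte[] {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
        }
        if (encoding.equals(StandardCharsets.UTF_16LE)) {
            return new byte[] {(byte) 0xFF, (byte) 0xFE}; // little endian
        }
        if (encoding.equals(StandardCharsets.UTF_16BE)) {
            return new byte[] {(byte) 0xFE, (byte) 0xFF}; // big endian
        }
        throw new IllegalArgumentException("No BOM defined in this sketch for " + encoding);
    }

    // Returns data prefixed with the BOM for the given encoding,
    // or data itself if it already starts with that BOM.
    static byte[] addBom(byte[] data, Charset encoding) {
        byte[] bom = bomFor(encoding);
        boolean alreadyPresent = data.length >= bom.length;
        for (int i = 0; alreadyPresent && i < bom.length; i++) {
            alreadyPresent = data[i] == bom[i];
        }
        if (alreadyPresent) {
            return data;
        }
        byte[] result = new byte[bom.length + data.length];
        System.arraycopy(bom, 0, result, 0, bom.length);
        System.arraycopy(data, 0, result, bom.length, data.length);
        return result;
    }
}
```

Because the caller passes the encoding explicitly, no detection step can pick the wrong byte order, which matches the decision to require a user-specified encoding for the Add BOM feature.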

Assignee: ysavourel

Type: bug

Priority: minor

Status: resolved

Milestone: –

Version: –