Commits

Aidan Kehoe committed 301f424 Merge

Merge.


Files changed (531)

 223736d75acb5265cfd9352497e8483d787d8eab r21-2-45
 0784d089fdc93fb58040b6efbec55cd4fdf650c2 r21-2-46
 5aa1854ad5374fa936e99e22e7b1242097292f16 r21-2-47
+dbd2a866e38aad0fae49c10f7756c1e282d4157b ben-unicode-premerge-bp
 aaf96f4ba61234a5331b280b774d357503cd7994 ben-lisp-object-bp
 1af222c7586991f690ea06d1b8c75fb5a6a0a352 r21-5-28
 5c427ece884b7023a244fba8cad8cf41b37dd5ca r21-5-29
+35c6983c17db5850ddfeac0243e5bf47e8fe31d0 ben-unicode-premerge-final-ws-year-2005
 3742ea8250b5fd339d6d797835faf8761f61d0ae ben-lisp-object-final-ws-year-2005
 d185fa593d5fcf818ca0d27e53374348d936d7e8 last-version-with-netinstall
 e7991690160358d5dd0a8adbf54e0bb8bfc56891 r21-5-30
 	Update with my changes to the trunk since the release of 21.5.29
 	in 2009 up through April 9, 2010.
 
+2010-03-29  Ben Wing  <ben@xemacs.org>
+
+	* README.unicode-internal:
+	Update bug list.
+
+2010-03-14  Ben Wing  <ben@xemacs.org>
+
+	* README.unicode-internal:
+	Update bug list.
+
+2010-03-02  Ben Wing  <ben@xemacs.org>
+
+	* README.unicode-internal:
+	Update bug list.
+
+2010-02-28  Ben Wing  <ben@xemacs.org>
+
+	* README.unicode-internal:
+	Update bug list.
+
+2010-02-20  Ben Wing  <ben@xemacs.org>
+
+	* README.unicode-internal:
+	Update bug list.
+
+2010-02-16  Ben Wing  <ben@xemacs.org>
+
+	* README.unicode-internal:
+	Add bug list to README.unicode-internal.
+	
+2010-01-25  Ben Wing  <ben@xemacs.org>
+
+	* Makefile.in.in:
+	* Makefile.in.in (.PHONY):
+	Also pass unicode-encapsulate and make-case-conv targets down
+	to src/Makefile.in.in.
+
 2010-04-02  Aidan Kehoe  <kehoea@parhasard.net>
 
 	* CHANGES-beta:
 check-features: all
 	cd ./src && $(MAKE) $(RECURSIVE_MAKE_ARGS) $@
 
+.PHONY: unicode-encapsulate
+unicode-encapsulate:
+	cd ./src && $(MAKE) $(RECURSIVE_MAKE_ARGS) $@
+
+.PHONY: make-case-conv
+make-case-conv:
+	cd ./src && $(MAKE) $(RECURSIVE_MAKE_ARGS) $@
+
 ## We have to force the building of Emacs.ad.h as well in order to get it
 ## updated correctly when VPATH is being used.  Since we use move-if-change,
 ## it will only actually change if the user modified ${etcdir}/Emacs.ad.

README.unicode-internal

+3-28-10
+
+TODO:
+
+(1) Crash when resizing jit charset to-unicode tables because they are dumped
+and you can't xrealloc() dumped data. Fix it.
+
+STATUS: DONE
+
+(2) Test char/category tables.  Copy a table, see if it is `equal'.
+Copy a table using `map-*'; loop over chars, see if all chars have the same
+value in both tables. (Might not be `equal' under old-Mule due to gaps
+between valid characters.)
+
+STATUS: DONE
+
+3-13-10
+
+1. New query method written but doesn't handle
+
+   (a) remaining specifications of `safe-chars' and `safe-charsets'
+   (b) CCL errors
+   (c) most importantly, coding systems with CRLF and CR EOL type
+
+STATUS: 1(c) DONE
+
+2. Still `make check' problems especially with CCL
+
+STATUS: DONE
+
+3. Consider implementing optimizations:
+
+   (1) In the DFC routines, if the text is entirely ASCII and the coding
+       system passes ASCII through unchanged, then skip the conversion
+       entirely.
+   (2) UTF-8, especially, and other ASCII-compatible coding systems can
+       use skip_ascii() plus memcpy() to quickly copy chunks of ASCII
+       rather than slowly processing character-by-character.  (See the
+       sketch at the end of this list.)
+
+4. At some point, eliminate Big5 and Shift-JIS coding systems in favor
+   of multibyte.
+
+5. Need to implement multibyte detection: a generalized detection
+   routine that looks for non-ASCII characters and also looks for error
+   bytes.
+
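+A minimal sketch of the skip_ascii()-plus-memcpy() fast path from item 3
+above.  This is purely illustrative; convert_one_char() is a made-up
+stand-in for the real per-character conversion, and the actual DFC/coding
+code has different signatures.
+
+#include <stddef.h>
+#include <string.h>
+
+/* Return the number of leading pure-ASCII bytes in SRC (length N). */
+static size_t
+skip_ascii (const unsigned char *src, size_t n)
+{
+  size_t i = 0;
+  while (i < n && src[i] < 0x80)
+    i++;
+  return i;
+}
+
+/* Stand-in for the real per-character converter (hypothetical): consume
+   one non-ASCII byte and emit a substitution character. */
+static size_t
+convert_one_char (const unsigned char *src, size_t n, unsigned char **dst)
+{
+  (void) src; (void) n;
+  *(*dst)++ = '?';
+  return 1;
+}
+
+/* Copy runs of ASCII in bulk with memcpy(); fall back to per-character
+   conversion only for non-ASCII bytes. */
+static void
+convert_ascii_compatible (const unsigned char *src, size_t n,
+                          unsigned char *dst)
+{
+  while (n > 0)
+    {
+      size_t run = skip_ascii (src, n);
+      memcpy (dst, src, run);                /* bulk-copy the ASCII chunk */
+      src += run, dst += run, n -= run;
+      if (n > 0)
+        {
+          size_t used = convert_one_char (src, n, &dst);
+          src += used, n -= used;
+        }
+    }
+}
+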
+3-2-10
+
+1. Multibyte should use private Unicode codepoints to handle gaps in charsets.
+
+   STATUS: PROBABLY NOT
+
+2. Problems with corruption of foo_coding_stream -- there needs to be a
+   description for the coding stream, stored using a "description map" that
+   points from the coding stream through the coding system, from coding system
+   to the methods, and from the methods to the stream description.
+
+   STATUS: DONE
+
+3. At some point, fix up text.h so that all the stuff in lisp.h about the
+   basics of characters and unicode can go in text.h.  Maybe just move text.h
+   earlier and move some of the high-level garbage in text.h that requires
+   XSTRING and such into another file.
+
+   STATUS: DONE
+
+4. At some point, eliminate Big5 and Shift-JIS coding systems in favor
+   of multibyte.
+
+   STATUS: NOT DONE
+
+5. Need to implement multibyte detection: a generalized detection routine
+   that looks for non-ASCII characters and also looks for error bytes.
+
+   STATUS: NOT DONE
+
+6. multibyte query: add to foo_convert a `struct convert_error *' argument:
+
+   struct convert_error
+   {
+     Bytecount num_error_source;
+     Bytecount num_error_out;
+     enum how_to_convert how_to;
+   }
+
+   how_to_convert =
+     CONVERT_STOP_UPON_ERROR
+     CONVERT_CONTINUE_UPON_ERROR
+
+   If the `struct convert_error' pointer is NULL, then how_to_convert should
+   be taken as CONVERT_CONTINUE_UPON_ERROR and no error info recorded.
+   Otherwise, num_error_source indicates the number of erroneous source
+   bytes and num_error_out the number of bytes written out to indicate an
+   error.  The number of good bytes in the source and output can be derived,
+   respectively, from the return value (total bytes of source processed)
+   and the size increase in the output Dynarr.  (See the sketch after this
+   list.)
+
+   STATUS: PARTIALLY DONE
+
+7. Still various `make check' problems
+
+   STATUS: PARTIALLY DONE
+
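+A compilable sketch of the convention in item 6, from the converter's side.
+Everything other than the names given above (struct convert_error, its
+fields, and the how_to_convert values) is made up for illustration; in
+particular Bytecount is stubbed as long and the output goes to a plain
+buffer rather than a Dynarr.
+
+typedef long Bytecount;              /* stand-in for the real XEmacs type */
+
+enum how_to_convert
+{
+  CONVERT_STOP_UPON_ERROR,
+  CONVERT_CONTINUE_UPON_ERROR
+};
+
+struct convert_error
+{
+  Bytecount num_error_source;        /* erroneous source bytes seen */
+  Bytecount num_error_out;           /* bytes written to flag the errors */
+  enum how_to_convert how_to;
+};
+
+/* Hypothetical converter: returns total source bytes processed (including
+   erroneous ones); the caller zero-initializes ERR's counters.  If ERR is
+   NULL, behave as CONVERT_CONTINUE_UPON_ERROR and record nothing. */
+static Bytecount
+foo_convert (const unsigned char *src, Bytecount n,
+             unsigned char *out, Bytecount *outlen,
+             struct convert_error *err)
+{
+  Bytecount i, o = 0;
+  for (i = 0; i < n; i++)
+    {
+      if (src[i] < 0x80)
+        out[o++] = src[i];                 /* good byte: pass through */
+      else
+        {
+          out[o++] = '?';                  /* flag the error in the output */
+          if (err)
+            {
+              err->num_error_source++;
+              err->num_error_out++;
+              if (err->how_to == CONVERT_STOP_UPON_ERROR)
+                {
+                  i++;                     /* count the bad byte as processed */
+                  break;
+                }
+            }
+        }
+    }
+  *outlen = o;
+  return i;
+}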
+
+2-25-10
+
+PROBLEMS IN ADDITION TO 2-23-10 BUGS:
+
+1. DECODE_OUTPUT_PARTIAL_CHAR changed to look for ch = -1 instead of ch = 0.
+   This requires a corresponding change throughout the conversion methods.
+FIXED
+2. check_nonoverlapping_charsets needs to be written and the algorithm fixed up.
+FIXED
+
+2-23-10
+
+BUGS:
+
+1. etc/HELLO:
+
+Big5 doesn't show up, third char of four GB chars doesn't show up, Greek
+doesn't show up.  Some of the GB chars look grayed out.
+
+COMMENTS: The Greek problems were due to `windows-875', an EBCDIC charset
+with both Latin and Greek characters in it that ends up higher in the
+precedence list than greek.  I fixed this for the moment by commenting out
+all the EBCDIC charsets, since there were other issues.  I also made this a
+`greek' charset not a `latin' one (after all, it's nicknamed "IBM Greek").
+If we put the EBCDIC charsets back, we have to make sure the order of Greek
+charsets is such that windows-875 comes after the others.
+
+2. `make check' -- getting better.
+
+2-16-10
+
+BUGS:
+
+1.
+Fix map-charset-chars to be compatible with GNU Emacs, so that we can
+incorporate much of characters.el.
+
+It's documented as:
+
+DEFUN ("map-charset-chars", Fmap_charset_chars, Smap_charset_chars, 2, 5, 0,
+     (function, charset, arg, from_code, to_code)
+       Lisp_Object function, charset, arg, from_code, to_code;
+
+Call FUNCTION for all characters in CHARSET.
+FUNCTION is called with an argument RANGE and the optional 3rd
+argument ARG.
+
+RANGE is a cons (FROM .  TO), where FROM and TO indicate a range of
+characters contained in CHARSET.
+
+The optional 4th and 5th arguments FROM-CODE and TO-CODE specify the
+range of code points (in CHARSET) of target characters.  */)
+
+Note also: I commented out charset chinese-cns11643-1 for category ?t.
+We need to junk what we've got, incorporate characters.el with the new
+categories, and fix whatever stuff references the old categories
+(e.g. word-across-newline) -- maybe just sync with the latest version of
+fill.el.
+
+FIXED.
+
+2. etc/HELLO:
+
+Big5 doesn't show up, third char of four GB chars doesn't show up, Greek
+doesn't show up.  Some of the GB chars look grayed out.
+
+3. `make check' fails in various ways (but less than before).
+
+
+
+2-14-10
+
+Bugs #1 and #4 below were due to bad case tables, which boiled down to a
+combination of questionable entries in uni-case-conv.el and a regexp bug
+in searching for negated ranges, which led to uni-case-conv.el getting
+byte-compiled incorrectly.
+
+FIXME: There are three translations in uni-case-conv that I had to disable.
+Those three translations overwrite the lowercase->capital mapping for s,
+k, and a with ring above it.  Probably the first two are the important
+ones.  With these three mappings (which come from the Unidata case-
+conversion info), byte-compilation messes up -- remove all the .elc files
+and recompile and it seems to work, but then during loadup in preparation
+for dumping you get a crash due to stack underflow while loading
+behavior-defs.el (the particular file may change).  This crash also
+occurs if you add uni-case-conv to the standard repo -- it's not specific
+to Unicode-internal.
+
+FIXME: Currently I just hacked those three translations by moving them to
+the top, so good translations overwrite them.  We want to do the equivalent
+in a less ad-hoc fashion by looking to see whether existing conversions
+exist, and if so making sure we put them back after adding the new conversion.
+Modify the python script that creates uni-case-conv.el to do this --
+there's already some commented out code that does something similar.
+
+1. Open up bad-c-fill-paragraph.h.
+
+Try filling the paragraph "We define ..." or "To use, ..." near the top, and
+you'll see that it's filling the paragraphs to an exact fixed number of
+chars, splitting words, sometimes removing spaces, etc.
+
+NOTE NOTE: This apparently doesn't happen when not Unicode-internal.  What
+changes?  Also, check whether the default syntax class is correct --
+apparently it was changed to `word' at some point, did we get the change
+merged?
+
+PARTIAL ANSWER: (looking-at word-across-newline) behaves differently -- it
+returns `t' at the beginning of a line when it should return `nil'.
+
+ANSWER: Category ?t contained ASCII characters because it contained all
+characters in charset chinese-cns11643-1, which itself contains ASCII
+characters (yuck).
+@@#### We need to incorporate the GNU Emacs categories in characters.el, and
+first add a few functions to support it.
+
+2. etc/HELLO:
+
+Big5 doesn't show up, third char of four GB chars doesn't show up, Greek
+doesn't show up.  Some of the GB chars look grayed out.
+
+3. `make check' fails in various ways.
+
+
+2-10-10
+
+BUGS noticed running 0bui:
+
+1. Open up redisplay-msw.c.
+Do:
+(query-replace-regexp "int \\(.*?\\)cursor" "Pixcount \\1cursor" nil)
+
+It seems to ignore the `curs' entirely, finding any matches whose final two
+chars are `or', and when a `cursor' is found, including an extra `curs' in
+the replaced text.
+
+2. Open up bad-c-fill-paragraph.h.
+
+Try filling the paragraph "We define ..." or "To use, ..." near the top, and
+you'll see that it's filling the paragraphs to an exact fixed number of
+chars, splitting words, sometimes removing spaces, etc.
+
+3. etc/HELLO:
+
+Big5 doesn't show up, third char of four GB chars doesn't show up, Greek
+doesn't show up.  Some of the GB chars look grayed out.
+
+4. Replacing with case-matching:
+
+Open up bad-replace.h.  Search-and-replace "ite" with "sized_ite".  Find a
+place that contains the string "ITE" in capital letters.  Do the search-
+and-replace there and you'll find "?IZED_ITE" in place of "SIZED_ITE".
+
+
+1-21-10
+
+To do:
+
+-- Let charset tags be redefinable, e.g. you might want to redefine list
+   tags.  To do this, we need a subr that recomputes the hashing, after
+   we have removed the tag.
+-- We need to check to make sure that if an error occurs during
+   register-charset-tag, we undo any changes made in the process, i.e.
+   remove the charset from any tables it's in.
+-- add functions to query charset tags, e.g. get their properties.
+-- put some info about standard charset tags somewhere, e.g. in
+   `unicode-to-char' and/or `define-charset-tag'.
+-- Instead of chinese-gb-env/list, use chinese-gb/env.  Possibly: whenever
+   we set a `charset' property, have the set-language-* function automatically
+   create a FOO/env tag containing the contents of the `charset' property.
+
+Also consider:
+
+-- write a function to extract charsets from coding systems and use it
+   to generate a default value for `safe-charsets'.  But first see if
+   something like that is already present -- if so, modify it for
+   `multibyte' charsets.
+
+-- the todo's etc. in my notebook -- copy them into this file.
+-- multibyte -- look more carefully at its error behavior and whether it
+   "unifies" properly in old-Mule.
+
+SHORT-TERM TO DO:
+================
+
+-- Testing, testing, testing.  Need to test under both Unicode-internal and
+   old-Mule.  Under old-Mule, we need to test out the handling of the new
+   Unicode and non-encodable-charset features.  Under Unicode-internal, we
+   need to test out the reading and writing of iso2022-encoded files, the
+   handling of characters encoded in private-Unicode space, and the display
+   of stuff out of the symbol font (e.g. for the continuation glyph), which
+   doesn't currently work!
+
+-- For all charsets moved into mule-charset.el, follow the lead of
+   Ethiopic and VISCII in putting notes in mule-charset.el indicating
+   the file it was moved from and why (usually because the file in
+   question contains ISO-2022-embedded text using the same charsets
+   being defined in the file, so you have a bootstrapping problem).
+   Also in the file it was moved from, put the first line of the
+   call to `make-charset' behind a comment, and put a comment above
+   telling where it was moved to.  Again, use ethiopic.el and
+   vietnamese.el as models. (MOSTLY DONE)
+
+-- For charsets not needing to be in mule-charset.el, move them
+   back into where they came from. (MOSTLY DONE)
+
+-- The order for the charsets in `make-charset' should always list
+   `ascii' and `control-1' last, in case the charset provides its
+   own Unicode mapping for ASCII chars (e.g. VISCII does).
+
+-- Check to make sure that `multibyte' will handle gaps in the charsets,
+   as indicated by the attached unicode maps, e.g. VISCII has this.
+   Internal ASCII characters should get mapped out to ASCII whenever
+   there's not a shadowing entry in the VISCII map, e.g. internal
+   0x02 should *NOT* get mapped to external 0x02 because that is
+   already used by some other character.
+
+-- Fix up the COPYING and README files in etc/unicode/unicode-consortium.
+
+-- Aidan has added a whole lot of CCL-related Unicode-handling stuff.
+   Need to figure out what this is doing and implement this in a way
+   that will work with Unicode-internal.
+
+-- Error-handling:
+
+   1. We need to review all uses of CONVERR_SUCCEED, CONVERR_USE_PRIVATE
+      and CONVERR_SUBSTITUTE and see whether it is the right thing to do.
+   2. Review and document the use of private Unicode codepoints.  The point
+   here seems to be that in some circumstances, we want to be able to
+   represent charset codepoints as Unicode codepoints even when no official
+   conversion exists, e.g. when using the Unicode-internal representation
+   and reading in and/or recompiling a .el file containing weird charset
+   codepoints without Unicode equivalents.  Currently we create such
+   codepoints when CONVERR_SUCCEED is given, but we need to review this.
+   We also need to review calls to valid_unicode_codepoint_p() and check
+   the second argument, which determines whether private Unicode codepoints
+   are allowed.
+
+-- Redisplay:
+
+   1. In descr-text.el, the junky function describe-char-display wants to
+      figure out the font code that will be displayed.  It used
+      ccl-encode-to-ucs-2, which I removed; now, X does Unicode in a
+      more general way.  But try to fix this.
+
+-- "Windows Glyph List 4":
+   -- Hacky code in unicode.el forces certain glyphs to go into
+      a JIT charset for display purposes.  This seems awful and won't work
+      for Unicode-internal.  Instead, create a one-column charset holding
+      these codepoints and put it above the East-Asian charsets.
+
+CCL:
+
+   -- Rather than disallowing CCL under Unicode-internal, allow it and just
+      make it work as well as possible.  It may never work perfectly but
+      at least it can do something.
+
+LONGER-TERM TO DO:
+==================
+
+-- Multibyte coding system:
+
+   1. Implement coding priorities for this coding system.  Currently in
+   coding-system-priority we either return `no-conversion' or `iso-8-1',
+   both of which are garbage.
+
+   2. A detector for multibyte.  It might have subtypes, based on identifiable
+   characteristics: single-byte with high bytes in A0-FF range, same but
+   with 80-FF range, double-byte (i.e. mixed single-double) with both bytes
+   A0-FF, double-byte with first byte A0-FF and second byte either high or
+   low, double-byte with first byte 80-FF and second byte either high or
+   low, double-byte with single-byte in A0-DF range, first double-byte in
+   80-9F and E0-FF and second in either high or low.  We determine whether
+   things are single or double by looking to see whether only even stretches
+   of presumed doubles exist.  Determining what kind of second byte occurs
+   is simple, too.
+
+   NOTE: The latter detector described is really a shift-jis-specific
+   detector, in disguise.  We really need to have the multibyte detector keep
+   a list of all the systems it's trying to classify something into, and output
+   probabilities for each of these different possible classifications,
+   based on the same sort of things it does above.  But rather than have to
+   hard code a set of subtypes for different ranges of allowed single or
+   double-byte characters, it can do this dynamically.
+
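+   As a toy illustration of the "even stretches of presumed doubles" test
+   mentioned above -- not the planned detector, just its core heuristic:
+
+   #include <stddef.h>
+
+   /* Return 1 if every run of high (>= 0x80) bytes in SRC has even length,
+      i.e. the data is consistent with a pure double-byte encoding whose
+      first and second bytes are both high; return 0 otherwise.  A real
+      detector would compute a score like this for each candidate
+      byte-range profile rather than a yes/no answer. */
+   static int
+   high_runs_all_even_p (const unsigned char *src, size_t n)
+   {
+     size_t i = 0;
+     while (i < n)
+       {
+         if (src[i] < 0x80)
+           i++;
+         else
+           {
+             size_t run = 0;
+             while (i < n && src[i] >= 0x80)
+               run++, i++;
+             if (run % 2 != 0)
+               return 0;    /* odd-length high run: not all double-byte */
+           }
+       }
+     return 1;
+   }
+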
+-- Unicode precedence lists:
+
+   1. Implement a buffer-local unicode-precedence list and then review all
+   uses of get_unicode_precedence() to see how to fix them up.  Sometimes you
+   may need the caller to pass in a buffer or domain.
+   2. We may want to convert precedence lists into structures, where we can
+   cache info about the structure, e.g. whether we can short-circuit involving
+   ASCII (see below under "Unicode-encoding"), and also whether this is a
+   global or buffer-local list or a freshly created one, so we know how to
+   free it properly.
+
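+   A rough idea of the kind of structure item 2 has in mind; the field
+   names are invented here, not the actual XEmacs ones:
+
+   typedef void *Lisp_Object;          /* stand-in for the real tagged type */
+
+   struct unicode_precedence
+   {
+     Lisp_Object charsets;             /* charsets, highest precedence first */
+     unsigned int ascii_shortcut : 1;  /* ASCII can bypass the full lookup */
+     unsigned int buffer_local : 1;    /* buffer-local rather than global */
+     unsigned int freshly_consed : 1;  /* caller must free after use */
+   };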
+
+-- Unicode encoding:
+
+   -- Think about whether we really want to be hardcoding translations
+      involving ASCII and Control-1.  Some charsets specify different values
+      for codes in the ASCII range, e.g. Vietnamese VISCII, Japanese-Roman,
+      and of course EBCDIC.  Perhaps the check for ASCII is for
+      efficiency; if so there should be a way of indicating that ASCII is
+      higher than any other charsets that have different values for those
+      bytes.  Perhaps we want to convert precedence lists into structures,
+      where we can cache info about the lists.  (See above, under "Unicode
+      precedence lists")
+
+-- Things like `modify-syntax-entry' and friends that operate on charset-
+   specific properties need to do things differently.  We really need to
+   get the big table of properties from the Unicode consortium, indicating
+   all the important properties of Unicode characters, and parse this and
+   use it to set the properties of the characters.
+
+-- Currently, when in old_mule_non_ascii_charset_codepoint_to_ichar_raw()
+   we get a non-encodable charset codepoint, we just punt, and ultimately
+   either you get an error or a generic substitution character.  We should
+   instead try to find an equivalent character by way of Unicode.  In this,
+   we need a property on charsets listing the preferred charsets to map
+   this charset into.  Without such a listing of preference, it might just
+   work to use whatever preference list happens to be available; but there
+   is no guarantee.  This leads to behavior identical to the old CCL
+   behavior; that is, we can ensure that koi8-r gets converted into
+   iso8859-5 to the extent it can be.  With implementation of the multibyte
+   decoder, we could remove shift-jis and big5 as special cases except
+   under old-Mule.
+
+-- In order for someone under Unicode-internal to be able to edit a Mule
+   .el file reliably, they have to use iso2022-preservation to read it in
+   and write it out.  But currently such private characters don't function
+   like normal ones.  We have to make them function the same.  First,
+   provide a function that converts a private character to its normal
+   Unicode one by converting it to charset codepoint and from there to
+   normal Unicode.  Then, we use this conversion when processing the
+   character in various contexts.  These contexts include, among others,
+   retrieving a value from a char table (e.g. a case-table conversion, syntax-
+   table handling, etc.) and handling the character for redisplay.  Probably,
+   we want to do change the conversion routines so that most routines, when
+   they ask for a character, the automatically get the converted version if
+   it exists.  Special routines would be needed to get the original, private
+   version.  String comparisons would be tricky since you'd probably want to
+   compare the converted versions of one string with the converted versions
+   of another string; but we already have stuff in place to handle translation
+   tables and the logic is very similar.
+
+-- The same scheme of private characters could be used for the preservation
+   of other info such as language.  However, the normal scheme that is expected
+   to be used in such a case is text properties.  In order to use this for real,
+   we need (a) all operations on strings to preserve text properties
+   (most currently do, but many don't), and (b) a way to serialize
+   text properties into a string of data, as is passed to conversion routines.
+   A simple way would be to make the data be structured; e.g. a special value
+   indicates that a structure describing text properties follows; it indicates
+   the text properties of further data until another such structure occurs.
+   The introducing marker could either be something that could not have normal
+   meaning, or something that could but doesn't occur often in that meaning.
+   To indicate the normal meaning, you'd then double the value.  The list of
+   text properties would have to be serialized into a byte stream, something
+   like in unified range tables.
+
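+   A tiny sketch of the marker-doubling escape described above; the marker
+   value and the flat buffer are made up for illustration:
+
+   #include <stddef.h>
+
+   #define PROP_MARKER 0xFE   /* hypothetical introducer byte */
+
+   /* Copy SRC into DST, doubling every literal occurrence of PROP_MARKER
+      so that a single PROP_MARKER can unambiguously introduce serialized
+      text-property data.  Returns the number of bytes written; DST must
+      have room for up to 2 * N bytes. */
+   static size_t
+   escape_prop_markers (const unsigned char *src, size_t n,
+                        unsigned char *dst)
+   {
+     size_t i, o = 0;
+     for (i = 0; i < n; i++)
+       {
+         dst[o++] = src[i];
+         if (src[i] == PROP_MARKER)
+           dst[o++] = PROP_MARKER;   /* doubled = the ordinary meaning */
+       }
+     return o;
+   }
+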
+-- It would be important, whether or not the scheme about serialized
+   text-property info as indicated previously is adopted, to make sure that
+   all character data passed to conversion routines occurs in groups of
+   full characters. (Note, data in conversion routines is already typed
+   according to byte or char.) This would significantly speed up the
+   operations of these routines.  This would be fairly simple to ensure,
+   and the mechanism for this is already in place.
+
+-- Font handling:
+
+   Currently, redisplay assumes one font per character set
+   and does optimizations based on this.  The face_cachel info explicitly
+   encodes a table mapping charsets to fonts.  We need to remove this
+   entirely.  You might think of mapping Unicode ranges to fonts, but this
+   suffers the same problem that fonts may cover part of various ranges.
+   So:
+
+   a. First, we need to do mapping entirely based on characters, not
+      character sets.  This means, e.g., changing the MATCHSPEC of
+      specifier-matching-instance of a font-specifier to have a character
+      not a charset in it.
+   b. Second, we will need some caching to make this process reasonable.
+      Previously there were at least two levels of caching: (1) Redisplay
+      only went investigating for a new font when the charset changed after
+      a run of characters in the same charset. (2) Font lookup was cached.
+      What we need to do is: (a) cache for each character the font
+      found for that character; unfortunately that can easily change based
+      on the language, or simply when the user chooses to change the list of
+      fonts used; (b) cache the tables for each font that specify what characters
+      exist (this is probably the biggest savings, as long as the tables we
+      cache aren't horrendously large); some tables might be imperfect, e.g.
+      indicating only the unicode ranges they support, and we might have to
+      query further; maybe this means they tend to support the entire subrange,
+      and we'll have to assume that and optimize the routine for this.
+   c. To find a matching font, we currently make use of the charset of the
+      char and the registry of the font.  In the revised scheme, we'd make
+      use of the language, if considered important, to convert to a charset
+      appropriate to the language and choose the font by its registry.  We
+      may want to do this always, but whether we do this first or after
+      seeking out an ISO-10646 (Unicode) font, is something controlled by
+      the particular language (e.g. by a global list specifying which
+      languages do language-specific lookup first).
+
+
+-- A better interface for implementing query-coding-system, etc.
+
+The current interface underlying query-coding-system (the query methods on
+coding systems) seems unclean -- it should work with streams, not buffers,
+and shouldn't have to muck around with highlighting extents or anything of
+that sort.  That should be handled at a higher level.  But also, you're
+duplicating a lot of logic by having separate query and convert methods.
+I'd rather see an interface like this:
+
+1. Add three more parameters to the convert methods, which are pointers
+   used to return an error code, input length and output length, see below.
+2. Currently, when a convert method encounters an error, it does the best
+   job it can at converting, writes out the result to the dynarr, and just
+   continues, without alerting the caller.  In the new scheme, when the
+   routine encounters an error, it does the following:
+-- (a) store the amount of input data processed so far (not including the
+       erroneous input) in the "input length" parameter.
+-- (b) call Dynarr_length() and store the result into the "output length"
+       parameter.
+-- (c) write out its "best attempt at converting" to the dynarr.
+-- (d) store an error into the "error code" parameter.
+-- (e) return immediately.  The return value, which indicates how much
+       input has been processed, should include the erroneous input.
+3. Note that the actual return value from the convert method, as mentioned,
+   indicates the total amount of input processed, and the total amount of
+   output generated can be determined by comparing the length of the dynarr
+   before and after the call.  The "input length" and "output length"
+   parameters indicate how much non-erroneous input and output was
+   processed.
+4. Note that there is no need for either the coding-stream convert method
+   or the lstream write method to process all data given -- if some data is
+   left, it will be passed in again on the next call. (Lstream_write_1()
+   takes care of this.) Hence it's safe for the conversion method to stop
+   immediately upon an error.
+5. The existing coding_reader() and coding_writer() methods will simply
+   ignore the values returned via the new parameters.
+6. Create new lstream methods, e.g. read_from_converter() and
+   write_to_converter(), that can access the additional error info returned
+   by the convert methods.  It's likely that these methods will be provided
+   only for coding lstreams.  The coding stream versions will be
+   modifications of the existing coding_reader() and coding_writer().  As a
+   first pass, you can just provide write_to_converter(), since
+   coding_writer() is very simple but coding_reader() is a bit tricky.
+7. Create a new lstream function, e.g. Lstream_write_to_converter(), that
+   wraps the write_to_converter() method.  Unlike Lstream_write(), this
+   doesn't attempt to stuff *all* the data down the stream, and doesn't
+   buffer.  Instead, it only makes one call to the write_to_converter()
+   method and returns its results directly.  It's the caller's job to loop.
+   Lstream_read_from_converter() is similar.
+8. `query-coding-*' simply sets up a coding lstream writing out to a dynarr
+   lstream sink, or maybe a null lstream sink (doesn't currently exist but
+   it's easy to implement -- just copy and modify the existing "fixed
+   buffer lstream").  It writes out its data using write_to_converter(),
+   and it will be told whenever the coding system ran into problems, along
+   with the amount of data processed without problems and the first bit of
+   data that caused problems.  If you want, you can also retrieve the
+   result of converting, both of the good and bad chunks.  By looping, you
+   can accumulate a list of all the bad sections.
+9. This same mechanism can also be used when reading a file in to determine
+   whether the file was converted correctly, and to flag sections that were
+   not correctly converted.
+
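+To make points 1-5 and 8 concrete, here is roughly what the revised
+contract could look like from the caller's side.  All names and types below
+are illustrative (Bytecount is stubbed as long, the output dynarr is
+omitted); this is a sketch of the proposal, not the actual coding-system
+code.
+
+#include <stdio.h>
+
+typedef long Bytecount;                  /* stand-in for the XEmacs type */
+
+enum convert_error_code { CONVERT_NO_ERROR, CONVERT_BAD_SEQUENCE };
+
+/* Stub converter following points 1-5: stops at the first byte >= 0x80.
+   *INLEN and *OUTLEN report the clean input and output produced before the
+   error (the real method would report Dynarr_length() for the output); the
+   return value is the total input processed, including the bad byte. */
+static Bytecount
+convert_sketch (const unsigned char *src, Bytecount n,
+                Bytecount *inlen, Bytecount *outlen,
+                enum convert_error_code *errcode)
+{
+  Bytecount i;
+  for (i = 0; i < n; i++)
+    if (src[i] >= 0x80)
+      {
+        *inlen = *outlen = i;            /* clean input/output so far */
+        *errcode = CONVERT_BAD_SEQUENCE;
+        return i + 1;                    /* include the erroneous byte */
+      }
+  *inlen = *outlen = n;
+  *errcode = CONVERT_NO_ERROR;
+  return n;
+}
+
+/* Caller loop in the style of point 8: walk the input, collecting the
+   positions of the bad sections. */
+static void
+report_bad_sections (const unsigned char *src, Bytecount n)
+{
+  Bytecount pos = 0;
+  while (pos < n)
+    {
+      Bytecount inlen, outlen;
+      enum convert_error_code err;
+      Bytecount used = convert_sketch (src + pos, n - pos,
+                                       &inlen, &outlen, &err);
+      (void) outlen;                     /* unused in this toy caller */
+      if (err != CONVERT_NO_ERROR)
+        printf ("bad byte at offset %ld\n", (long) (pos + inlen));
+      pos += used;
+    }
+}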
+
+
+LIST OF CHANGES MADE:
+====================
+
+
+Configure changes:
+
+  configure.ac, config.h.in, nt/config.inc.samp, nt/xemacs.mak
+    -- add with-unicode-internal as an option
+
+wcwidth:
+  configure.ac, config.h.in:
+    -- add check for `wcwidth' function (AC_CHECK_FUNCS), HAVE_WCWIDTH
+  unicode.c:
+    -- unicode_char_columns(): use wcwidth() if available, else hand-code
+       for most known charsets
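+
+       A sketch of the shape this might take (the function name matches the
+       note above; HAVE_WCWIDTH comes from the new configure check, and the
+       hand-coded ranges are only a rough guess):
+
+       #define _XOPEN_SOURCE 700        /* for wcwidth() */
+       #include <wchar.h>
+
+       static int
+       unicode_char_columns (wchar_t ch)
+       {
+       #ifdef HAVE_WCWIDTH
+         int w = wcwidth (ch);
+         if (w >= 0)
+           return w;                    /* trust the C library if it knows */
+       #endif
+         /* fallback: treat common East Asian wide ranges as 2 columns */
+         if ((ch >= 0x1100 && ch <= 0x115F)      /* Hangul Jamo */
+             || (ch >= 0x2E80 && ch <= 0x9FFF)   /* CJK */
+             || (ch >= 0xAC00 && ch <= 0xD7A3)   /* Hangul syllables */
+             || (ch >= 0xFF00 && ch <= 0xFF60))  /* fullwidth forms */
+           return 2;
+         return 1;
+       }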
+
+ISO-2022-Preserving:
+  bytecode.el:
+    -- When inserting a file to be compiled, look for a coding-system
+       magic cookie; if it's an `iso2022' coding system, override the coding
+       system with iso-2022-8bit-preserve, so that ISO 2022 sequences get
+       preserved with their same charsets. (FIXME: Shouldn't this use
+       text properties on the characters?)
+  mule/mule-coding.el:
+    -- Create iso-2022-8bit-preserve coding-system.
+  mule/mule-{ccl,charset,cmds,coding,composite-stub,tty-init,x-init}.el,
+  unicode.el:
+    -- Don't mark as using iso-2022-7bit encoding.
+derived.el:
+  -- Some simplification of a call to map-char-table checking for inherited
+     syntax codes (WHY? Maybe internally we fixed map-char-table to do a
+     better job?)
+dumped-lisp.el:
+  -- Move unicode.el later in build list (WHY?)
+  -- some comment formatting
+  -- Add mule/windows.el. (WHY? What goes here?)
+
+Debugging, Error-checking:
+  debug-on-error
+    loadup.el:
+      -- when --debug enabled, immediately dump core upon an unhandled error
+         during loadup, instead of just quitting. (NOTE: You can also get this
+         effect using export XEMACSDEBUG='(setq debug-on-error t)'.) (STANDALONE)
+    cmdloop.c, lisp.h:
+      -- if debug-on-error set, immediately break to the debugger
+         (force-debugging-signal). (STANDALONE)
+  debug.c:
+    provide `debug-xemacs' if debugging enabled. (STANDALONE)
+  emacs.c, lisp.h:
+    use const in debug_can_access_memory (STANDALONE)
+  print.c:
+    try harder to detect when printing would cause a core dump, and
+    instead print a huge warning ("SERIOUS XEMACS BUG") (STANDALONE)
+  config.h.in:
+    define ERROR_CHECK_ANY if any error-checking turned on (WHY?)
+  casetab.c, console.c, data.c, database.c, device-msw.c, device.c,
+  eval.c, file-coding.c, frame.c, glyphs.c, gui.c, keymap.c, lisp.h,
+  mule-charset.c, objects.c, print.c, process.c, tooltalk.c, ui-gtk.c,
+  window.c, ...:
+    -- use printing_unreadable_lcrecord instead of printing_unreadable_object
+    (STANDALONE)
+  lisp.h:
+    -- define text_checking_assert() and other *_checking_assert() macros
+       for various types of error-checking
+
+
+Unicode translation tables:
+  loadup.el:
+    -- delete code to load the unicode translation tables at dump time.
+       (WHY?)
+  mule/mule-cmds.el:
+    -- comment out code to load the unicode translation tables when not
+       at dump time. (WHY?)
+ 
+mule/arabic.el, mule/chinese.el, mule/cyrillic.el, mule/ethiopic.el, mule/european.el, mule/greek.el,
+mule/hebrew.el, mule/indian.el, mule/japanese.el, mule/korean.el, mule/lao.el, mule/misc-lang.el,
+mule/mule-charset.el, mule/thai-xtis.el, mule/mule-thai.el, mule/tibetan.el, mule/vietnamese.el, ...:
+  -- move creation of charsets into mule/mule-charset.el, necessary for bootstrap
+     reasons: we need the charsets created, and their translation tables
+     loaded, BEFORE loading any files that use the charsets in their
+     ISO-2022 encoding.
+mule/chinese.el:
+  -- Frob syntax table entries for big5 charset(s) -- see below.
+     (FIXME: Where was this done before?)
+  -- Fix regexp for identifying a Chinese-simplified locale. (STANDALONE)
+mule/cyrillic.el, mule/devan-util.el, mule/devanagari.el, mule/european.el, mule/greek.el, mule/japanese.el, mule/lao.el, mule/thai-xtis.el, ...:
+  -- Replace raw ISO-2022 sequences with calls to make-char. (FIXME: Is this
+     necessary? I vaguely remember that I went through and changed some
+     files like this, then maybe figured out a way to keep ISO-2022 encoding in
+     loadup lisp files.)
+
+CCL changes:
+  mule/cyrillic.el:
+    -- Delete CCL coding system for converting Koi8 to ISO-8859-5; instead
+       define `koi8' as a `multibyte' coding system.
+    -- Alternativny encoding: Same changes as for Koi8, same caveats.
+
+src/Makefile.in.in:
+  -- minor cleanup: use mule_objs instead of mule_wnn_objs, mule_canna_objs.
+
+mule/ethio-util.el:
+  -- Avoid raw use of \u in a string (STANDALONE).
+mule/european.el:
+  -- Add comments about adding support for Latin-10, Latin-13. FIXME:
+     Support is not there yet, marked with @@####.
+mule/european.el, mule/greek.el:
+  -- There are gaps in the Latin-3 and Latin-7 unicode tables, so handle
+     this when setting up the syntax entries.
+mule/mule-charset.el:
+  -- Add charsets for code pages.
+mule/mule-composite.el:
+  -- Don't autoload fns in this file.
+mule/windows.el:
+  -- Currently empty.
+  -- FIXME: Should have Windows-1251 support (see comment in cyrillic.el)
+     and others? 
+x-compose.el:
+  -- @@#### comment about eliminating this file.
+
+mirror tables, syntax tables, case tables:
+  Maintaining mirror tables is difficult with changes to char tables, so
+  put the code inside of #ifdef MIRROR_TABLE and don't use it for the moment
+    abbrev.c, casefiddle.c, cmds.c, search.c:
+      -- use BUFFER_MIRROR_SYNTAX_TABLE in place of buf->mirror_syntax_table
+    buffer.c, bufslots.h, chartab.c, chartab.h, syntax.c, syntax.h:
+      -- conditionalize mirror-table usage on MIRROR_TABLE
+  buffer.h, syntax.h:
+    -- move BUFFER_SYNTAX_TABLE, BUFFER_MIRROR_SYNTAX_TABLE,
+       BUFFER_CATEGORY_TABLE to syntax.h from buffer.h
+    -- syntax.h needs to include buffer.h
+  buffer.h, casetab.h:
+    -- move case-table-related code from buffer.h to casetab.h
+  abbrev.c, buffer.h, casefiddle.c, casetab.c, casetab.h, fns.c, gui-msw.c,
+  gui.c, menubar-msw.c, menubar.c, minibuf.c, process.c, regex.c, search.c,
+  text.c, win32.c:
+    -- don't include casetab.h/chartab.h in buffer.h; do in individual .c
+       files (instead, casetab.h includes buffer.h and chartab.h)
+  chartab.c, chartab.h:
+    -- drastic rewrite of char tables for Unicode-internal: previous
+       implementation indexed by charsets, current one by individual chars,
+       using page tables (same as for Unicode translation tables; see the
+       sketch at the end of this block)
+  cmds.c:
+    -- use put_char_table_1() to modify a char table instead of trying to
+       hack into the actual structure
+  casetab.c, chartab.c, syntax.c:
+    -- change prototype of internal mapper for mapping over a char table to
+       take a character rather than an object that may indicate a range.
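+
+  A generic sketch of the page-table style of lookup referred to for
+  chartab.c above -- two levels of 256 entries covering 0..0xFFFF.  This is
+  purely illustrative; the real char table has to cover all of Unicode and
+  handle defaults and inheritance.
+
+  typedef void *Lisp_Object;            /* stand-in for the real type */
+
+  struct char_page  { Lisp_Object vals[256]; };
+  struct char_table { struct char_page *pages[256]; Lisp_Object dflt; };
+
+  static Lisp_Object
+  get_char_table_sketch (const struct char_table *ct, unsigned int ch)
+  {
+    const struct char_page *page = ct->pages[(ch >> 8) & 0xFF];
+    if (!page)
+      return ct->dflt;                  /* unpopulated page: default value */
+    return page->vals[ch & 0xFF] ? page->vals[ch & 0xFF] : ct->dflt;
+  }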
+
+Characters, charsets:
+  Characters change format, are Unicode instead of encoding a charset.
+  No way to retrieve a charset directly from a character any more.  Instead
+  of `split-char' into a charset, use `char-to-charset-codepoint' to look
+  into a translation table.  Leading bytes disappear.  Character sets
+  can be an arbitrary one or two-dimensional size -- not restricted to
+  94 or 96 or 94x94 or 96x96.  Thus, big5 is one charset now instead of
+  two.
+
+  mule/vietnamese.el:
+    -- Use char-to-charset-codepoint instead of split-char.
+
+  mule/chinese.el, mule/mule-category.el, mule/mule-charset.el, mule/mule-msw-init-late.el:
+    -- with unicode-internal, big5 is one charset instead of 2, so
+       conditionalize code that uses big5 charset(s) to work both ways, looking
+       for (find-charset 'chinese-big5-1).
+  charset.h:
+    -- additional commenting of charset struct elements
+
+alloc.c:
+  -- comment reformatting
+  -- change hash method of cons from 0 to internal_hash_1 (WHY?); a comment
+     says the same thing for string, but the change hasn't been made (WHY? FIXME?)
+  -- factor out some repeated code in object_memory_usage_stats into
+     a FROB() (STANDALONE)
+Dynarrs:
+  alloc.c, bytecode.c, dynarr.c:
+    -- move def of lisp object desc to dynarr.c
+    -- capitalize: lisp_object_description -> Lisp_Object_description
+    -- factor out code into mark_Lisp_Object_dynarr()
+  device-x.c:
+    Dynarr_add_validified_lisp_string(): do nothing if size is 0.
+chartab.c, search.c, ...:
+  -- clean up order of includes (STANDALONE)
+
+compiler.h:
+  -- add USED_IF_UNICODE_INTERNAL, USED_IF_MULE_NOT_UNICODE_INTERNAL
+console-impl.h, device-impl.h:
+  -- random ~~#### comments
+
+... unfinished ...
+
+STILL TO DO: unicode.el,
+  man/internals/internals.texi, nt/config.inc.samp, nt/xemacs.mak,
+  src/Makefile.in.in, src/charset.h - zzz.[ch]
+
+unicode:
+
+(mostly the same as unicode-premerge but some code was missed and some code
+ got accidentally duplicated; FIXME: was anything added?)
+
 with_native_sound_lib
 enable_mule
 with_mule
+enable_unicode_internal
+with_unicode_internal
 enable_xim
 with_xim
 enable_canna
 Internationalization options
 ----------------------------
 
-  --with-mule             Compile with Mule (Multi-Lingual Emacs) support,
-                          needed to support non-Latin-1 (including Asian)
-                          languages.
+  --with-mule             Compile with Mule (Multi-Lingual Emacs), i.e.
+                          internationalization support, needed to support
+                          non-Latin-1 (including Asian) languages.
+  --with-unicode-internal Use Unicode and UTF-8 internally for text.
+                          Automatically selects `--with-mule'. Without this,
+                          Mule uses an old internal encoding that explicitly
+                          encodes the national character set associated with a
+                          char and cannot represent all of Unicode.
   --with-xim==TYPE        Enable XIM support. TYPE is `yes', `no', `xlib', or
                           `motif'
   --with-canna            Support the Canna Japanese input method. Requires
   withval="$with_mule"
 
 fi;
+# If --with-unicode-internal or --without-unicode-internal were given then copy the value to the
+# equivalent enable_unicode-internal variable.
+if test "${with_unicode_internal+set}" = set; then
+  enable_unicode_internal="$with_unicode_internal"
+fi;
+# If -enable-unicode-internal or --disable-unicode-internal were given then copy the value to the
+# equivalent with_unicode-internal variable.
+if test "${enable_unicode_internal+set}" = set; then
+  with_unicode_internal="$enable_unicode_internal"
+fi;
+# Check whether --with-unicode-internal or --without-unicode-internal was given.
+if test "${with_unicode_internal+set}" = set; then
+  enableval="$with_unicode_internal"
+  withval="$with_unicode_internal"
+
+fi;
 
 # If --with-xim or --without-xim were given then copy the value to the
 # equivalent enable_xim variable.
 
 
 
+if test "$with_unicode_internal" = "yes"; then
+  test "$with_mule" = "no" && \
+	(echo "$progname: Usage error:"
+echo " " "Cannot give \`--with-mule=no' and \`--with-unicode-internal=yes'.
+Unicode-internal requires Mule."
+echo "  Use \`$progname --help' to show usage.") >&2 && exit 1
+  test "$with_mule" != "yes" && \
+	{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: \`--with-mule' forced to \`yes': Unicode-internal requires Mule." >&5
+$as_echo "$as_me: WARNING: \`--with-mule' forced to \`yes': Unicode-internal requires Mule." >&2;}
+  with_mule=yes
+  $as_echo "#define UNICODE_INTERNAL 1" >>confdefs.h
+
+fi
+
 test -z "$with_mule" && with_mule=no
 
 
 esac
 
 
-for ac_func in cbrt closedir dup2 eaccess fmod fpathconf frexp fsync ftime ftruncate getaddrinfo gethostname getnameinfo getpagesize getrlimit gettimeofday getcwd link logb lrand48 matherr mkdir mktime perror poll random readlink rename res_init rint rmdir select setitimer setpgid setsid sigblock sighold sigprocmask snprintf strerror strlwr strupr symlink tzset ulimit umask usleep vlimit vsnprintf waitpid wcscmp wcslen
+for ac_func in cbrt closedir dup2 eaccess fmod fpathconf frexp fsync ftime ftruncate getaddrinfo gethostname getnameinfo getpagesize getrlimit gettimeofday getcwd link logb lrand48 matherr mkdir mktime perror poll random readlink rename res_init rint rmdir select setitimer setpgid setsid sigblock sighold sigprocmask snprintf strerror strlwr strupr symlink tzset ulimit umask usleep vlimit vsnprintf waitpid wcscmp wcslen wcwidth
 do :
   as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
 ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
 
 echo "
 Internationalization:"
-test "$with_mule" = yes && echo "  Compiling in support for Mule (multi-lingual Emacs)."
+if test "$with_mule" = yes; then
+  echo "  Compiling in support for Mule (multi-lingual Emacs)."
+  test "$with_unicode_internal" = yes && echo "    -  Unicode and UTF-8 will be used internally for text."
+  test "$with_unicode_internal" != yes && echo "    -  The old-Mule format will be used internally for text."
+fi
 test "$with_xim" != no && echo "  Compiling in support for XIM (X11R5+ I18N input method)."
 test "$with_xim" = motif && echo "    - Using Motif to provide XIM support."
 test "$with_xim" = xlib && echo "    - Using raw Xlib to provide XIM support."
 dnl
 XE_HELP_SUBSECTION([Internationalization options])
 XE_MERGED_ARG([mule],
-	AS_HELP_STRING([--with-mule],[Compile with Mule (Multi-Lingual Emacs) support,
+	AS_HELP_STRING([--with-mule],[Compile with Mule (Multi-Lingual Emacs), i.e. internationalization support,
                         needed to support non-Latin-1 (including Asian)
                         languages.]),
 	[], [])
+XE_MERGED_ARG([unicode-internal],
+	AS_HELP_STRING([--with-unicode-internal],[Use Unicode and UTF-8 internally for text.  Automatically selects `--with-mule'.  Without this,
+Mule uses an old internal encoding that explicitly encodes the national
+character set associated with a char and cannot represent all of Unicode.]),
+	[], [])
 XE_KEYWORD_ARG([xim],
 	AS_HELP_STRING([--with-xim==TYPE],[Enable XIM support.  TYPE is `yes', `no', `xlib', or `motif']),
 	[],[],[yes,no,xlib,motif])dnl
 dnl Mule-dependent options
 dnl ----------------------
 
+if test "$with_unicode_internal" = "yes"; then
+  test "$with_mule" = "no" && \
+	USAGE_ERROR("Cannot give \`--with-mule=no' and \`--with-unicode-internal=yes'.
+Unicode-internal requires Mule.")
+  test "$with_mule" != "yes" && \
+	AC_MSG_WARN([\`--with-mule' forced to \`yes': Unicode-internal requires Mule.])
+  with_mule=yes
+  AC_DEFINE(UNICODE_INTERNAL)
+fi
+
 test -z "$with_mule" && with_mule=no
 
 dnl if test "$with_mule" = "yes" && test ! -d "$srcdir/lisp/mule"; then
 dnl Check for POSIX functions.
 dnl ----------------------------------------------------------------
 
-AC_CHECK_FUNCS(cbrt closedir dup2 eaccess fmod fpathconf frexp fsync ftime ftruncate getaddrinfo gethostname getnameinfo getpagesize getrlimit gettimeofday getcwd link logb lrand48 matherr mkdir mktime perror poll random readlink rename res_init rint rmdir select setitimer setpgid setsid sigblock sighold sigprocmask snprintf strerror strlwr strupr symlink tzset ulimit umask usleep vlimit vsnprintf waitpid wcscmp wcslen)
+AC_CHECK_FUNCS(cbrt closedir dup2 eaccess fmod fpathconf frexp fsync ftime ftruncate getaddrinfo gethostname getnameinfo getpagesize getrlimit gettimeofday getcwd link logb lrand48 matherr mkdir mktime perror poll random readlink rename res_init rint rmdir select setitimer setpgid setsid sigblock sighold sigprocmask snprintf strerror strlwr strupr symlink tzset ulimit umask usleep vlimit vsnprintf waitpid wcscmp wcslen wcwidth)
 
 dnl getaddrinfo() is borked under hpux11
 if test "$ac_cv_func_getaddrinfo" != "no" ; then
 
 echo "
 Internationalization:"
-test "$with_mule" = yes && echo "  Compiling in support for Mule (multi-lingual Emacs)."
+if test "$with_mule" = yes; then
+  echo "  Compiling in support for Mule (multi-lingual Emacs)."
+  test "$with_unicode_internal" = yes && echo "    -  Unicode and UTF-8 will be used internally for text."
+  test "$with_unicode_internal" != yes && echo "    -  The old-Mule format will be used internally for text."
+fi
 test "$with_xim" != no && echo "  Compiling in support for XIM (X11R5+ I18N input method)."
 test "$with_xim" = motif && echo "    - Using Motif to provide XIM support."
 test "$with_xim" = xlib && echo "    - Using raw Xlib to provide XIM support."
 	* custom/example-themes/europe-theme.el:
 	Correct FSF address in permission notice.
 
+2010-03-07  Ben Wing  <ben@xemacs.org>
+
+	* unicode/unicode-consortium/EASTASIA/OBSOLETE/JIS0208.TXT:
+	0x2140 mapped to 005C REVERSE SOLIDUS (i.e. backslash) instead of
+	FF3C FULLWIDTH REVERSE SOLIDUS, which screwed up reading in of
+	quail/greek.el.
+
 2010-02-22  Ben Wing  <ben@xemacs.org>
 
 	* dbxrc.in:

etc/unicode/README

 This directory contains Unicode translation tables for most of the
 charsets in XEmacs.
 
+The tables in most of the directories are in a standard format that can be
+read by the function `load-unicode-mapping-table'.  References to these
+files are coded into files in the `lisp' and `lisp/mule' subdirectories of
+the source tree, such as `unicode.el' and various writing-system-specific
+files (e.g. `mule/latin.el', `mule/cyrillic.el').  The tables in `ibm/' are
+not currently in the right format, but could easily be converted by an
+appropriate perl or Elisp script.
+
+Sometimes the same character set has more than one table in different
+directories.  Sometimes one table is more complete than others, e.g. the
+tables for CNS 11643 in `mule-ucs' have mappings for non-BMP (greater than
+0xFFFF) Unicode characters, while the ones in `unicode-consortium' don't.
+It's possible to use one table to check the accuracy of another, although
+we don't currently do that.  We tend to prefer the tables in
+`unicode-consortium' over others unless there's a good reason to do
+otherwise (e.g. as previously described for CNS 11643).
+
 The tables in unicode-consortium/ come from:
 
 http://www.unicode.org/Public/MAPPINGS/
 
 http://oss.software.ibm.com/icu/charset/
 
-The tables in unicode-consortium/ should be used as source data; the ones
-in ibm/ can be used to supplement or check the accuracy of the others.
+The tables in libiconv/ come from the `tests' subdirectory of
+libiconv-1.13.1 (http://ftp.gnu.org/pub/gnu/libiconv/libiconv-1.13.1.tar.gz).
+
+The tables in oreilly/ come from:
+
+http://examples.oreilly.com/cjkvinfo/unicode/
+
+The tables in mule-ucs/ come from the mule-ucs/lisp/reldata directory in
+the mule-ucs package, available from the XEmacs download repository under
+
+http://www.xemacs.org/Releases/index.html#Packages
+
+The tables in other/ were generated in various ways described in the
+README file in that directory.

etc/unicode/libiconv/ARMSCII-8.IRREVERSIBLE.TXT

+0x28	0x0028
+0x29	0x0029
+0x2C	0x002C
+0x2D	0x002D
+0x2E	0x002E

etc/unicode/libiconv/ARMSCII-8.TXT

+0x00	0x0000
+0x01	0x0001
+0x02	0x0002
+0x03	0x0003
+0x04	0x0004
+0x05	0x0005
+0x06	0x0006
+0x07	0x0007
+0x08	0x0008
+0x09	0x0009
+0x0A	0x000A
+0x0B	0x000B
+0x0C	0x000C
+0x0D	0x000D
+0x0E	0x000E
+0x0F	0x000F
+0x10	0x0010
+0x11	0x0011
+0x12	0x0012
+0x13	0x0013
+0x14	0x0014
+0x15	0x0015
+0x16	0x0016
+0x17	0x0017
+0x18	0x0018
+0x19	0x0019
+0x1A	0x001A
+0x1B	0x001B
+0x1C	0x001C
+0x1D	0x001D
+0x1E	0x001E
+0x1F	0x001F
+0x20	0x0020
+0x21	0x0021
+0x22	0x0022
+0x23	0x0023
+0x24	0x0024
+0x25	0x0025
+0x26	0x0026
+0x27	0x0027
+0x28	0x0028
+0x29	0x0029
+0x2A	0x002A
+0x2B	0x002B
+0x2C	0x002C
+0x2D	0x002D
+0x2E	0x002E
+0x2F	0x002F
+0x30	0x0030
+0x31	0x0031
+0x32	0x0032
+0x33	0x0033
+0x34	0x0034
+0x35	0x0035
+0x36	0x0036
+0x37	0x0037
+0x38	0x0038
+0x39	0x0039
+0x3A	0x003A
+0x3B	0x003B
+0x3C	0x003C
+0x3D	0x003D
+0x3E	0x003E
+0x3F	0x003F
+0x40	0x0040
+0x41	0x0041
+0x42	0x0042
+0x43	0x0043
+0x44	0x0044
+0x45	0x0045
+0x46	0x0046
+0x47	0x0047
+0x48	0x0048
+0x49	0x0049
+0x4A	0x004A
+0x4B	0x004B
+0x4C	0x004C
+0x4D	0x004D
+0x4E	0x004E
+0x4F	0x004F
+0x50	0x0050
+0x51	0x0051
+0x52	0x0052
+0x53	0x0053
+0x54	0x0054
+0x55	0x0055
+0x56	0x0056
+0x57	0x0057
+0x58	0x0058
+0x59	0x0059
+0x5A	0x005A
+0x5B	0x005B
+0x5C	0x005C
+0x5D	0x005D
+0x5E	0x005E
+0x5F	0x005F
+0x60	0x0060
+0x61	0x0061
+0x62	0x0062
+0x63	0x0063
+0x64	0x0064
+0x65	0x0065
+0x66	0x0066
+0x67	0x0067
+0x68	0x0068
+0x69	0x0069
+0x6A	0x006A
+0x6B	0x006B
+0x6C	0x006C
+0x6D	0x006D
+0x6E	0x006E
+0x6F	0x006F
+0x70	0x0070
+0x71	0x0071
+0x72	0x0072
+0x73	0x0073
+0x74	0x0074
+0x75	0x0075
+0x76	0x0076
+0x77	0x0077
+0x78	0x0078
+0x79	0x0079
+0x7A	0x007A
+0x7B	0x007B
+0x7C	0x007C
+0x7D	0x007D
+0x7E	0x007E
+0x7F	0x007F
+0x80	0x0080
+0x81	0x0081
+0x82	0x0082
+0x83	0x0083
+0x84	0x0084
+0x85	0x0085
+0x86	0x0086
+0x87	0x0087
+0x88	0x0088
+0x89	0x0089
+0x8A	0x008A
+0x8B	0x008B
+0x8C	0x008C
+0x8D	0x008D
+0x8E	0x008E
+0x8F	0x008F
+0x90	0x0090
+0x91	0x0091
+0x92	0x0092
+0x93	0x0093
+0x94	0x0094
+0x95	0x0095
+0x96	0x0096
+0x97	0x0097
+0x98	0x0098
+0x99	0x0099
+0x9A	0x009A
+0x9B	0x009B
+0x9C	0x009C
+0x9D	0x009D
+0x9E	0x009E
+0x9F	0x009F
+0xA0	0x00A0
+0xA2	0x0587
+0xA3	0x0589
+0xA4	0x0029
+0xA5	0x0028
+0xA6	0x00BB
+0xA7	0x00AB
+0xA8	0x2014
+0xA9	0x002E
+0xAA	0x055D
+0xAB	0x002C
+0xAC	0x002D
+0xAD	0x058A
+0xAE	0x2026
+0xAF	0x055C
+0xB0	0x055B
+0xB1	0x055E
+0xB2	0x0531
+0xB3	0x0561
+0xB4	0x0532
+0xB5	0x0562
+0xB6	0x0533
+0xB7	0x0563
+0xB8	0x0534
+0xB9	0x0564
+0xBA	0x0535
+0xBB	0x0565
+0xBC	0x0536
+0xBD	0x0566
+0xBE	0x0537
+0xBF	0x0567
+0xC0	0x0538
+0xC1	0x0568
+0xC2	0x0539
+0xC3	0x0569
+0xC4	0x053A
+0xC5	0x056A
+0xC6	0x053B
+0xC7	0x056B
+0xC8	0x053C
+0xC9	0x056C
+0xCA	0x053D
+0xCB	0x056D
+0xCC	0x053E
+0xCD	0x056E
+0xCE	0x053F
+0xCF	0x056F
+0xD0	0x0540
+0xD1	0x0570
+0xD2	0x0541
+0xD3	0x0571
+0xD4	0x0542
+0xD5	0x0572
+0xD6	0x0543
+0xD7	0x0573
+0xD8	0x0544
+0xD9	0x0574
+0xDA	0x0545
+0xDB	0x0575
+0xDC	0x0546
+0xDD	0x0576
+0xDE	0x0547
+0xDF	0x0577
+0xE0	0x0548
+0xE1	0x0578
+0xE2	0x0549
+0xE3	0x0579
+0xE4	0x054A
+0xE5	0x057A
+0xE6	0x054B
+0xE7	0x057B
+0xE8	0x054C
+0xE9	0x057C
+0xEA	0x054D
+0xEB	0x057D
+0xEC	0x054E
+0xED	0x057E
+0xEE	0x054F
+0xEF	0x057F
+0xF0	0x0550
+0xF1	0x0580
+0xF2	0x0551
+0xF3	0x0581
+0xF4	0x0552
+0xF5	0x0582
+0xF6	0x0553
+0xF7	0x0583
+0xF8	0x0554
+0xF9	0x0584
+0xFA	0x0555
+0xFB	0x0585
+0xFC	0x0556
+0xFD	0x0586
+0xFE	0x055A

etc/unicode/libiconv/ASCII.TXT

+0x00	0x0000
+0x01	0x0001
+0x02	0x0002
+0x03	0x0003
+0x04	0x0004
+0x05	0x0005
+0x06	0x0006
+0x07	0x0007
+0x08	0x0008
+0x09	0x0009
+0x0A	0x000A
+0x0B	0x000B
+0x0C	0x000C
+0x0D	0x000D
+0x0E	0x000E
+0x0F	0x000F
+0x10	0x0010
+0x11	0x0011
+0x12	0x0012
+0x13	0x0013
+0x14	0x0014
+0x15	0x0015
+0x16	0x0016
+0x17	0x0017
+0x18	0x0018
+0x19	0x0019
+0x1A	0x001A
+0x1B	0x001B
+0x1C	0x001C
+0x1D	0x001D
+0x1E	0x001E
+0x1F	0x001F
+0x20	0x0020
+0x21	0x0021
+0x22	0x0022
+0x23	0x0023
+0x24	0x0024
+0x25	0x0025
+0x26	0x0026
+0x27	0x0027
+0x28	0x0028
+0x29	0x0029
+0x2A	0x002A
+0x2B	0x002B
+0x2C	0x002C
+0x2D	0x002D
+0x2E	0x002E
+0x2F	0x002F
+0x30	0x0030
+0x31	0x0031
+0x32	0x0032
+0x33	0x0033
+0x34	0x0034
+0x35	0x0035
+0x36	0x0036
+0x37	0x0037
+0x38	0x0038
+0x39	0x0039
+0x3A	0x003A
+0x3B	0x003B
+0x3C	0x003C
+0x3D	0x003D
+0x3E	0x003E
+0x3F	0x003F
+0x40	0x0040
+0x41	0x0041
+0x42	0x0042
+0x43	0x0043
+0x44	0x0044
+0x45	0x0045
+0x46	0x0046
+0x47	0x0047
+0x48	0x0048
+0x49	0x0049
+0x4A	0x004A
+0x4B	0x004B
+0x4C	0x004C
+0x4D	0x004D
+0x4E	0x004E
+0x4F	0x004F
+0x50	0x0050
+0x51	0x0051
+0x52	0x0052
+0x53	0x0053
+0x54	0x0054
+0x55	0x0055
+0x56	0x0056
+0x57	0x0057
+0x58	0x0058
+0x59	0x0059
+0x5A	0x005A
+0x5B	0x005B
+0x5C	0x005C
+0x5D	0x005D
+0x5E	0x005E
+0x5F	0x005F
+0x60	0x0060
+0x61	0x0061
+0x62	0x0062
+0x63	0x0063
+0x64	0x0064
+0x65	0x0065
+0x66	0x0066
+0x67	0x0067
+0x68	0x0068
+0x69	0x0069
+0x6A	0x006A
+0x6B	0x006B
+0x6C	0x006C
+0x6D	0x006D
+0x6E	0x006E
+0x6F	0x006F
+0x70	0x0070
+0x71	0x0071
+0x72	0x0072
+0x73	0x0073
+0x74	0x0074
+0x75	0x0075