
Anonymous committed 851d9f2

major update <87adtnt6rb.fsf@tleeps18.sk.tsukuba.ac.jp>

  • Parent commits 9979124
  • Tags sumo-2002-03-12


Files changed (8)

File BLURB

 
 Mule bogusly considers the various ISO-8859 extended character sets as
 disjoint, when ISO 8859 itself clearly considers them to be subsets of
-a larger character set.  This package provides functions which
-determine the list of coding systems which can encode all of the
-characters in the buffer, and translate to a common coding system.
+a larger character set.  This package provides functions to determine
+the list of coding systems which can encode all of the characters in
+the buffer, and translate to a common coding system.  It also provides
+a Latin-9 charset, coding system, and input method.

File ChangeLog

+2002-03-05  Stephen J. Turnbull  <stephen@xemacs.org>
+
+	* latin-unity.el (latin-unity-guess-coding-system):
+	(latin-unity-guess-charset):
+	(latin-unity-coding-system-alias-alist): 
+	(latin-unity-charset-alias-alist): 
+	New functions and variables.
+	(latin-unity-remap-region): 
+	(latin-unity-recode-coding-region): 
+	(latin-unity-recode-region):
+	Implement completion and error recovery using them.
+
+	* (latin-unity-sanity-check):
+	(latin-unity-remap-region):
+	Simplify sanity-check, including moving error handling for remap
+	failure to remap-region.
+
+	* latin-unity.texi (Interactive Usage: latin-unity-guess-charset,
+	latin-unity-guess-coding-system,
+	latin-unity-preferred-coding-system-list,
+	latin-unity-preapproved-coding-system-list):
+	Document new functions and variables.
+	(Basic Functionality: latin-unity-ucs-list): New name.
+
+	* latin-unity.el (latin-unity-representations-feasible-region):
+	(latin-unity-representations-present-region):
+	(latin-unity-recommend-representation):
+	(latin-unity-remap-region):
+	Handle start == nil case for autosaves.
+
+	(latin-unity-sanity-check):
+	(latin-unity-recommend-representation):
+	Handle null buffer-default or preferred properly.
+
+2002-03-04  Stephen J. Turnbull  <stephen@xemacs.org>
+
+	* latin-unity.el (latin-unity-preapproved-coding-system-list):
+	(latin-unity-preferred-coding-system-list):
+	(latin-unity-ucs-list):
+	(latin-unity-iso-8859-1-aliases):
+	Extensible lists are Customize type 'repeat.
+	(latin-unity-recommend-representation): Report the buffer to save.
+
+2002-03-03  Stephen J. Turnbull  <stephen@xemacs.org>
+
+	* latin-unity.texi (Charsets and Coding Systems, Internals): New nodes.
+
+	* BLURB: Advertise provision of charset, coding system, IM.
+
+	* latin-unity.el (latin-unity-maybe-remap):
+	(latin-unity-recommend-representation):
+	New functions broken out of `latin-unity-sanity-check'.
+	(latin-unity-sanity-check): Reorganize using new functions.
+
+2002-03-02  Stephen J. Turnbull  <stephen@xemacs.org>
+
+	* latin-unity.el (latin-unity-coding-system-priority-list):
+	(latin-unity-coding-system-priority-list-buffer):
+	New variables.
+	(latin-unity-coding-system-priority-list): Help function.
+
+	* README: Document Latin 9 input.
+
+	* latin-unity.el (latin-unity-sanity-check): Handle case where
+	region can be represented with remapping as documented.  Special-
+	case 'iso-8859-1, Mule doesn't consider it type 'iso2022.
+
+	* latin-euro-input.el: New file.
+
+	* latin-unity-vars.el: Add coding cookie.
+
+2002-03-01  Stephen J. Turnbull  <stephen@xemacs.org>
+
+	* README: Update to current reality.  Add to-do stuff from Erwan
+	David and Barry Warsaw.
+
+	* latin-unity-vars.el: Add Latin-9 environment.  Convert characters
+	in comment from Latin-1 to Latin-9.
+
 2002-02-25  Stephen J. Turnbull  <stephen@xemacs.org>
 
 	* README: Note out of date status.
 

File Makefile

 # The XEmacs CVS version is canonical.  Keep versions n'sync.
 VERSION = 0.99
-AUTHOR_VERSION = 0.99
+AUTHOR_VERSION = 1.00
 MAINTAINER = Stephen J. Turnbull <stephen@xemacs.org>
 PACKAGE = latin-unity
 PKG_TYPE = regular
-# The Mule-UCS require will go away at some point
-REQUIRES = mule-base mule-ucs
+# The Mule-UCS, leim, and fsf-compat requires will go away at some point
+REQUIRES = mule-base mule-ucs leim fsf-compat
 CATEGORY = mule
 
-ELCS = latin-unity.elc latin-unity-vars.elc \
+ELCS = latin-unity.elc latin-unity-vars.elc latin-euro-input.elc \
        latin-unity-tables.elc latin-unity-utils.elc
 
 # for defvars and creation of ISO 8859/15 charset and coding system

File README

 ***** latin-unity
 
-This is the beta test version of the latin-unity package for Mule
-XEmacs.
-
-This file has not been updated; the new Texinfo manual is more reliable.
-
 Mule bogusly considers the various ISO-8859 extended character sets as
 disjoint, when ISO 8859 itself clearly considers them to be subsets of
 a larger character set.  For example, all of the Latin character sets
 include NO-BREAK SPACE at code point 32 (ie, 0xA0 in an 8-bit code),
 but the Latin-1 and Latin-2 NO-BREAK SPACE characters are considered
-to be different by Mule, an obvious absurdity.  This package provides
-functions which determine the list of coding systems which can encode
-all of the characters in the buffer, and translate to a common coding
-system if possible.
+to be different by Mule, an obvious absurdity.
+
+This package provides functions which determine the list of coding
+systems which can encode all of the characters in the buffer, and
+translate to a common coding system if possible.
+
+
+***** Basic usage:
+
+To set up the package, simply put
+
+(latin-unity-install)
+
+in your init file.
+
+
+***** Availability:
+
+anonymous CVS:
+Get the latin-unity module and build as usual.
+
+WWW:
+ftp://ftp.xemacs.org/pub/xemacs/packages/latin-unity-VERSION-pkg.tar.gz
+
 
 ***** Features:
 
-  o ISO 8859/15 for XEmacs 21.4 (lightly tested) and 21.1 (untested).
-    To get 'iso-8859-15 preferred to 'iso-8859-1 in autodetection, use
-    (set-coding-category-system 'iso-8-1 'iso-8859-15).  (untested)
-
-    If all you want is ISO 8859/15 support, you can either copy the
-    ISO 8859/15 setup to another file, or `(require 'latin-unity-vars).
-
   o If a buffer contains only ASCII and ISO-8859 Latin characters, the
     buffer can be "unified", that is treated so that all characters are
     translated to one charset that includes them all.  If the current
     buffer coding system is not sufficient, the package will suggest
     alternatives.  It prefers ISO-8859 encodings, but also suggests
-    UTF-8 (if available; 21.4+ feature), ISO 2022 7-bit, or X Compound
-    Text if no ISO 8859 coding system is comprehensive enough.
+    UTF-8 (if available; 21.4+ feature, currently requires Mule-UCS),
+    ISO 2022 7-bit, or X Compound Text if no ISO 8859 coding system is
+    comprehensive enough.
 
     It allows the user to use other coding systems, and the list of
     suggested coding systems is Customizable.
 
     This probably also is useful out of the box if the buffer contains
-    non-Latin characters in addition to a mixture of Latin
-    characters.  For example, I believe it would reduce a buffer
-    originally ISO-2022-JP (including Latin-1 characters) to ISO
-    8859/1 if all the Japanese were deleted.  (untested)
+    non-Latin characters in addition to a mixture of Latin characters.
+    For example, it would reduce a buffer originally encoded in
+    ISO-2022-JP (including Latin-1 characters) to ISO 8859/1 if all
+    the Japanese were deleted.  (untested)
+
+  o ISO 8859/15 for XEmacs 21.4 (lightly tested) and 21.1 (untested).
+    To get 'iso-8859-15 preferred to 'iso-8859-1 in autodetection, use
+    (set-coding-category-system 'iso-8-1 'iso-8859-15).  (untested)
+    Alternatively, set the language environment to Latin-9.
+
+    If all you want is ISO 8859/15 support, you can either copy the
+    ISO 8859/15 setup to another file, or `(require 'latin-unity-vars)'
+    and `(require 'latin-euro-input)'.
 
   o Hooks into `write-region' to prevent (or at least drastically
     reduce the probability of) introduction of ISO 2022 escape
     This may permit us to turn off support for those sequences
     entirely in our ISO 8859 coding-systems.
 
-  o Depends only on mule-base in operation.  Table generation depends
-    on Unicode support such as Mule-UCS or Ben's ben-mule-21-5
-    workspace, and the package build currently requires Mule-UCS.
+  o Interactive functions to _remap_ a region between character sets
+    (preserving character identity) and _recode_ a region (preserving
+    the code point).  The former is probably not useful if the
+    automatic function is working at all, but provided for
+    completeness.  The latter is useful if Mule mistakenly reads an
+    ISO 8859/2 file as ISO 8859/1; you can change it without rereading
+    the file.  Since it's region-oriented, you can also deal with cut
+    and paste from dumb applications that export everything as ISO 8859/1.
+
+  o A nearly comprehensive Texinfo manual contains a discussion of
+    why these things happen, why they can't be 100% avoided in an 8-bit
+    world, and some defensive measures users can take, as well as the
+    usual install, configure, and operating instructions.
+
+  o latin-unity itself depends only on mule-base in operation.  Table
+    generation depends on Unicode support such as Mule-UCS or Ben's
+    ben-mule-21-5 workspace, and the package build currently requires
+    Mule-UCS.  The input method depends on LEIM and fsf-compat.
 
 Current misfeatures:
 
+  o Need `(require 'latin-euro-input)' to get Quail support.
+
   o If the buffer is changed by the hook, apparently write-region
     starts over again from the top.  The buffer is checked again, and
     you are asked to choose the coding system again.  If you choose
     time.  You'll get the same default as the first time.
 
   o Probable performance hit on large (> 20kB) buffers with many
-    (>20%) non-ASCII characters.  Possible otimizations are given near
+    (>20%) non-ASCII characters.  Possible optimizations are given near
     `latin-unity-region-feasible-representations' in latin-unity.el.
 
-  o Custom-loads aren't built for the package.  You'll need to `(require
-    'latin-unity)' to get Customize's information loaded.
+  o Package depends on Mule-UCS, LEIM (Quail), and fsf-compat.
 
-  o Package depends on Mule-UCS.
+  o This README is too long.
 
-Planned:
+Planned, mostly near future:
 
   o Fix the misfeatures.
 
-  o GNU Emacs support.
+  o Check -*- coding: codesys -*- cookies for consistency.
 
   o Fix JIS Roman (as an alternative to ASCII) support.
 
-  o More UI features (like list of unrepresentable charsets, and
-    perhaps highlighting them in buffer)
+  o Support Latin-10 (ISO 8859/16) aka Latin-2 + EURO SIGN.
+
+  o More UI features (like highlighting unrepresentable chars in buffer).
 
   o Integration to development tree (but probably not 21.4, this
     package should be good enough).
 
-  o Eliminate all need for Mule-UCS.
+  o Charset completion for the interactive recoding/remapping functions.
 
-Not planned:
+  o Hook into MUAs.
 
-  o Extension to Han-unity.  This needs to be treated more carefully.
+  o GNU Emacs support.
 
-***** Availability:
+Not planned any time soon:
 
-These URLs will change upon public release.
+  o Extend to process buffers in some way, which looks very hard.
 
-    anonymous CVS:
-    Get the XEmacs/packages/mule-packages/latin-unity module.  You'll
-    need to fix up the lists of packages in the package-compile.el
-    utility (and possible in mule-package/Makefile) to build a
-    package, but for general use, just byte-compiling latin-unity and
-    latin-unity-tables, and putting them on your path, should be fine.
-
-    WWW:
-    http://turnbull.sk.tsukuba.ac.jp/Tools/XEmacs/latin-unity-0.90-pkg.tar.gz
-
-
-***** Basic usage:
-
-To set up the package, simply put
-
-(add-hook 'write-region-pre-hook #'latin-unity-sanity-check)
-
-in your init file.
+  o Han-unity.  This is not entirely analogous to Latin unity, and
+    needs to be treated very carefully.
 
 
 ***** Implementation:
 latin-unity-vars.el contains the definition of ISO 8859/15 and variables
 common to several modules.
 
+latin-euro-input.el contains Dave Love's Quail input method for Latin 9.
+
 latin-unity-tables.el contains the table of feasible character sets and
 equivalent Mule characters from other character sets for the various Mule
 representations of each character.  Automatically generated.
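The README distinguishes remapping a region (keep each character's identity, change its charset) from recoding it (keep each code point, change the charset it is interpreted in). As a rough illustration only, the sketch below uses Python's standard ISO 8859 codecs as a stand-in for Mule charsets; latin-unity itself works on Mule characters inside an XEmacs buffer, and the names remap and recode are hypothetical.

```python
"""Remap vs. recode, sketched with Python's ISO 8859 codecs.

Python codecs stand in for Mule charsets; latin-unity itself
operates on Mule characters inside an XEmacs buffer.
"""
import unicodedata


def remap(text, target):
    """Keep character identity: encode the same characters in TARGET.

    Raises UnicodeEncodeError if TARGET cannot represent a character.
    """
    return text.encode(target)


def recode(text, read_as, actual):
    """Keep code points: reinterpret text that was decoded as READ_AS
    but was really encoded in ACTUAL (e.g. Latin-2 misread as Latin-1).
    """
    return text.encode(read_as).decode(actual)


if __name__ == "__main__":
    # NO-BREAK SPACE: same character, same byte (0xA0) in Latin-1 and Latin-2.
    nbsp = "\u00a0"
    print(remap(nbsp, "iso-8859-1") == remap(nbsp, "iso-8859-2"))

    # Byte 0xE8 is e-grave in Latin-1 but c-caron in Latin-2.
    misread = b"\xe8".decode("iso-8859-1")
    fixed = recode(misread, "iso-8859-1", "iso-8859-2")
    print(unicodedata.name(misread), "->", unicodedata.name(fixed))
```

The second case is the README's "ISO 8859/2 file read as ISO 8859/1" scenario: recoding repairs it without rereading the file, whereas remapping would faithfully preserve the wrong character.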

File latin-euro-input.el

+;;; latin-euro-input.el --- Input method for Latin-9 (ISO 8859/15) -*- coding: iso-2022-jp -*-
+
+;; Copyright (C) 2001, 2002 Free Software Foundation, Inc
+
+;; Author: Dave Love
+;; Adapted-by: Stephen J. Turnbull for XEmacs
+;; Keywords: mule, input methods
+;; Created: 2002 March 1
+;; Last-modified: 2002 March 1
+
+;; This file is part of XEmacs.
+
+;; XEmacs is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 2, or (at your option)
+;; any later version.
+
+;; XEmacs is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.	 See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with XEmacs; see the file COPYING.  If not, write to the
+;; Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+;; Boston, MA 02111-1307, USA.
+
+
+;;; Commentary:
+
+;; Grabbed from latin-pre.el in the Emacs 21 distribution.  I believe this
+;; is the method posted by Dave Love to gnu.emacs.sources.  The copyright in
+;; latin-pre.el is bogus as Love's post was late 2001.
+
+(require 'latin-unity-vars)		; for ISO 8859/15
+(require 'quail)
+(quail-define-package
+ "latin-9-prefix" "Latin-9" "0>" t
+ "Latin-9 characters input method with prefix modifiers
+
+    effect   | prefix | examples
+ ------------+--------+----------
+    acute    |   '    | 'a -> á
+    grave    |   `    | `a -> à
+  circumflex |   ^    | ^a -> â
+  diaeresis  |   \"    | \"a -> ä, \"Y -> Ÿ
+    tilde    |   ~    | ~a -> ã
+    caron    |   ~    | ~z -> ž
+   cedilla   |   ~    | ~c -> ç
+    misc     | \" ~ /  | \"s -> ß  ~d -> ð  ~t -> þ  /a -> å  /e -> æ  /o -> ø
+             | \" ~ /  | /o -> œ
+   symbol    |   ~    | ~> -> »  ~< -> «  ~! -> ¡  ~? -> ¿  ~~ -> ž
+             |   ~    | ~s -> §  ~e -> €  ~. -> ·  ~$ -> £  ~u -> µ
+             |   ~    | ~- -> ­  ~= -> ¯
+   symbol    |  _ /   | _o -> º  _a -> ª  // -> °  /\\ -> ×  _y -> ¥
+             |  _ /   | _: -> ÷  /c -> ¢  ~p -> ¶
+             |  _ /   | /= -> ¬
+   symbol    |   ^    | ^r -> ®  ^c -> ©  ^1 -> ¹  ^2 -> ²  ^3 -> ³  _a -> ª
+" nil t nil nil nil nil nil nil nil nil t)
+
+(quail-define-rules
+ ("'A" ?Á)
+ ("'E" ?É)
+ ("'I" ?Í)
+ ("'O" ?Ó)
+ ("'U" ?Ú)
+ ("'Y" ?Ý)
+ ("'a" ?á)
+ ("'e" ?é)
+ ("'i" ?í)
+ ("'o" ?ó)
+ ("'u" ?ú)
+ ("'y" ?ý)
+ ("' " ?')
+ ("`A" ?À)
+ ("`E" ?È)
+ ("`I" ?Ì)
+ ("`O" ?Ò)
+ ("`U" ?Ù)
+ ("`a" ?à)
+ ("`e" ?è)
+ ("`i" ?ì)
+ ("`o" ?ò)
+ ("`u" ?ù)
+ ("``" ?`)
+ ("` " ?`)
+ ("^A" ?Â)
+ ("^E" ?Ê)
+ ("^I" ?Î)
+ ("^O" ?Ô)
+ ("^U" ?Û)
+ ("^a" ?â)
+ ("^e" ?ê)
+ ("^i" ?î)
+ ("^o" ?ô)
+ ("^u" ?û)
+ ("^^" ?^)
+ ("^ " ?^)
+ ("\"A" ?Ä)
+ ("\"E" ?Ë)
+ ("\"I" ?Ï)
+ ("\"O" ?Ö)
+ ("\"U" ?Ü)
+ ("\"a" ?ä)
+ ("\"e" ?ë)
+ ("\"i" ?ï)
+ ("\"o" ?ö)
+ ("\"s" ?ß)
+ ("\"u" ?ü)
+ ("\"y" ?ÿ)
+ ("\" " ?\")
+ ("~A" ?Ã)
+ ("~C" ?Ç)
+ ("~D" ?Ð)
+ ("~N" ?Ñ)
+ ("~O" ?Õ)
+ ("~S" ?Š)
+ ("~T" ?Þ)
+ ("~Z" ?Ž)
+ ("~a" ?ã)
+ ("~c" ?ç)
+ ("~d" ?ð)
+ ("~n" ?ñ)
+ ("~o" ?õ)
+ ("~s" ?š)
+ ("~t" ?þ)
+ ("~z" ?ž)
+ ("~>" ?\»)
+ ("~<" ?\«)
+ ("~!" ?¡)
+ ("~?" ?¿)
+ ("~ " ?~)
+ ("/A" ?Å)
+ ("/E" ?Æ)
+ ("/O" ?Ø)
+ ("/a" ?å)
+ ("/e" ?æ)
+ ("/o" ?ø)
+ ("//" ?°)
+ ("/ " ?/)
+ ("_o" ?º)
+ ("_a" ?ª)
+ ("_+" ?±)
+ ("_y" ?¥)
+ ("_:" ?÷)
+ ("/c" ?¢)
+ ("/\\" ?×)
+ ("/o" ?œ)		; clash with ø, but æ uses /
+ ("/O" ?Œ)
+ ("\"Y" ?Ÿ)
+ ("~s" ?§)
+ ("~p" ?¶)
+ ;; Is this the best option for Euro entry?
+ ("~e" ?€)
+ ("~." ?·)
+ ("~$" ?£)
+ ("~u" ?µ)
+ ("^r" ?®)
+ ("^c" ?©)
+ ("^1" ?¹)
+ ("^2" ?²)
+ ("^3" ?³)
+ ("~-" ?­)
+ ("~=" ?¯)
+ ("/=" ?¬))
+
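The latin-9-prefix method above is a prefix (dead-key style) input method: one of ' ` ^ " ~ / _ followed by a base key yields a Latin-9 character, and for most prefixes the prefix followed by a space yields the prefix itself. The toy translator below sketches only that two-key lookup, using a hypothetical subset of the rules; it is not how Quail is implemented, and it does not model how Quail resolves keys that the rule list defines twice ("~s", "/o" and "/O" each appear twice).

```python
"""Toy two-key prefix translator modelled on the latin-9-prefix rules.

Not Quail: no incremental display, no candidate selection; it only
shows the "prefix key, then base key" lookup.
"""

# Hypothetical subset of the rules in latin-euro-input.el.
RULES = {
    "'a": "á", "'e": "é",
    "`e": "è",
    "^o": "ô",
    '"s': "ß",
    "~n": "ñ", "~z": "ž", "~e": "€",
    "' ": "'",  # prefix + space gives the bare prefix character
}


def translate(keys):
    """Translate KEYS, consuming two keys whenever they match a rule."""
    out = []
    i = 0
    while i < len(keys):
        pair = keys[i:i + 2]
        if pair in RULES:
            out.append(RULES[pair])
            i += 2
        else:
            # No rule starts here: pass the key through unchanged.
            out.append(keys[i])
            i += 1
    return "".join(out)
```

For example, translate("caf'e costs ~e3") returns "café costs €3", and translate("it' s") returns "it's".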

File latin-unity-vars.el

-;;; latin-unity-vars.el --- Common variables and objects of latin-unity
+;;; latin-unity-vars.el --- Common variables and objects of latin-unity -*- coding: iso-2022-7 -*-
 
 ;; Copyright (C) 2002 Free Software Foundation, Inc
 
 
 ;; define ISO-8859-15 for XEmacs 21.4 and earlier
 ;(eval-when (compile load eval)
-  (unless (find-coding-system 'iso-8859-15)
-    ;; Create character set
-    (make-charset
-     'latin-iso8859-15 "ISO8859-15 (Latin 9)"
-     '(short-name "Latin-9"
-       long-name "ISO8859-15 (Latin 9)"
-       registry "iso8859-15"
-       dimension 1
-       columns 1
-       chars 96
-       final ?b
-       graphic 1
-       direction l2r))
-    ;; For syntax of Latin-9 characters.
-    (require 'cl)
-    (load-library "cl-macs")		; howcum no #'provide?
-    (loop for c from 64 to 127		; from 'À' to 'ÿ'
-      do (modify-syntax-entry (make-char 'latin-iso8859-15 c) "w"))
-    (mapc (lambda (c)
-	    (modify-syntax-entry (make-char 'latin-iso8859-15 c) "w"))
-	  '(#xA6 #xA8 #xB4 #xB8 #xBC #xBD #xBE))
+;;;###autoload
+(unless (find-charset 'latin-iso8859-15)
+  ;; Create character set
+  (make-charset
+   'latin-iso8859-15 "ISO8859-15 (Latin 9)"
+   '(short-name "Latin-9"
+     long-name "ISO8859-15 (Latin 9)"
+     registry "iso8859-15"
+     dimension 1
+     columns 1
+     chars 96
+     final ?b
+     graphic 1
+     direction l2r))
+  ;; For syntax of Latin-9 characters.
+  (require 'cl)
+  (load-library "cl-macs")		; howcum no #'provide?
+  (loop for c from 64 to 127		; from 'À' to 'ÿ'
+    do (modify-syntax-entry (make-char 'latin-iso8859-15 c) "w"))
+  (mapc (lambda (c)
+	  (modify-syntax-entry (make-char 'latin-iso8859-15 c) "w"))
+	'(#xA6 #xA8 #xB4 #xB8 #xBC #xBD #xBE))
+  
+  (modify-syntax-entry (make-char 'latin-iso8859-15 32) "w") ; no-break space
+  (modify-syntax-entry (make-char 'latin-iso8859-15 87) "_") ; multiply
+  (modify-syntax-entry (make-char 'latin-iso8859-15 119) "_") ; divide
+  )
 
-    (modify-syntax-entry (make-char 'latin-iso8859-15 32) "w") ; no-break space
-    (modify-syntax-entry (make-char 'latin-iso8859-15 87) "_") ; multiply
-    (modify-syntax-entry (make-char 'latin-iso8859-15 119) "_") ; divide
-    ;; Create coding system
-    (make-coding-system
-     'iso-8859-15 'iso2022 "MIME ISO-8859-15"
-     '(charset-g0 ascii
-       charset-g1 latin-iso8859-15
-       charset-g2 t			; grrr
-       charset-g3 t			; grrr
-       mnemonic "MIME/Ltn-9")))
+(unless (find-coding-system 'iso-8859-15)
+  ;; Create coding system
+  (make-coding-system
+   'iso-8859-15 'iso2022 "MIME ISO-8859-15"
+   '(charset-g0 ascii
+     charset-g1 latin-iso8859-15
+     charset-g2 t			; grrr
+     charset-g3 t			; grrr
+     mnemonic "MIME/Ltn-9")))
+(defun setup-latin9-environment ()
+  "Set up multilingual environment (MULE) for European Latin-9 users."
+  (interactive)
+  (set-language-environment "Latin-9"))
+
+(set-language-info-alist
+ "Latin-9" '((charset ascii latin-iso8859-15)
+	     (coding-system iso-8859-15)
+	     (coding-priority iso-8859-15)
+	     (input-method . "latin-9-prefix")
+	     (sample-text
+	      ;; I'd like to append ", my €0.02" to the following string,
+	      ;; but can't due to a bug in escape-quoted support
+	      ;; NB: convert the Latin-1 to Latin-9 when possible
+	      . "Hello, Hej, Tere, Hei, Bonjour, Grüß Gott, Ciao, ¡Hola!, my €0.02")
+	     (documentation . "\
+This language environment is a generic one for Latin-9 (ISO-8859-15)
+character set which supports the Euro and the following languages:
+ Danish, Dutch, English, Faeroese, Finnish, French, German, Icelandic,
+ Irish, Italian, Norwegian, Portuguese, Spanish, and Swedish.
+We also have a German specific language environment \"German\"."))
+ '("European"))
 ;)
 
 ;; latin-unity-equivalence-table
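The syntax setup in latin-unity-vars.el marks #xA6 #xA8 #xB4 #xB8 #xBC #xBD #xBE as word constituents. Those are seven of the eight positions where ISO 8859-15 replaced a Latin-1 symbol with a letter (Š š Ž ž Œ œ Ÿ); the eighth, #xA4, became the EURO SIGN, which is not a letter. A short check, assuming Python's codecs are an accurate reference for both standards, lists the differences:

```python
"""List the positions where ISO 8859-15 (Latin-9) differs from Latin-1."""
import unicodedata


def latin9_changes():
    """Return (byte, latin1_char, latin9_char) for each differing byte."""
    changes = []
    for byte in range(0xA0, 0x100):      # upper (GR) half, 0xA0-0xFF
        raw = bytes([byte])
        l1 = raw.decode("iso-8859-1")
        l9 = raw.decode("iso-8859-15")
        if l1 != l9:
            changes.append((byte, l1, l9))
    return changes


if __name__ == "__main__":
    for byte, l1, l9 in latin9_changes():
        print(f"#x{byte:02X}: {unicodedata.name(l1)} -> {unicodedata.name(l9)}")
```

Every other position, including NO-BREAK SPACE at #xA0 and the letters from #xC0 to #xFF covered by the `loop', is identical in the two sets.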

File latin-unity.el

 ;; characters in the buffer.
 
 ;; Provides the 'iso-8859-15 coding system if yet undefined.
-;; #### Get the final byte for 'iso-8859-15 and do it too.
+;; #### Get the final byte for 'iso-8859-16 and do it too.
 
 ;;; Code:
 
   "Handle equivalent ISO-8859 characters properly (identify them) on output."
   :group 'mule)
 
-(defcustom latin-unity-approved-ucs-list '(utf-8 iso-2022-7 ctext)
-  "List of coding systems considered to be universal.
+;; #### We demand a coding system widget!
+;; #### The :set functions should do sanity and cross checks.
+(defcustom latin-unity-preapproved-coding-system-list
+  '(buffer-default preferred)
+  "*List of coding systems used without querying the user if feasible.
+
+The first feasible coding system in this list is used.  The special values
+'preferred and 'buffer-default may be present:
+
+  buffer-default  Use the coding system used by `write-region', if feasible.
+  preferred       Use the coding system specified by `prefer-coding-system'
+                  if feasible.
+
+Note that if your preferred coding system is a universal coding system, and
+`preferred' is a member of this list, latin-unity will blithely convert
+all your files to that coding system.  This is considered a feature, but it
+may surprise most users.  Users who don't like this behavior should put
+`preferred' in `latin-unity-preferred-coding-system-list'.
+
+\"Feasible\" means that all characters in the buffer can be represented by
+the coding system.  Coding systems in `latin-unity-ucs-list' are always
+considered feasible.  Other feasible coding systems are computed by
+`latin-unity-representations-feasible-region'.
+
+Note that the first universal coding system in this list shadows all other
+coding systems."
+  :type '(repeat symbol)
+  :group 'latin-unity)
+
+(defcustom latin-unity-preferred-coding-system-list
+  '(iso-8859-1 iso-8859-15 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-9)
+  "*List of coding systems suggested to the user if feasible.
+
+If none of the coding systems in `latin-unity-preapproved-coding-system-list'
+are feasible, this list will be recommended to the user, followed by the
+`latin-unity-ucs-list' (so those coding systems should not be in this list).
+The first coding system in this list is default.
+
+The special values 'preferred and 'buffer-default may be present:
+  buffer-default  Use the coding system used by `write-region', if feasible.
+  preferred       Use the coding system specified by `prefer-coding-system'
+                  if feasible.
+
+\"Feasible\" means that all characters in the buffer can be represented by
+the coding system.  Coding systems in `latin-unity-ucs-list' are always
+considered feasible.  Other feasible coding systems are computed by
+`latin-unity-representations-feasible-region'."
+  :type '(repeat symbol)
+  :group 'latin-unity)
+
+(defcustom latin-unity-ucs-list '(utf-8 iso-2022-7 ctext escape-quoted)
+  "*List of coding systems considered to be universal.
+
+A universal coding system can represent all characters by definition.
 
 Order matters; coding systems earlier in the list will be preferred when
-recommending a coding system.
-"
-  :type '(list symbol)
+recommending a coding system.  These coding systems will not be used without
+querying the user, and follow the `latin-unity-preferred-coding-system-list'
+in the list of suggested coding systems.
+
+If none of the preferred coding systems are feasible, the first in this list
+will be the default.
+
+Note: if `escape-quoted' is not a member of this list, you will be unable to
+autosave files or byte-compile Mule Lisp files."
+  :type '(repeat symbol)
   :group 'latin-unity)
 
-;; #### Coding systems which are not Latin and not in
-;; `latin-unity-approved-ucs-list' are handled by short circuiting checks
-;; of coding system against the next two variables.  A preferable approach
-;; is to define an alist of coding systems to corresponding sets of "safe"
-;; character sets, then checking that `(charsets-in-region begin end)' is
-;; contained in the appropriate set.  If you want this _now_ do it yourself
-;; and send a patch to <stephen@xemacs.org> ;-).
-
-(defcustom latin-unity-ignored-coding-system-list nil
-  "List of coding systems such that the buffer is not checked for Latin unity.
-
-Usually this means that `buffer-file-coding-system' is a member of this list.
-#### not clear this API is right, see comment in `latin-unity.el'."
-  :type '(list symbol)
+(defcustom latin-unity-charset-alias-alist
+  '((latin-1 . latin-iso8859-1)
+    (latin-2 . latin-iso8859-2)
+    (latin-3 . latin-iso8859-3)
+    (latin-4 . latin-iso8859-4)
+    (latin-5 . latin-iso8859-9)
+    (latin-9 . latin-iso8859-15)
+    (latin-10 . latin-iso8859-16))
+  "*Alist mapping aliases to Mule charset names (symbols)."
+  :type '(repeat (cons symbol symbol))
   :group 'latin-unity)
 
-(defcustom latin-unity-approved-coding-system-list nil
-  "List of coding systems forcing a save of the buffer even if Latin unity
-is not satisfied.
-#### not clear this API is right, see comment in `latin-unity.el'."
-  :type '(list symbol)
+(defcustom latin-unity-coding-system-alias-alist nil
+  "*Alist mapping aliases to Mule coding system names (symbols)."
+  :type '(repeat (cons symbol symbol))
   :group 'latin-unity)
 
+;; Needed because 'iso-8859-1 is type 'no-conversion, NOT type 'iso2022
 (defcustom latin-unity-iso-8859-1-aliases '(iso-8859-1)
-  "List of coding systems to be treated as aliases of ISO 8859/1."
-  :type '(list symbol)
+  "List of coding systems to be treated as aliases of ISO 8859/1.
+
+This is not a user variable; to customize input of coding systems or
+charsets, use `latin-unity-coding-system-alias-alist' or
+`latin-unity-charset-alias-alist'."
+  :type '(repeat symbol)
   :group 'latin-unity)
 
+(defcustom latin-unity-coding-system-list-buffer
+  " *latin-unity coding system preferences*"
+  "Name of buffer used to display coding systems by priority."
+  :type 'string
+  :group 'latin-unity)
+
+(defun latin-unity-list-coding-systems (display-excluded)
+  "Display the coding systems listed by priority and group.
+
+With prefix argument, also display otherwise excluded coding systems.
+
+See also `latin-unity-preapproved-coding-system-list',
+`latin-unity-preferred-coding-system-list', and `latin-unity-ucs-list'."
+
+  (interactive "_P")
+
+  (save-excursion
+    (pop-to-buffer (get-buffer-create
+		    latin-unity-coding-system-list-buffer))
+    (erase-buffer)
+    (let ((start (point)))
+
+      (insert "Pre-approved coding systems:\n ")
+      (mapc (lambda (codesys) (insert (format " %s" codesys)))
+	    latin-unity-preapproved-coding-system-list)
+      (fill-region start (point))
+
+      (insert "\nSuggested coding systems:\n ")
+      (setq start (point))
+      (mapc (lambda (codesys) (insert (format " %s" codesys)))
+	    latin-unity-preferred-coding-system-list)
+      (fill-region start (point))
+
+      (insert "\nUniversal coding systems:\n ")
+      (setq start (point))
+      (mapc (lambda (codesys) (insert (format " %s" codesys)))
+	    latin-unity-ucs-list)
+      (fill-region start (point))
+
+      (when display-excluded
+	;; Should arrange to only display included ones!
+	(insert "\nAll coding systems:\n ")
+	(setq start (point))
+	(mapc (lambda (codesys) (insert (format " %s" codesys)))
+	      (coding-system-list))
+	(fill-region start (point))))))
 
 ;;; User interface
 
   (aref (get-char-table character latin-unity-equivalences)
 	(get charset 'latin-unity-index)))
 
+;; Buffer coding system feasibility
+
 ;;;###autoload
-(defun latin-unity-buffer-representations-feasible ()
-  "Apply latin-unity-region-representations-feasible to the current buffer."
+(defun latin-unity-representations-feasible-buffer ()
+  "Apply latin-unity-representations-feasible-region to the current buffer."
   (interactive)
-  (latin-unity-region-representations-feasible (point-min)
+  (latin-unity-representations-feasible-region (point-min)
 					       (point-max)
 					       (current-buffer)))
 
-;; latin-unity-region-representations-feasible
+;; latin-unity-representations-feasible-region
 ;;
 ;; The basic algorithm is to map over the region, compute the set of
 ;; charsets that can represent each character (the "feasible charset"),
 ;; for Latin character sets there are only 29 classes.
 
 ;;;###autoload
-(defun latin-unity-region-representations-feasible (begin end &optional buf)
+(defun latin-unity-representations-feasible-region (begin end &optional buf)
   "Return character sets that can represent the text from BEGIN to END in BUF.
 
 BUF defaults to the current buffer.  Called interactively, will be
 	 (latinsets (logand (lognot asciisets) latin-unity-all-flags)))
     (save-excursion
       (set-buffer (or buf (current-buffer)))
-      (goto-char begin)
-      ;; The characters skipped here can't change asciisets.
-      ;; Note that to generalize this we would need to have a notion of
-      ;; classes of characters which do not change the representability.
-      ;; One thing we can do is to add the character itself.
-      (skip-chars-forward latin-unity-ascii-and-jis-roman)
-      (while (< (point) end)
-	(let* ((ch (char-after))
-	       (cs (car (split-char ch))))
-	  (cond ((or (eq cs 'latin-jisx0201)
-		     (eq cs 'ascii))
-		 (setq asciisets
-		       (logand asciisets (latin-unity-feasible-charsets ch)
-			       )))
-		(t
-		 (setq latinsets
-		       (logand latinsets (latin-unity-feasible-charsets ch)
-			       )))))
-	(forward-char)
-	;; The characters skipped here can't change asciisets
-	(skip-chars-forward latin-unity-ascii-and-jis-roman)))
+      (save-restriction
+	(widen)
+	(let ((begin (or begin (point-min)))
+	      (end (or end (and (null begin) (point-max)))))
+	  (goto-char begin)
+	  ;; The characters skipped here can't change asciisets.
+	  ;; Note that to generalize this we would need to have a notion of
+	  ;; classes of characters which do not change the representability.
+	  ;; One thing we can do is to add the character itself.
+	  (skip-chars-forward latin-unity-ascii-and-jis-roman)
+	  (while (< (point) end)
+	    (let* ((ch (char-after))
+		   (cs (car (split-char ch))))
+	      (cond ((or (eq cs 'latin-jisx0201)
+			 (eq cs 'ascii))
+		     (setq asciisets
+			   (logand asciisets (latin-unity-feasible-charsets ch)
+				   )))
+		    (t
+		     (setq latinsets
+			   (logand latinsets (latin-unity-feasible-charsets ch)
+				   )))))
+	    (forward-char)
+	    ;; The characters skipped here can't change asciisets
+	    (skip-chars-forward latin-unity-ascii-and-jis-roman)))))
     (cons latinsets asciisets)))
 
 
 ;; however, this is not obvious because this function is quite fast (the
 ;; region mapping is all in C), and therefore we can short-circuit the
 ;; slow Lisp function above
-(defun latin-unity-region-representations-present (begin end &optional buffer)
+(defun latin-unity-representations-present-region (begin end &optional buffer)
   "Return a cons of two bit vectors giving character sets in region.
 
 The car indicates which Latin characters sets were found, the cdr the ASCII
 
   (let ((lsets 0)
 	(asets 0))
-    (mapc (lambda (cs)
-	    (cond ((memq cs '(ascii latin-jisx0201))
-		   (setq asets (logior (get cs 'latin-unity-flag-bit) asets)))
-		  ((get cs 'latin-unity-bit-flag)
-		   (setq lsets (logior (get cs 'latin-unity-flag-bit) lsets)))))
-	  (charsets-in-region begin end buffer))
+    (mapc
+     (lambda (cs)
+       (cond ((memq cs '(ascii latin-jisx0201))
+	      (setq asets (logior (get cs 'latin-unity-flag-bit) asets)))
+	     ((get cs 'latin-unity-bit-flag)
+	      (setq lsets (logior (get cs 'latin-unity-flag-bit) lsets)))))
+     (save-excursion
+       (set-buffer (or buffer (current-buffer)))
+       (save-restriction
+	 (widen)
+	 ;; #### not quite right, should test
+	 (charsets-in-region (or begin (point-min))
+			     (or end (and (null begin) (point-max)))))))
     (cons lsets asets)))
 
 
 				 &optional coding-system)
   "Check if CODING-SYSTEM can represent all characters between BEGIN and END.
 
+If not, attempt to remap Latin characters to a single Latin-N set.
+
 For compatibility with old broken versions of `write-region', CODING-SYSTEM
 defaults to `buffer-file-coding-system'.  FILENAME, APPEND, VISIT, and
 LOCKNAME are ignored.
 the buffer, return it.  Otherwise, ask the user to choose a coding system,
 and return that.
 
-This function does _not_ do the safe thing when buffer-file-coding-system is
-nil (aka no-conversion).  It considers that \"non-Latin\", and passes it on
-to the Mule detection mechanism.
+This function does _not_ do the safe thing when `buffer-file-coding-system'
+is nil (= no-conversion).  It considers that \"non-Latin\", and passes it on
+to the Mule detection mechanism.  This could result in corruption, so avoid
+setting `buffer-file-coding-system' to nil, 'no-conversion, or 'binary.
 
 This function is intended for use as a `write-region-pre-hook'.  It does
-nothing except return CODING-SYSTEM if `write-region' handlers are inhibited."
+nothing except return nil if `write-region' handlers are inhibited."
 
-  (let ((codesys (or coding-system buffer-file-coding-system)))
+  (let ((buffer-default
+	 ;; theoretically we could look at other write-region-prehooks,
+	 ;; but they might write the buffer and we lose bad
+	 (or coding-system
+	     buffer-file-coding-system
+	     (find-file-coding-system-for-write-from-filename filename)))
+	(preferred (coding-category-system (car (coding-priority-list))))
+	;; check what representations are feasible
+	;; csets == compatible character sets as (latin . ascii)
+	(csets (latin-unity-representations-feasible-region begin end))
+	;; as an optimization we also check for what's in the buffer
+	;; psets == present in buffer character sets as (latin . ascii)
+	(psets (latin-unity-representations-present-region begin end)))
+    (when latin-unity-debug
+      ;; cheezy debug code
+      (cond ((null csets) (error "no feasible reps vectors?!?"))
+	    ((null (cdr csets)) (error "no ascii reps vector?!?"))
+	    ((null (car csets)) (error "no latin reps vector?!?"))
+	    ((null psets) (error "no reps present vectors?!?"))
+	    ((null (cdr psets)) (error "no ascii reps present vector?!?"))
+	    ((null (car psets)) (error "no latin reps present vector?!?"))
+	    ((null (get 'ascii 'latin-unity-flag-bit))
+	     (error "no flag bit for ascii?!?")))
+      (message "%s %s" csets psets)
+      (sit-for 1))
+
     (cond
+
      ;; don't do anything if we're in a `write-region' handler
-     ((eq inhibit-file-name-operation 'write-region) codesys)
-     ((null codesys) nil)
-     ((memq codesys latin-unity-ignored-coding-system-list) nil)
-     ((or (and (eq (coding-system-type codesys) 'iso2022)
-	       (coding-system-property codesys 'charset-g1))
-	  (memq codesys latin-unity-iso-8859-1-aliases))
-      ;; c[al]?sets == compatible character sets
-      ;; p[al]?sets == present in buffer character sets
-      ;; a == ascii, l == latin
-      (let* ((csets (latin-unity-region-representations-feasible begin end))
-	     (casets (cdr csets))
-	     (clsets (car csets))
-	     ;; we also need to check for what's in the buffer
-	     ;; #### it will save a lot of time in typical case if we
-	     ;; do this check first and return immediately if feasible
-	     (psets (latin-unity-region-representations-present begin end))
-	     (pasets (cdr psets))
-	     (plsets (car psets))
-	     (bfcsgr (or (car (rassq codesys latin-unity-cset-codesys-alist))
-			 (coding-system-property codesys 'charset-g1)))
-	     recommended target-cs)
-	(when latin-unity-debug 
-	  (cond ((null csets) (error "no feasible reps vectors?!?"))
-		((null casets) (error "no ascii reps vector?!?"))
-		((null clsets) (error "no latin reps vector?!?"))
-		((null psets) (error "no reps present vectors?!?"))
-		((null pasets) (error "no ascii reps present vector?!?"))
-		((null plsets) (error "no latin reps present vector?!?"))
-		((null (get 'ascii 'latin-unity-flag-bit))
-		 (error "no flag bit for ascii?!?"))
-		((null (get bfcsgr 'latin-unity-flag-bit))
-		 (error (format "no flag bit for %s?" bfcsgr))))
-	  (message "%s" csets)
-	  (sit-for 1))
-	;; we represent everything in the buffer without remapping
-	(if (and (= (logxor (get 'ascii 'latin-unity-flag-bit) pasets) 0)
-		 (= (logxor (get bfcsgr 'latin-unity-flag-bit) plsets) 0))
-	    codesys
-	  ;; #### break out this help code into a separate function.
-	  ;; don't forget to leave the computation of the recommend cs!
-	  ;; #### this let is bletch, figure out how to handle the help
-	  ;; buffer elegantly
-	  (let ((obuffer (current-buffer)))
-	    (pop-to-buffer (get-buffer-create latin-unity-help-buffer) t)
-	    ;; #### RFE: It also would be nice if the offending characters were
-	    ;; marked in the buffer being checked.
-	    (erase-buffer)
-	    (insert (format "\
-This buffer's default coding system (%s)
-cannot appropriately encode some of the characters present in the buffer."
-			    codesys))
-	    (when latin-unity-debug
-	      (insert "  Character sets found are:\n\n   ")
-	      (mapc (lambda (cs) (insert (format " %s" cs)))
-		    ;; #### Blarg, we've already done this
-		    (charsets-in-region begin end obuffer)))
-	    (insert "
+     ;; is this the right return value?
+     ((eq inhibit-file-name-operation 'write-region) nil)
+
+     ;; try the preapproved systems
+     ((catch 'done
+	(let ((systems latin-unity-preapproved-coding-system-list)
+	      (sys (car latin-unity-preapproved-coding-system-list)))
+	  ;; while always returns nil
+	  (while systems
+	    ;; #### to get rid of this we probably need to preprocess
+	    ;; latin-unity-preapproved-coding-system-list
+	    (setq sys (cond ((and (eq sys 'buffer-default) buffer-default))
+			    ((and (eq sys 'preferred) preferred))
+			    (t sys)))
+	    (when (latin-unity-maybe-remap begin end sys csets psets)
+	      (throw 'done sys))
+	    (setq systems (cdr systems))
+	    (setq sys (car systems))))))
+
+     ;; ask the user about the preferred systems
+     ;; #### RFE: It also would be nice if the offending characters
+     ;; were marked in the buffer being checked.
+     (t (let* ((recommended
+		(latin-unity-recommend-representation begin end csets))
+	       (codesys (car recommended))
+	       (charset (cdr recommended)))
+	  (when latin-unity-debug (message "%s" recommended))
+	  ;; compute return
+	  (cond
+
+	   ;; universal coding systems
+	   ;; #### we might want to unify here if the codesys is ISO 2022
+	   ;; but we don't have enough information to decide
+	   ((memq codesys latin-unity-ucs-list) codesys)
+
+	   ;; ISO 2022 (including ISO 8859) compatible systems
+	   ;; #### maybe we should check for G2 and G3 sets
+	   ;; note the special case is necessary, as 'iso-8859-1 is NOT
+	   ;; type 'iso2022, it's type 'no-conversion
+	   ((or (memq codesys latin-unity-iso-8859-1-aliases)
+		(eq (coding-system-type codesys) 'iso2022))
+	    ;; #### make sure maybe-remap always returns a coding system
+	    (when (latin-unity-maybe-remap begin end codesys csets psets)
+	      codesys))
+
+	   ;; other coding systems -- eg Windows 125x, KOI8?
+	   ;; #### unimplemented
+
+	   ;; no luck, pass the buck back to `write-region'
+	   ;; #### we really shouldn't do this, defeats the purpose
+	   (t (unless latin-unity-like-to-live-dangerously
+		(warn (concat "Passing to default coding system,"
+			      " data corruption likely"))
+		(ding)
+		nil))
+	   )))
+     )))
+
+
+(defun latin-unity-recommend-representation (begin end feasible
+					     &optional buffer)
+  "Recommend a representation for BEGIN to END from FEASIBLE in BUFFER.
+
+Returns a cons of a coding system (which can represent all characters in
+BUFFER) and a charset (to which all non-ASCII characters in BUFFER can be
+remapped).  (The former will be nil only if `latin-unity-ucs-list' is nil.)
+
+FEASIBLE is a cons of bit vectors (LATIN . ASCII) representing the feasible
+character sets, as returned by `latin-unity-representations-feasible-region'.
+BUFFER defaults to the current buffer."
+
+  ;; interactive not useful because of representation of FEASIBLE
+  (unless buffer (setq buffer (current-buffer)))
+
+  ;; #### this code is repeated too often
+  (let ((buffer-default
+	 ;; theoretically we could look at other write-region-prehooks,
+	 ;; but they might write the buffer and we lose bad
+	 (or
+	  ; coding-system ; I think this is null anyway
+	  buffer-file-coding-system
+	  ;; #### this is wrong for auto-saves at least
+	  ; (find-file-coding-system-for-write-from-filename
+	  ;   (buffer-file-name))
+	  ))
+	(preferred (coding-category-system (car (coding-priority-list))))
+	recommended)
+    (save-excursion
+      (pop-to-buffer (get-buffer-create latin-unity-help-buffer) t)
+      (erase-buffer)
+      (insert (format "\
+Choose a coding system to save buffer %s.
+All preapproved coding systems (%s)
+fail to appropriately encode some of the characters present in the buffer."
+		      (buffer-name buffer)
+		      latin-unity-preapproved-coding-system-list))
+      ;; #### we could get this from PRESENT and avoid the auto-save silliness
+      (when latin-unity-debug
+	(insert "  Character sets found are:\n\n   ")
+	(mapc (lambda (cs) (insert (format " %s" cs)))
+	      (save-excursion
+		(set-buffer buffer)
+		(save-restriction
+		  (widen)
+		  (let ((begin (or begin (point-min)))
+			(end (or end (point-max))))
+		    (charsets-in-region begin end))))))
+      (insert "
 
 Please pick a coding system.  The following are recommended because they can
 encode any character in the buffer:
 
    ")
-	    (mapc
-	     (lambda (cs)
-	       (if (/= (logand (get cs 'latin-unity-flag-bit) clsets) 0)
-		   (let ((sys (cdr (assq cs latin-unity-cset-codesys-alist))))
-		     (unless recommended
-		       (setq target-cs cs recommended sys))
-		     (insert (format " %s" sys)))))
-	     latin-unity-character-sets)
-	    ;; universal coding systems
-	    (mapc (lambda (cs)
-		    (when (find-coding-system cs)
-		      (unless recommended (setq recommended cs))
-		      (insert (format " %s" cs))))
-		  latin-unity-approved-ucs-list)
-	    (insert "
+      (mapc (lambda (cs)
+	      (when latin-unity-debug (message "%s" cs))
+	      (let ((sys (cdr (assq cs latin-unity-cset-codesys-alist))))
+		(when (and (memq sys
+				 (mapcar
+				  (lambda (x)
+				    (cond ((and (eq x 'preferred) preferred))
+					  ((and (eq x 'buffer-default)
+						buffer-default))
+					  (t x)))
+				  latin-unity-preferred-coding-system-list))
+			   (find-coding-system sys)
+			   (/= (logand (get cs 'latin-unity-flag-bit)
+				       (car feasible))
+			       0))
+		  (unless recommended (setq recommended (cons sys cs)))
+		  (insert (format " %s" sys)))))
+	    latin-unity-character-sets)
+      ;; universal coding systems
+      (mapc (lambda (sys)
+	      (when (find-coding-system sys)
+		(unless recommended (setq recommended (cons sys nil)))
+		(insert (format " %s" sys))))
+	    latin-unity-ucs-list)
+      (insert "
 
 Note that if you select a coding system that can not encode some characters
 in your buffer, those characters will be changed to an arbitrary replacement
 ctext are ISO 2022 conforming coding systems for 7-bit and 8-bit environments
 respectively.  Be careful, there is a lot of software that does not understand
 them.  utf-8 (Unicode) may also be unsupported in some environments, but they
-are becoming fewer all the time.  utf-8 is recommended if usable.
+are becoming fewer all the time.  utf-8 is recommended if usable (except for
+some users of Asian ideographs who need to mix languages).
 
 In Mule, most iso-* coding systems are capable of encoding all characters.
 However, characters outside of the normal range for the coding system require
 use of ISO 2022 extension techniques and is likely to be unsupported by other
 software, including software that supports iso-2022-7 or ctext.
 
-For a list of coding systems, abort now and invoke `list-coding-systems'.")
-	    (goto-char (point-min))
+For a list of coding systems, quit and invoke `list-coding-systems'.")
+      (goto-char (point-min))
+      ;; `read-coding-system' never returns a non-symbol
+      (let ((val (read-coding-system (format "Coding system [%s]: "
+					     (car recommended))
+				     (car recommended))))
+	(delete-window)
+	(if (eq val (car recommended))
+	    recommended
+	  (cons val
+		;; #### this code is repeated too often
+		(or (car (rassq val latin-unity-cset-codesys-alist))
+		    (and val
+			 (eq (coding-system-type val) 'iso2022)
+			 (coding-system-property val 'charset-g1)))))))))
 
-	    (let ((val (read-coding-system (format "Coding system [%s]: "
-						   recommended)
-					   recommended)))
-	      (delete-window)
-	      (set-buffer obuffer)
-	      ;; compute return
-	      (cond
-	       ;; pre-approved coding systems
-	       ((or (memq val latin-unity-approved-ucs-list)
-		    (memq val latin-unity-approved-coding-system-list))
-		val)
-	       ;; ISO 2022 (including ISO 8859) compatible systems
-	       ;; maybe we should check for G2 and G3 sets
-	       ((and (eq (coding-system-type val) 'iso2022)
-		     (setq target-cs
-			   (or (coding-system-property val 'charset-g1)
-			       target-cs))
-		     (if (latin-unity-remap-region begin end target-cs val)
-			 val
-		       (error
-			(format (concat "Couldn't remap characters to"
-					" charset %s for coding system %s"
-					target-cs val))))))
-	       ;; other coding systems -- eg Windows 125x, KOI8?
-	       ;; #### unimplemented
-	       (t nil)))))))
-      (t nil))))
+;; this could be a flet in latin-unity-sanity-check
+(defun latin-unity-maybe-remap (begin end codesys feasible &optional present)
+  "Try to remap from BEGIN to END to CODESYS.  Return nil on failure.
+
+Return CODESYS on success.  CODESYS is a coding system or nil.
+FEASIBLE is a cons of bitvectors indicating the set of character sets which
+can represent all non-ASCII characters and ASCII characters, respectively,
+in the current buffer.
+PRESENT is a cons of bitvectors indicating the set of non-ASCII and ASCII
+character sets, respectively, present in the current buffer."
+
+  ;; may God bless and keep the Mule ... far away from us!
+  (when (memq codesys latin-unity-iso-8859-1-aliases)
+    (setq codesys 'iso-8859-1))
+
+  (when latin-unity-debug
+    (message "%s" (list codesys feasible present)))
+
+  (let ((gr (or (car (rassq codesys latin-unity-cset-codesys-alist))
+		(and codesys
+		     (eq (coding-system-type codesys) 'iso2022)
+		     (coding-system-property codesys 'charset-g1)))))
+    (when latin-unity-debug (message "%s" (list codesys gr)))
+    (cond
+     ((null codesys) nil)
+     ((memq codesys latin-unity-ucs-list)
+      codesys)
+     ;; this is just an optimization, as the next arm should catch it
+     ;; note we can assume ASCII here, as if GL is JIS X 0201 Roman,
+     ;; GR will be JIS X 0201 Katakana
+     ((and (= (logxor (get 'ascii 'latin-unity-flag-bit) (cdr present)) 0)
+	   (= (logxor (get gr 'latin-unity-flag-bit 0) (car present)) 0))
+      codesys)
+     ;; we represent everything in the buffer with remapping
+     ((and (/= 0 (logand (get 'ascii 'latin-unity-flag-bit) (cdr feasible)))
+	   (/= 0 (logand (get gr 'latin-unity-flag-bit 0) (car feasible))))
+      (when latin-unity-debug (message "trying remap"))
+      (latin-unity-remap-region begin end gr codesys))
+     (t nil))))
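+
+;; A worked illustration of the bit vector tests above.  The flag bit
+;; values here are hypothetical; the real ones are whatever is stored in
+;; each charset's `latin-unity-flag-bit' property.  Suppose
+;; latin-iso8859-1 has flag bit 2 and latin-iso8859-2 has flag bit 4.
+;; If every non-ASCII character in the region exists in both sets, then
+;; (car FEASIBLE) is (logior 2 4) = 6, and remapping to Latin-2 is
+;; feasible because (logand 4 6) = 4 is nonzero.  If in addition only
+;; Latin-2 characters are PRESENT, (car PRESENT) is 4 and
+;; (logxor 4 4) = 0, so no remapping is needed at all.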
 
 
 ;;;###autoload
 To change from one Mule representation to another without changing identity
 of any characters, use `latin-unity-remap-region'."
 
-  ;; #### Implement constraint and completion here
-  (interactive "*r\nSCurrent character set: \nSDesired character set: ")
+  (interactive
+   (let ((begin (region-beginning))
+	 (end (region-end)))
+     (list begin end
+	   ;; #### Abstract this to handle both charset and coding system
+	   (let ((cs (intern (completing-read "Current character set: "
+					      obarray #'find-charset))))
+	     (while (not (find-charset cs))
+	       (setq cs (latin-unity-guess-charset cs))
+	       (cond ((not (find-charset cs))
+		      (setq cs (intern (completing-read
+					"Oops.  Current character set: "
+					obarray #'find-charset))))
+		     ((y-or-n-p (format "Guessing %s " cs)) cs)
+		     (t (setq cs nil))))
+	     cs)
+	   (let ((cs (intern (completing-read "Desired character set: "
+					      obarray #'find-charset))))
+	     (while (not (find-charset cs))
+	       (setq cs (latin-unity-guess-charset cs))
+	       (cond ((not (find-charset cs))
+		      (setq cs (intern (completing-read
+					"Oops.  Desired character set: "
+					obarray #'find-charset))))
+		     ((y-or-n-p (format "Guessing %s " cs)) cs)
+		     (t (setq cs nil))))
+	     cs))))
 
   (save-excursion
     (goto-char begin)
 To change from one Mule representation to another without changing identity
 of any characters, use `latin-unity-remap-region'."
 
-  (interactive "*r\nzCurrent coding system: \nzDesired coding system: ")
+  (interactive
+   (let ((begin (region-beginning))
+	 (end (region-end)))
+     (list begin end
+	   ;; #### Abstract this to handle both charset and coding system
+	   (let ((cs (intern (completing-read "Current coding system: "
+					      obarray #'find-coding-system))))
+	     (while (not (find-coding-system cs))
+	       (setq cs (latin-unity-guess-coding-system cs))
+	       (cond ((not (find-coding-system cs))
+		      (setq cs (intern (completing-read
+					"Oops.  Current coding system: "
+					obarray #'find-coding-system))))
+		     ((y-or-n-p (format "Guessing %s " cs)) cs)
+		     (t (setq cs nil))))
+	     cs)
+	   (let ((cs (intern (completing-read "Desired coding system: "
+					      obarray #'find-coding-system))))
+	     (while (not (find-coding-system cs))
+	       (setq cs (latin-unity-guess-coding-system cs))
+	       (cond ((not (find-coding-system cs))
+		      (setq cs (intern
+				(completing-read
+				 "Oops.  Desired coding system: "
+				 obarray #'find-coding-system))))
+		     ((y-or-n-p (format "Guessing %s " cs)) cs)
+		     (t (setq cs nil))))
+	     cs))))
+
   (encode-coding-region begin end wrong-cs)
   (decode-coding-region begin end right-cs))
 
 
 ;;;###autoload
 (defun latin-unity-remap-region (begin end character-set
-				 ;; #### maybe this should be a keyword arg?
-				 &optional coding-system)
+				 &optional coding-system no-error)
   "Remap characters between BEGIN and END to equivalents in CHARACTER-SET.
 Optional argument CODING-SYSTEM may be a coding system name (a symbol) or
 nil.  Characters with no equivalent are left as-is.
 
 Note:  by default this function is quite fascist about universal coding
 systems.  It only admits utf-8, iso-2022-7, and ctext.  Customize
-`latin-unity-approved-ucs-list' to change this.
+`latin-unity-ucs-list' to change this.
 
 This function remaps characters that are artificially distinguished by Mule
 internal code.  It may change the code point as well as the character set.
 `latin-unity-recode-region'."
 
   (interactive "*r\nSCharacter set: ")
-
-  (if (not (charsetp (find-charset character-set)))
-      ;; #### Should be more user-friendly here
-      (error (format "%s is not the name of a character set." character-set)))
+  (interactive
+   (let ((begin (region-beginning))
+	 (end (region-end)))
+     (list begin end
+	   ;; #### Abstract this to handle both charset and coding system
+	   (let ((cs (intern (completing-read "Character set: "
+					      obarray #'find-charset))))
+	     (while (not (find-charset cs))
+	       (setq cs (latin-unity-guess-charset cs))
+	       (cond ((not (find-charset cs))
+		      (setq cs (intern
+				(completing-read "Oops.  Character set: "
+						 obarray #'find-charset))))
+		     ((y-or-n-p (format "Guessing %s " cs)) cs)
+		     (t (setq cs nil))))
+	     cs))))
 
   (save-excursion
     (save-restriction
-      (narrow-to-region begin end)
-      (goto-char (point-min))
-      (while (not (eobp))
-	;; #### RFE: optimize using skip-chars-forward
-	(let* ((ch (char-after))
-	       (repch (latin-unity-equivalent-character ch character-set)))
-	  (if (or (not repch)
-		  (= repch ch))
-	      (forward-char 1)
-	    (insert repch)
-	    (delete-char 1))))))
+      ;; #### we're not even gonna try if we're in an auto-save
+      (when begin
+	(narrow-to-region begin end)
+	(goto-char (point-min))
+	(while (not (eobp))
+	  ;; #### RFE: optimize using skip-chars-forward
+	  (let* ((ch (char-after))
+		 (repch (latin-unity-equivalent-character ch character-set)))
+	    (if (or (not repch)
+		    (= repch ch))
+		(forward-char 1)
+	      (insert repch)
+	      (delete-char 1))))
 
-  (cond ((memq coding-system latin-unity-approved-ucs-list) coding-system)
-	((null (delq character-set
-		     (delq 'ascii (charsets-in-region begin end))))
-	 (or coding-system t))
-	(t nil)))
+	(let ((remaining (delq character-set
+			       (delq 'ascii
+				     (charsets-in-region begin end)))))
+	  (when (or remaining latin-unity-debug)
+	    (message "Could not remap characters from %s to %s"
+		     remaining character-set))
+	  (cond ((memq coding-system latin-unity-ucs-list) coding-system)
+		((null remaining)
+		 (or coding-system
+		     (cdr (assq character-set latin-unity-cset-codesys-alist))
+		     ;; #### Is this the right thing to do here?
+		     t))
+		(t (unless no-error (error 'args-out-of-range
+					   "Remap failed; can't save!")))))
+	))))
+
+(defun latin-unity-guess-charset (candidate)
+  "Guess a charset based on the symbol CANDIDATE.
+
+CANDIDATE itself is not tried as the value.
+
+Uses the natural mapping in `latin-unity-cset-codesys-alist', and the values
+in `latin-unity-charset-alias-alist'."
+  (let ((charset
+	 (cond ((not (symbolp candidate))
+		(error 'wrong-type-argument "Not a symbol: " candidate))
+	       ((find-coding-system candidate)
+		(car (rassq candidate latin-unity-cset-codesys-alist)))
+	       (t (cdr (assq candidate latin-unity-charset-alias-alist))))))
+    (when (find-charset charset)
+      charset)))
+
+(defun latin-unity-guess-coding-system (candidate)
+  "Guess a coding system based on the symbol CANDIDATE.
+
+CANDIDATE itself is not tried as the value.
+
+Uses the natural mapping in `latin-unity-cset-codesys-alist', and the values
+in `latin-unity-coding-system-alias-alist'."
+
+  (let ((coding-system
+	 (cond ((not (symbolp candidate))
+		(error 'wrong-type-argument "Not a symbol: " candidate))
+	       ((find-charset candidate)
+		(cdr (assq candidate latin-unity-cset-codesys-alist)))
+	       (t (cdr (assq candidate
+			     latin-unity-coding-system-alias-alist))))))
+    (when (find-coding-system coding-system)
+      coding-system)))
+
 
 ;;;###autoload  
 (defun latin-unity-test ()
   (insert ?i)
   (insert (make-char 'latin-iso8859-2 102)) ; c acute, not in Latin-1
   (insert "\n... to here is representable in Latin-2 but not Latin-1.\n")
-  (insert (make-char 'latin-iso8859-1 255))
+  (insert (make-char 'latin-iso8859-1 255)) ; y diaeresis, not in Latin-2
   (insert "\nFrom top to here is not representable in Latin-[12].\n")
 
   (insert "

File latin-unity.texi

 * Theory of Operation::         How @pkgname{} works.
 * What latin-unity Cannot Do for You::  Inherent problems of 8-bit charsets.
 
-@c For programmers:
+For programmers:
 @c * Interfaces::                  Calling @pkgname{} from Lisp code.
+* Charsets and Coding Systems:: Reference lists with annotations.
 
-@c For maintainers:
-@c * Internals::                   Implementation details.
+For maintainers:
+* Internals::                   Utilities and implementation details.
 
 @c ** For small packages, with no or few subnodes, a detailmenu is not
 @c ** necessary.
 character.  @xref{Theory of Operation}.
 
 There are a few variables which determine which coding systems are
-always acceptable to @pkgname{}, @code{latin-unity-approved-ucs-list},
-@code{latin-unity-ignored-coding-system-list}, and
-@code{latin-unity-approved-coding-system-list}.  The latter two default
+always acceptable to @pkgname{}, @code{latin-unity-ucs-list},
+@code{latin-unity-preferred-coding-system-list}, and
+@code{latin-unity-preapproved-coding-system-list}.  The latter two default
 to @code{()}, and should probably be avoided because they short-circuit
 the sanity check.  If you find you need to use them, consider reporting
 it as a bug or request for enhancement.  Because they seem unsafe, the
 @end menu
 
 
+@c #### need to describe completion features?
 @node Basic Functionality, Interactive Usage, , Usage
 @section Basic Functionality
 
 @end defun
 
 
-@defopt latin-unity-approved-ucs-list
-
-The default value is @code{'(utf-8 iso-2022-7 ctext)}.
+@defopt latin-unity-ucs-list
+The default value is @code{'(utf-8 iso-2022-7 ctext escape-quoted)}.
 
 List of coding systems considered to be universal.
 
 Order matters; coding systems earlier in the list will be preferred when
 recommending a coding system.
+
+@samp{escape-quoted} is a special coding system used for autosaves and
+compiled Lisp in Mule.  You should not remove it from this list, and it
+is rare that a user would want to use it directly.
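+
+For example, a user who never wants @samp{ctext} or @samp{iso-2022-7}
+to be suggested might use something like the following (a sketch;
+note that @samp{escape-quoted} is kept):
+
+@example
+(setq latin-unity-ucs-list '(utf-8 escape-quoted))
+@end example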
 @end defopt
 
 
 Coding systems which are not Latin and not in
-@code{latin-unity-approved-ucs-list} are handled by short circuiting
-checks of coding system against the next two variables.  A preferable
-approach is to define an alist of coding systems to corresponding sets
-of ``safe'' character sets, then checking that @code{(charsets-in-region
-begin end)} is contained in the appropriate set.  If you want this
-@emph{now} do it yourself and send a patch to
-@email{stephen@@xemacs.org}.
+@code{latin-unity-ucs-list} are handled by short circuiting checks of
+coding system against the next two variables.
 
+@defopt latin-unity-preapproved-coding-system-list
+List of coding systems used without querying the user if feasible.
 
-@defopt latin-unity-ignored-coding-system-list
+The default value is @samp{(buffer-default preferred)}.
 
-The default value is nil.
+The first feasible coding system in this list is used.  The special values
+@samp{preferred} and @samp{buffer-default} may be present:
 
-List of coding systems such that the buffer is not checked for Latin unity.
+@table @code
+@item buffer-default
+Use the coding system used by @samp{write-region}, if feasible.
 
-Usually this means that the value of @code{buffer-file-coding-system} is
-a member of this list.
+@item preferred
+Use the coding system specified by @samp{prefer-coding-system}, if feasible.
+@end table
 
-This API is likely to change.
+``Feasible'' means that all characters in the buffer can be represented by
+the coding system.  Coding systems in @samp{latin-unity-ucs-list} are
+always considered feasible.  Other feasible coding systems are computed
+by @samp{latin-unity-representations-feasible-region}.
+
+Note that the first universal coding system in this list shadows all
+coding systems that follow it.
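+
+For example, the following setting (an illustration, not a
+recommendation) tries the buffer's own coding system, then Latin-9, and
+finally the coding system chosen with @samp{prefer-coding-system}:
+
+@example
+(setq latin-unity-preapproved-coding-system-list
+      '(buffer-default iso-8859-15 preferred))
+@end example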
 @end defopt
 
 
-@defopt latin-unity-approved-coding-system-list
+@defopt latin-unity-preferred-coding-system-list
+List of coding systems suggested to the user if feasible.
 
-The default value is nil.
+The default value is @samp{(iso-8859-1 iso-8859-15 iso-8859-2 iso-8859-3
+iso-8859-4 iso-8859-9)}.
 
-List of coding systems forcing a save of the buffer even if Latin unity
-is not satisfied.
+If none of the coding systems in
+@samp{latin-unity-preapproved-coding-system-list} are feasible, the
+feasible coding systems in this list will be recommended to the user,
+followed by those in @samp{latin-unity-ucs-list} (so universal coding
+systems should not be in this list).  The first coding system in this
+list is the default.  The special values @samp{preferred} and
+@samp{buffer-default} may be present:
 
-This API is likely to change.
+@table @code
+@item buffer-default
+Use the coding system used by @samp{write-region}, if feasible.
+
+@item preferred
+Use the coding system specified by @samp{prefer-coding-system}, if feasible.
+@end table
+
+``Feasible'' means that all characters in the buffer can be represented by
+the coding system.  Coding systems in @samp{latin-unity-ucs-list} are
+always considered feasible.  Other feasible coding systems are computed
+by @samp{latin-unity-representations-feasible-region}.
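+
+For example, a user who writes mostly Polish and German might restrict
+the suggestions to these two Latin coding systems (a sketch; the
+universal coding systems in @samp{latin-unity-ucs-list} are still
+offered after them):
+
+@example
+(setq latin-unity-preferred-coding-system-list
+      '(iso-8859-2 iso-8859-1))
+@end example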
 @end defopt
 
 
-@defopt latin-unity-iso-8859-1-aliases
+@defvar latin-unity-iso-8859-1-aliases
+List of coding systems to be treated as aliases of ISO 8859/1.
 
 The default value is '(iso-8859-1).
 
-List of coding systems to be treated as aliases of ISO 8859/1.
-@end defopt
+This is not a user variable; to customize input of coding systems or
+charsets, use @samp{latin-unity-coding-system-alias-alist} or
+@samp{latin-unity-charset-alias-alist}.
+@end defvar
 
 
 @node Interactive Usage, , Basic Functionality, Usage
 @end defun
 
 
+The following helper functions are used when reading coding system and
+character set names.
+
+@defun latin-unity-guess-charset candidate
+Guess a charset based on the symbol @var{candidate}.
+
+@var{candidate} itself is not tried as the value.
+
+Uses the natural mapping in @samp{latin-unity-cset-codesys-alist}, and
+the values in @samp{latin-unity-charset-alias-alist}.
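+
+For example, with the default value of
+@samp{latin-unity-charset-alias-alist}, and assuming that no coding
+system is named @samp{latin-2}, one would expect:
+
+@example
+(latin-unity-guess-charset 'latin-2)
+     @result{} latin-iso8859-2
+@end example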
+@end defun
+
+@defun latin-unity-guess-coding-system candidate
+Guess a coding system based on the symbol @var{candidate}.
+
+@var{candidate} itself is not tried as the value.
+
+Uses the natural mapping in @samp{latin-unity-cset-codesys-alist}, and
+the values in @samp{latin-unity-coding-system-alias-alist}.
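+
+For example, assuming there is no charset named @samp{latin-9} and that
+the @samp{iso-8859-15} coding system provided by @pkgname{} is
+available, a temporary alias gives:
+
+@example
+(let ((latin-unity-coding-system-alias-alist
+       '((latin-9 . iso-8859-15))))
+  (latin-unity-guess-coding-system 'latin-9))
+     @result{} iso-8859-15
+@end example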
+@end defun
+
+
 @defun latin-unity-test
 
 Really cheesy tests for @pkgname{}.
 earlier than 21.1, you should also load @file{auto-autoloads} using the
 full path (@emph{never} @samp{require} @file{auto-autoloads} libraries).
 
+You may wish to define aliases for commonly used character sets and
+coding systems for convenience in input.
+
+@defopt latin-unity-charset-alias-alist
+Alist mapping aliases to Mule charset names (symbols).
+
+The default value is
+@example
+   ((latin-1 . latin-iso8859-1)
+    (latin-2 . latin-iso8859-2)
+    (latin-3 . latin-iso8859-3)
+    (latin-4 . latin-iso8859-4)
+    (latin-5 . latin-iso8859-9)
+    (latin-9 . latin-iso8859-15)
+    (latin-10 . latin-iso8859-16))
+@end example
+
+If a charset does not exist on your system, it will not complete and you
+will not be able to enter it in response to prompts.  A real charset
+with the same name as an alias in this list will shadow the alias.
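+
+To add an alias of your own, for example the informal name
+@samp{latin-0} sometimes used for Latin-9, something like the following
+should work once @pkgname{} has been loaded:
+
+@example
+(add-to-list 'latin-unity-charset-alias-alist
+             '(latin-0 . latin-iso8859-15))
+@end example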
+@end defopt
+
+@defopt latin-unity-coding-system-alias-alist
+Alist mapping aliases to Mule coding system names (symbols).
+
+The default value is @samp{nil}.
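+
+For example (an illustration only, assuming no coding system named
+@samp{latin-2} exists), this lets you answer @samp{latin-2} at a coding
+system prompt and have it guessed as @samp{iso-8859-2}:
+
+@example
+(setq latin-unity-coding-system-alias-alist
+      '((latin-2 . iso-8859-2)))
+@end example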
+@end defopt
+
 
 @node Bug Reports, Theory of Operation, Configuration, Top
 @chapter Reporting Bugs and Problems
 Usage}.
 
 
-@node What latin-unity Cannot Do for You, , Theory of Operation, Top
+@node What latin-unity Cannot Do for You, Charsets and Coding Systems, Theory of Operation, Top
 @chapter What latin-unity Cannot Do for You
 
 @pkgname{} @strong{cannot} save you if you insist on exporting data in
 that @emph{this is German and Swedish} and stays in Latin-1, while
 @emph{that is Polish} and needs to be recoded to Latin-2.
 
+@node Charsets and Coding Systems, Internals, What latin-unity Cannot Do for You, Top
+@chapter Charsets and Coding Systems
+
+This chapter provides reference lists of Mule charsets and coding
+systems.  Mule charsets are typically named by script or language and
+by the standard that defines them, as in @samp{latin-iso8859-2} or
+@samp{japanese-jisx0208}.
+
+@table @strong
+@item ASCII variants
+
+Identification of equivalent characters in these sets is not properly
+implemented.  @pkgname{} does not distinguish the two charsets.
+
+@samp{ascii} @samp{latin-jisx0201}
+
+@item Extended Latin
+
+@pkgname{} identifies characters in each of the following ISO 2022
+conformant charsets with their equivalents in the other charsets of
+the group.
+
+@samp{latin-iso8859-1} @samp{latin-iso8859-15} @samp{latin-iso8859-2}
+@samp{latin-iso8859-3} @samp{latin-iso8859-4} @samp{latin-iso8859-9}
+
+The following charsets are Latin variants that @pkgname{} does not
+understand.  In addition, many of the Asian language standards provide
+at least ASCII, and sometimes other Latin characters.  None of these
+are identified with their ISO 8859 equivalents.
+
+@samp{vietnamese-viscii-lower}
+@samp{vietnamese-viscii-upper}
+
+@item Other character sets
+
+@samp{arabic-1-column}
+@samp{arabic-2-column}
+@samp{arabic-digit}
+@samp{arabic-iso8859-6}
+@samp{chinese-big5-1}
+@samp{chinese-big5-2}
+@samp{chinese-cns11643-1}
+@samp{chinese-cns11643-2}
+@samp{chinese-cns11643-3}
+@samp{chinese-cns11643-4}
+@samp{chinese-cns11643-5}
+@samp{chinese-cns11643-6}
+@samp{chinese-cns11643-7}
+@samp{chinese-gb2312}
+@samp{chinese-isoir165}
+@samp{cyrillic-iso8859-5}
+@samp{ethiopic}
+@samp{greek-iso8859-7}
+@samp{hebrew-iso8859-8}
+@samp{ipa}
+@samp{japanese-jisx0208}
+@samp{japanese-jisx0208-1978}
+@samp{japanese-jisx0212}
+@samp{katakana-jisx0201}
+@samp{korean-ksc5601}
+@samp{sisheng}
+@samp{thai-tis620}
+@samp{thai-xtis}
+
+@item Non-graphic charsets
+
+@samp{control-1}
+@end table
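+
+Which of these charsets are actually defined may vary between XEmacs
+versions and installed packages.  A quick way to check, using the
+XEmacs Mule functions @samp{charset-list} and @samp{charsets-in-region},
+is sketched below.
+
+@example
+;; Names of all charsets defined in this XEmacs.
+(charset-list)
+
+;; Charsets used by the text in the current buffer.
+(charsets-in-region (point-min) (point-max))
+@end example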
+
+@table @strong
+@item No conversion
+
+Some of these coding systems may specify EOL conventions.  Note that
+@samp{iso-8859-1} is a no-conversion coding system, not an ISO 2022
+coding system.  Although @pkgname{} attempts to compensate for this, it
+is possible that the @samp{iso-8859-1} coding system will behave
+differently from other ISO 8859 coding systems.
+
+@samp{binary} @samp{no-conversion} @samp{raw-text} @samp{iso-8859-1}
+
+@item Latin coding systems
+
+These coding systems are all single-byte, 8-bit ISO 2022 coding systems,
+combining ASCII in the GL register (bytes with high-bit clear) and an
+extended Latin character set in the GR register (bytes with high-bit set).
+
+@samp{iso-8859-15} @samp{iso-8859-2} @samp{iso-8859-3} @samp{iso-8859-4}
+@samp{iso-8859-9}
+
+These coding systems are single-byte, 8-bit coding systems that do not
+conform to ISO 2022 or other international standards.  They should be
+avoided in all potentially multilingual contexts, including any text
+distributed over the Internet and World Wide Web.
+
+@samp{windows-1251}
+
+@item Multilingual coding systems
+
+The following ISO-2022-based coding systems are useful for multilingual
+text.
+
+@samp{ctext} @samp{iso-2022-lock} @samp{iso-2022-7} @samp{iso-2022-7bit}
+@samp{iso-2022-7bit-ss2} @samp{iso-2022-8} @samp{iso-2022-8bit-ss2}
+
+XEmacs also supports Unicode through the Mule-UCS package, which
+provides the following coding systems.  These are the preferred coding
+systems for multilingual use, with a possible exception for texts that
+mix several Asian ideographic character sets.
+
+@samp{utf-16-be} @samp{utf-16-be-no-signature} @samp{utf-16-le}
+@samp{utf-16-le-no-signature} @samp{utf-7} @samp{utf-7-safe}
+@samp{utf-8} @samp{utf-8-ws}
+
+@item Asian ideographic languages
+
+The following coding systems are based on ISO 2022, and are more or less
+suitable for encoding multilingual texts.  They can all represent ASCII
+at least, and sometimes several other foreign character sets, without
+resort to arbitrary ISO 2022 designations.  However, these subsets are
+not identified with the corresponding national standards in XEmacs Mule.
+
+@samp{chinese-euc} @samp{cn-big5} @samp{cn-gb-2312} @samp{gb2312}
+@samp{hz} @samp{hz-gb-2312} @samp{old-jis} @samp{japanese-euc}
+@samp{junet} @samp{euc-japan} @samp{euc-jp} @samp{iso-2022-jp}
+@samp{iso-2022-jp-1978-irv} @samp{iso-2022-jp-2} @samp{euc-kr}
+@samp{korean-euc} @samp{iso-2022-kr} @samp{iso-2022-int-1}
+
+The following coding systems cannot be used for general multilingual
+text and do not cooperate well with other coding systems.
+
+@samp{big5} @samp{shift_jis}
+
+@item Other languages
+
+The following coding systems are based on ISO 2022.  Though none of them
+provides any Latin characters beyond ASCII, XEmacs Mule allows (and up
+to 21.4 defaults to) use of ISO 2022 control sequences to designate
+other character sets for inclusion in the text.
+
+@samp{iso-8859-5} @samp{iso-8859-7} @samp{iso-8859-8}
+@samp{ctext-hebrew}
+
+The following coding systems do not conform to ISO 2022 and thus
+cannot be safely used in a multilingual context.
+
+@samp{alternativnyj} @samp{koi8-r} @samp{tis-620} @samp{viqr}
+@samp{viscii} @samp{vscii}
+
+@item Special coding systems
+
+Mule uses the following coding systems for special purposes.
+
+@samp{automatic-conversion} @samp{undecided} @samp{escape-quoted}
+
+The following coding systems are aliases for others, and are used for
+communication with the host operating system.
+
+@samp{file-name} @samp{keyboard} @samp{terminal}
+
+@end table
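+
+You can inspect the coding systems known to your XEmacs with standard
+Mule functions.  This is a sketch for interactive evaluation, for
+example in the @samp{*scratch*} buffer; the coding system names are
+taken from the table above.
+
+@example
+;; Names of all coding systems defined in this XEmacs.
+(coding-system-list)
+
+;; The coding system object for iso-8859-15, or nil if undefined.
+(find-coding-system 'iso-8859-15)
+
+;; The type of a coding system, e.g. to see whether iso-8859-1 is
+;; a no-conversion or an ISO 2022 coding system in this XEmacs.
+(coding-system-type 'iso-8859-1)
+@end example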
+
+Mule detection of coding systems is actually limited to detection of
+classes of coding systems called @dfn{coding categories}.  These coding
+categories are identified by the ISO 2022 control sequences they use, if
+any, by their conformance to ISO 2022 restrictions on code points that
+may be used, and by characteristic patterns of use of 8-bit code points.
+
+@samp{no-conversion}
+@samp{utf-8}
+@samp{ucs-4}
+@samp{iso-7}
+@samp{iso-lock-shift}
+@samp{iso-8-1}
+@samp{iso-8-2}
+@samp{iso-8-designate}
+@samp{shift-jis}
+@samp{big5}
+
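+The category machinery can be examined from Lisp.  The forms below use
+the XEmacs functions @samp{coding-category-list},
+@samp{coding-priority-list}, and @samp{coding-category-system}; treat
+this as a sketch for exploring detection behavior, not as part of
+@pkgname{}.
+
+@example
+;; All coding categories known to this XEmacs.
+(coding-category-list)
+
+;; Categories in the order in which detection tries them.
+(coding-priority-list)
+
+;; The coding system currently used for text detected as iso-8-1.
+(coding-category-system 'iso-8-1)
+@end example
+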
+
+@node Internals, , Charsets and Coding Systems, Top
+@chapter Internals
+
+No internals documentation yet.
+
+@file{latin-unity-utils.el} provides one utility function.
+
+@defun latin-unity-dump-tables
+
+Dump the temporary table created by loading @file{latin-unity-utils.el}
+to @file{latin-unity-tables.el}.  Loading the latter file initializes
+@samp{latin-unity-equivalences}.
+@end defun
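+
+Regenerating @file{latin-unity-tables.el} might look like the
+following.  This is a sketch assuming that @file{latin-unity-utils.el}
+is on your @samp{load-path} and that @samp{latin-unity-dump-tables}
+takes no arguments.
+
+@example
+;; Loading latin-unity-utils builds the temporary equivalence table.
+(load "latin-unity-utils")
+;; Write that table out to latin-unity-tables.el.
+(latin-unity-dump-tables)
+@end example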
+
 @c end of latin-unity.texi