Anonymous committed 943eaba

[xemacs-hg @ 2002-03-13 08:51:24 by ben]
The big ben-mule-21-5 check-in!
Various files were added and deleted. See CHANGES-ben-mule.
There are still some test suite failures. No crashes, though.
Many of the failures have to do with problems in the test suite itself
rather than in the actual code. I'll be addressing these in the next
day or so -- none of the test suite failures are at all critical.
Meanwhile I'll be trying to address the biggest issues -- i.e. build
or run failures, which will almost certainly happen on various platforms.
All comments should be sent to ben@xemacs.org -- use a Cc: if necessary
when sending to mailing lists. There will be pre- and post- tags,
something like

pre-ben-mule-21-5-merge-in, and
post-ben-mule-21-5-merge-in.

Files changed (421)

+List of changes in new Mule workspace:
+--------------------------------------
+
+Deleted files:
+
+src/iso-wide.h
+src/mule-charset.h
+src/mule.c
+src/ntheap.h
+src/syscommctrl.h
+lisp/files-nomule.el
+lisp/help-nomule.el
+lisp/mule/mule-help.el
+lisp/mule/mule-init.el
+lisp/mule/mule-misc.el
+nt/config.h
+
+
+Other deleted files, all zero-length and accidentally present:
+
+src/events-mod.h
+tests/Dnd/README.OffiX
+tests/Dnd/dragtest.el
+netinstall/README.xemacs
+lib-src/srcdir-symlink.stamp
+
+New files:
+
+CHANGES-ben-mule
+README.ben-mule-21-5
+README.ben-separate-stderr
+TODO.ben-mule-21-5
+etc/TUTORIAL.{cs,es,nl,sk,sl}
+etc/unicode/*
+lib-src/make-mswin-unicode.pl
+lisp/code-init.el
+lisp/resize-minibuffer.el
+lisp/unicode.el
+lisp/mule/china-util.el
+lisp/mule/cyril-util.el
+lisp/mule/devan-util.el
+lisp/mule/devanagari.el
+lisp/mule/ethio-util.el
+lisp/mule/indian.el
+lisp/mule/japan-util.el
+lisp/mule/korea-util.el
+lisp/mule/lao-util.el
+lisp/mule/lao.el
+lisp/mule/mule-locale.txt
+lisp/mule/mule-msw-init.el
+lisp/mule/thai-util.el
+lisp/mule/thai.el
+lisp/mule/tibet-util.el
+lisp/mule/tibetan.el
+lisp/mule/viet-util.el
+src/charset.h
+src/intl-auto-encap-win32.c
+src/intl-auto-encap-win32.h
+src/intl-encap-win32.c
+src/intl-win32.c
+src/intl-x.c
+src/mule-coding.c
+src/text.c
+src/text.h
+src/unicode.c
+src/s/win32-common.h
+src/s/win32-native.h
+
+
+
+gzip support:
+
+-- new coding system `gzip' (bytes -> bytes); unfortunately, not quite
+   working yet because it handles only the raw zlib format and not the
+   higher-level gzip format (the zlib library is brain-damaged in that it
+   provides low-level, stream-oriented API's only for raw zlib, and for
+   gzip you have only high-level API's, which aren't useful for xemacs).
+-- configure support (with-zlib).
+
+configure changes:
+
+- file-coding always compiled in.  eol detection is off by default on unix,
+  non-mule, but can be enabled with configure option
+  --with-default-eol-detection or command-line flag -eol.
+- code that selects which files are compiled is mostly moved to
+   Makefile.in.in.  see comment in Makefile.in.in.
+- vestigial i18n3 code deleted.
+- new cygwin mswin libs imm32 (input methods), mpr (user name enumeration).
+- check for link, symlink.
+- vfork-related code deleted.
+- fix configure.usage. (delete --with-file-coding, --no-doc-file, add
+  --with-default-eol-detection, --quick-build).
+- nt/config.h has been eliminated and everything in it merged into
+  config.h.in and s/windowsnt.h.  see config.h.in for more info.
+- massive rewrite of s/windowsnt.h, m/windowsnt.h, s/cygwin32.h,
+  s/mingw32.h.  common code moved into s/win32-common.h, s/win32-native.h.
+- in nt/xemacs.mak,config.inc.samp, variable is called MULE, not HAVE_MULE,
+  for consistency with sources.
+- define TABDLY, TAB3 in freebsd.h (#### from where?)
+
+Tutorial:
+
+- massive rewrite; sync to FSF 21.0.106, switch focus to window systems,
+  new sections on terminology and multiple frames, lots of fixes for
+  current xemacs idioms.
+- german version from Adrian mostly matching my changes.
+- copy new tutorials from FSF (Spanish, Dutch, Slovak, Slovenian, Czech);
+  not updated yet though.
+- eliminate help-nomule.el and mule-help.el; merge into one single tutorial
+  function, fix lots of problems, put back in help.el where it belongs.
+  (there was some random junk in help-nomule -- string-width and make-char.
+  string-width is now in subr.el with a single definition, and make-char in
+  text.c.)
+
+Sample init file:
+
+- remove forward/backward buffer code, since it's now standard.
+- when disabling C-x C-c, make it display a message saying how to exit, not
+  just beep and complain "undefined".
+
+Key bindings: (keymap.c, keydefs.el, help.el, etc.)
+
+- M-home, M-end now move forward and backward in buffers; with Shift, stay
+  within current group (e.g. all C files; same grouping as the gutter
+  tabs). (bindings switch-to-{next/previous}-buffer[-in-group] in files.el)
+  - needed to move the code used by these bindings from gutter-items.el to
+    buff-menu.el, since gutter-items.el is loaded only when the gutter is
+    active and these bindings (and hence the code) are no longer
+    gutter-specific.
+- new global vars global-tty-map and global-window-system-map specify key
+  bindings for use only on TTY's or window systems, respectively.  this is
+  used to make ESC ESC be keyboard-quit on window systems, but ESC ESC ESC
+  on TTY's, where Meta + arrow keys may appear as ESC ESC O A or whatever.
+  C-z on window systems is now zap-up-to-char, and iconify-frame is moved
+  to C-Z.  ESC ESC is isearch-quit. (isearch-mode.el)
+- document global-{tty,window-system}-map in various places; display them
+  when you do C-h b.
+- fix up function documentation in general for keyboard primitives.
+  e.g. key-bindings now contains a detailed section on the steps prior to
+  looking up in keymaps, i.e. function-key-map,
+  keyboard-translate-table, etc.  define-key and other obvious starting
+  points indicate where to look for more info.
+- eliminate use and mention of grody advertised-undo and
+  deprecated-help. (simple.el, startup.el, picture.el, menubar-items.el)
+
+gnuclient, gnuserv:
+
+- clean up headers a bit.
+- use proper ms win idiom for checking for temp directory (TEMP or TMP, not
+  TMPDIR).
+
+throughout XEmacs sources:
+
+- all #ifdef FILE_CODING statements removed from code.
+
+I/O:
+
+- use PATH_MAX consistently instead of MAXPATHLEN, MAX_PATH, etc.
+- all code that does preprocessor games with C lib I/O functions (open,
+  read) has been removed.  The code has been changed to call the correct
+  function directly.  Functions that accept Intbyte * arguments for
+  filenames and such and do automatic conversion to or from external format
+  will be prefixed qxe...().  Functions that retry in case of EINTR
+  are prefixed retry_...().  DONT_ENCAPSULATE is long-gone.
+- never call getcwd() any more.  use our shadowed value always.
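
The retry_...() convention described above can be illustrated with a minimal sketch. The function name example_retry_read is hypothetical and is not taken from the XEmacs sources; it only shows the general shape of a wrapper that repeats a POSIX call when it is interrupted by a signal.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical illustration of a retry_...() style wrapper.
   It calls read() again whenever the call fails with EINTR, and
   otherwise returns whatever read() returned (byte count, 0 at
   end of file, or -1 with errno set to some other error). */
ssize_t
example_retry_read (int fd, void *buf, size_t size)
{
  ssize_t n;

  assert (buf != NULL);
  do
    n = read (fd, buf, size);
  while (n < 0 && errno == EINTR);
  return n;
}
```

A caller uses it exactly like read(); the only difference is that an interrupted call is never reported as a failure.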
+
+Strings:
+
+- new qxe() string functions that accept Intbyte * as arguments.  These
+  work exactly like the standard strcmp(), strcpy(), sprintf(), etc. except
+  for the argument declaration differences.  We use these whenever we have
+  Intbyte * strings, which is quite often.
+- new fun build_intstring() takes an Intbyte *.  also new funs
+  build_msg_intstring (like build_intstring()) and build_msg_string (like
+  build_string()) to do a GETTEXT() before building the
+  string. (elimination of old build_translated_string(), replaced by
+  build_msg_string()).
+- the doprnt.c external entry points have been completely rewritten to be
+  more useful and have more sensible names.  We now have, for example,
+  versions that work exactly like sprintf() but return a malloc()ed string.
+- function intern_int() for Intbyte * arguments, like intern().
+- numerous places throughout code where char * replaced with something
+  else, e.g. Char_ASCII *, Intbyte *, Char_Binary *, etc.  same with
+  unsigned char *, going to UChar_Binary *, etc.
+- code in print.c that handles stdout, stderr rewritten.
+- places that print to stderr directly replaced with stderr_out().
+- new convenience functions write_fmt_string(), write_fmt_string_lisp(),
+  stderr_out_lisp(), write_string().
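
As a rough illustration of a formatter that behaves like sprintf() but returns a malloc()ed string, the sketch below uses only standard C99 calls. The name example_sprintf_malloc is made up for this example; the actual doprnt.c entry points are not shown in this change list and may work quite differently (in particular, they deal with Intbyte * text, which this sketch ignores).

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: format like sprintf(), but return a newly
   malloc()ed string that the caller must free().  Returns NULL on
   a formatting error or if allocation fails. */
char *
example_sprintf_malloc (const char *fmt, ...)
{
  va_list args, args_copy;
  int len;
  char *result;

  assert (fmt != NULL);
  va_start (args, fmt);

  /* First pass: measure the output without writing anything. */
  va_copy (args_copy, args);
  len = vsnprintf (NULL, 0, fmt, args_copy);
  va_end (args_copy);
  if (len < 0)
    {
      va_end (args);
      return NULL;
    }

  /* Second pass: format into a buffer of exactly the right size. */
  result = malloc ((size_t) len + 1);
  if (result != NULL)
    vsnprintf (result, (size_t) len + 1, fmt, args);
  va_end (args);
  return result;
}
```

The two-pass approach avoids guessing a buffer size, at the cost of formatting the arguments twice.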
+
+Allocation, Objects, Lisp Interpreter:
+
+- automatically use "managed lcrecord" code when allocating.  any lcrecord
+  can be put on a free list with free_lcrecord().
+- record_unwind_protect() returns the old spec depth.
+- unbind_to() now takes only one arg.  use unbind_to_1() if you want the
+  2-arg version, with GC protection of second arg.
+- new funs to easily inhibit GC. ({begin,end}_gc_forbidden()) use them in
+  places where gc is currently being inhibited in a more ugly fashion.
+  also, we disable GC in certain strategic places where string data is
+  often passed in, e.g. dfc functions, print functions.
+- major improvements to eistring code, fleshing out of missing funs.
+- make_buffer() -> wrap_buffer() for consistency with other objects; same
+  for make_frame() -> wrap_frame() and make_console() -> wrap_console().
+- better documentation in condition-case.
+- new convenience funs record_unwind_protect_freeing() and
+  record_unwind_protect_freeing_dynarr() for conveniently setting up an
+  unwind-protect to xfree() or Dynarr_free() a pointer.
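
The interplay between record_unwind_protect() returning the old depth and a one-argument unbind_to() can be sketched with a toy stack. Everything below is a simplified assumption for illustration: the example_ names, the fixed-size array and the use of plain free() stand in for the real specpdl machinery, Lisp_Object arguments and xfree(), none of which are shown in this change list.

```c
#include <assert.h>
#include <stdlib.h>

#define EXAMPLE_MAX_SPECS 64

/* One pending cleanup action: a function and its argument. */
struct example_spec
{
  void (*fn) (void *);
  void *arg;
};

static struct example_spec example_specs[EXAMPLE_MAX_SPECS];
static int example_depth = 0;

/* Current number of pending cleanup actions. */
int
example_current_depth (void)
{
  return example_depth;
}

/* Push a cleanup action and return the depth *before* the push,
   so the caller can later unwind back to exactly that point. */
int
example_record_unwind_protect (void (*fn) (void *), void *arg)
{
  int old_depth = example_depth;

  assert (fn != NULL);
  assert (example_depth < EXAMPLE_MAX_SPECS);
  example_specs[example_depth].fn = fn;
  example_specs[example_depth].arg = arg;
  example_depth++;
  return old_depth;
}

/* Run and pop cleanup actions, newest first, down to DEPTH. */
void
example_unbind_to (int depth)
{
  assert (depth >= 0 && depth <= example_depth);
  while (example_depth > depth)
    {
      example_depth--;
      example_specs[example_depth].fn (example_specs[example_depth].arg);
    }
}

/* Convenience wrapper: arrange for PTR to be freed on unwind. */
int
example_record_unwind_protect_freeing (void *ptr)
{
  return example_record_unwind_protect (free, ptr);
}
```

With this shape, a caller can write `int depth = example_record_unwind_protect_freeing (buf);` and later `example_unbind_to (depth);` without separately querying the depth first.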
+
+Init code:
+
+- lots of init code rewritten to be mule-correct.
+
+Processes:
+
+- always call egetenv(), never getenv(), for mule correctness.
+
+s/m files:
+
+- removal of unused DATA_END, TEXT_END, SYSTEM_PURESIZE_EXTRA, HAVE_ALLOCA
+  (automatically determined)
+- removal of vfork references (we no longer use vfork)
+
+
+make-docfile:
+
+- clean up headers a bit.
+- allow .obj to mean equivalent .c, just like for .o.
+- allow specification of a "response file" (a command-line argument
+  beginning with @, specifying a file containing further command-line
+  arguments) -- a standard mswin idiom to avoid potential command-line
+  limits and to simplify makefiles.  use this in xemacs.mak.
+
+debug support:
+
+- (cmdloop.el) new var breakpoint-on-error, which breaks into the C
+  debugger when an unhandled error occurs noninteractively.  useful when
+  debugging errors coming out of complicated make scripts, e.g. package
+  compilation, since you can set this through an env var.
+- (startup.el) new env var XEMACSDEBUG, specifying a Lisp form executed
+  early in the startup process; meant to be used for turning on debug flags
+  such as breakpoint-on-error or stack-trace-on-error, to track down
+  noninteractive errors.
+- (cmdloop.el) removed non-working code in command-error to display a
+  backtrace on debug-on-error.  use stack-trace-on-error instead to get
+  this.
+- (process.c) new var debug-process-io displays data sent to and received
+  from a process.
+- (alloc.c) staticpros have name stored with them for easier debugging.
+- (emacs.c) code that handles fatal errors consolidated and rewritten.
+  much more robust and correctly handles all fatal exits on mswin
+  (e.g. aborts, not previously handled right).
+
+command line (startup.el, emacs.c):
+
+- new option -eol to enable auto EOL detection under non-mule unix.
+- new option -nuni (--no-unicode-lib-calls) to force use of non-Unicode
+  API's under Windows NT, mostly for debugging purposes.
+- help message fixed up (divided into sections), existing problem causing
+  incomplete output fixed, undocumented options documented.
+
+startup.el:
+
+- move init routines out of before-init-hook and after-init-hook; just call
+  them directly (init-menubar-at-startup, init-mule-at-startup).
+
+frame.el:
+
+- delete old commented-out code.
+
+Mule changes:
+
+Major:
+
+- the code that handles the details of processing multilingual text has
+  been consolidated to make it easier to extend it.  it has been yanked out
+  of various files (buffer.h, mule-charset.h, lisp.h, insdel.c, fns.c,
+  file-coding.c, etc.) and put into text.c and text.h.  mule-charset.h has
+  also been renamed charset.h.  all long comments concerning the
+  representations and their processing have been consolidated into text.c.
+- major rewriting of file-coding.  it's mostly abstracted into coding
+  systems that are defined by methods (similar to devices and
+  specifiers), with the ultimate aim being to allow non-i18n coding
+  systems such as gzip.  there is a "chain" coding system that allows
+  multiple coding systems to be chained together. (it doesn't yet
+  have the concept that either end of a coding system can be bytes or
+  chars; this needs to be added.)
+- large amounts of code throughout the code base have been Mule-ized,
+  not just Windows code.
+- total rewriting of OS locale code.  it notices your locale at startup and
+  sets the language environment accordingly, and calls setlocale() and sets
+  LANG when you change the language environment.  new language environment
+  properties locale, mswindows-locale, cygwin-locale, native-coding-system,
+  to determine langenv from locale and vice-versa; fix all language
+  environments (lots of language files).  langenv startup code rewritten.
+  many new functions to convert between locales, language environments,
+  etc.
+- major overhaul of the way default values for the various coding system
+  variables are handled.  all default values are collected into one
+  location, a new file code-init.el, which provides a unified mechanism for
+  setting and querying what i call "basic coding system variables" (which
+  may be aliases, parts of conses, etc.) and a mechanism of different
+  configurations (Windows w/Mule, Windows w/o Mule, Unix w/Mule, Unix w/o
+  Mule, unix w/o Mule but w/auto EOL), each of which specifies a set of
+  default values.  we determine the configuration at startup and set all
+  the values in one place. (code-init.el, code-files.el, coding.el, ...)
+- i copied the remaining language-specific files from fsf.  i made
+  some minor changes in certain cases but for the most part the stuff
+  was just copied and may not work.
+- ms windows mule support, with full unicode support.  required font,
+  redisplay, event, other changes.  ime support from ikeyama.
+
+User-Visible Changes:
+
+Lisp-Visible Changes:
+
+- ensure that `escape-quoted' works correctly even without Mule support and
+  use it for all auto-saves. (auto-save.el, fileio.c, coding.el, files.el)
+- new var buffer-file-coding-system-when-loaded specifies the actual coding
+  system used when the file was loaded (buffer-file-coding-system is
+  usually the same, but may be changed because it controls how the file is
+  written out).  use it in revert-buffer (files.el, code-files.el) and in
+  new submenu File->Revert Buffer with Specified Encoding
+  (menubar-items.el).
+- improve docs on how the coding system is determined when a file is read
+  in; improved docs are in both find-file and insert-file-contents and a
+  reference to where to find them is in
+  buffer-file-coding-system-for-read. (files.el, code-files.el)
+- new (brain-damaged) FSF way of calling post-read-conversion (only one
+  arg, not two) is supported, along with our two-argument way, as best we
+  can. (code-files.el)
+- add inexplicably missing var default-process-coding-system.  use it.  get
+  rid of former hacked-up way of setting these defaults using
+  comint-exec-hook.  also fun
+  set-buffer-process-coding-system. (code-process.el, code-cmds.el, process.c)
+- remove function set-default-coding-systems; replace with
+  set-default-output-coding-systems, which affects only the output defaults
+  (buffer-file-coding-system, output half of
+  default-process-coding-system).  the input defaults should not be set by
+  this because they should always remain `undecided' in normal
+  circumstances.  fix prefer-coding-system to use the new function and
+  correct its docs.
+- fix bug in coding-system-change-eol-conversion (code-cmds.el)
+- recognize all eol types in prefer-coding-system (code-cmds.el)
+- rewrite coding-system-category to be correct (coding.el)
+
+Internal Changes:
+
+- Separate encoding and decoding lstreams have been combined into a single
+  coding lstream.  Functions make_encoding_*_stream and
+  make_decoding_*_stream have been combined into make_coding_*_stream,
+  which takes an argument specifying whether encode or decode is wanted.
+- remove last vestiges of I18N3, I18N4 code.
+- ascii optimization for strings: we keep track of the number of ascii
+  chars at the beginning and use this to optimize byte<->char conversion on
+  strings.
+- mule-misc.el, mule-init.el deleted; code in there either deleted,
+  rewritten, or moved to another file.
+- mule.c deleted.
+- move non-Mule-specific code out of mule-cmds.el into code-cmds.el.
+  (coding-system-change-text-conversion; remove duplicate
+  coding-system-change-eol-conversion)
+- remove duplicate set-buffer-process-coding-system (code-cmds.el)
+- add some commented-out code from FSF mule-cmds.el
+  (find-coding-systems-region-subset-p, find-coding-systems-region,
+  find-coding-systems-string, find-coding-systems-for-charsets,
+  find-multibyte-characters, last-coding-system-specified,
+  select-safe-coding-system, select-message-coding-system) (code-cmds.el)
+- remove obsolete alias pathname-coding-system, function
+  set-pathname-coding-system (coding.el)
+- remove coding-system property doc-string; split into `description'
+  (short, for menu items) and `documentation' (long); correct coding system
+  defns (coding.el, file-coding.c, lots of language files)
+- move coding-system-base into C and make use of internal info (coding.el,
+  file-coding.c)
+- move undecided defn into C (coding.el, file-coding.c)
+- use define-coding-system-alias, not copy-coding-system (coding.el)
+- new coding system iso-8859-6 for arabic
+- delete windows-1251 support from cyrillic.el; we do it automatically
+- remove setup-*-environment as per FSF 21
+- rewrite european.el with lang envs for each language, so we can specify
+  the locale
+- fix corruption in greek.el
+- sync japanese.el with FSF 20.6
+- fix warnings in mule-ccl.el
+- move FSF compat Mule fns from obsolete.el to mule-charset.el
+- eliminate unused truncate-string{-to-width}
+- make-coding-system accepts (but ignores) the additional properties
+  present in the fsf version, for compatibility.
+- i fixed the iso2022 handling so it will correctly read in files
+  containing unknown charsets, creating a "temporary" charset which
+  can later be overwritten by the real charset when it's defined.
+  this allows iso2022 elisp files with literals in strange languages
+  to compile correctly under mule.  i also added a hack that will
+  correctly read in and write out the emacs-specific "composition"
+  escape sequences, i.e. ESC 0 through ESC 4.  this means that my
+  workspace correctly compiles the new file devanagari.el that i added.
+- elimination of string-to-char-list (use string-to-list)
+- elimination of junky define-charset
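
The "ascii optimization for strings" item above can be sketched as follows. This is only an illustration under stated assumptions: UTF-8 is used as a stand-in for the internal text format (the real Mule representation is different), and the struct and function names are hypothetical. The point is that any character position inside the leading all-ASCII run maps to the same byte position, so no scanning is needed there.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical string record; ascii_begin caches how many leading
   bytes are plain ASCII (one byte per character). */
struct example_string
{
  const unsigned char *data;
  size_t byte_len;
  size_t ascii_begin;
};

/* Count the leading run of ASCII bytes. */
size_t
example_count_ascii_begin (const unsigned char *data, size_t len)
{
  size_t i = 0;

  while (i < len && data[i] < 0x80)
    i++;
  return i;
}

/* Convert a character index to a byte index.  Positions within the
   leading ASCII run are answered in constant time; beyond it we scan
   forward from the end of that run, skipping UTF-8 continuation
   bytes (10xxxxxx) to step over one character at a time. */
size_t
example_char_to_byte (const struct example_string *s, size_t charpos)
{
  size_t bytepos, chars;

  assert (s != NULL);
  if (charpos <= s->ascii_begin)
    return charpos;

  bytepos = s->ascii_begin;
  chars = s->ascii_begin;
  while (chars < charpos && bytepos < s->byte_len)
    {
      bytepos++;
      while (bytepos < s->byte_len && (s->data[bytepos] & 0xC0) == 0x80)
        bytepos++;
      chars++;
    }
  return bytepos;
}
```

For strings that are entirely ASCII, which is the common case, every conversion takes the fast path.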
+
+Search:
+
+- make regex routines reentrant, since they're sometimes called
+  reentrantly. (see regex.c for a description of how.) all global variables
+  used by the regex routines get pushed onto a stack by the callers before
+  being set, and are restored when finished.  redo the preprocessor flags
+  controlling REL_ALLOC in conjunction with this.
+
+Selection:
+
+- fix msw selection code for Mule.  proper encoding for
+  RegisterClipboardFormat.  store selection as CF_UNICODETEXT, which will
+  get converted to the other formats.  don't respond to destroy messages
+  from EmptyClipboard().
+
+Menubar:
+
+- move menu-splitting code (menu-split-long-menu, etc.) from font-menu.el
+  to menubar-items.el and redo its algorithm; use in various items with
+  long generated menus; rename to remove `font-' from beginning of
+  functions but keep old names as aliases
+- new fn menu-sort-menu
+- new items Open With Specified Encoding, Revert Buffer with Specified Encoding
+- split Mule menu into Encoding (non-Mule-specific; includes new item to
+  control EOL auto-detection) and International submenus on Options,
+  International on Help
+- redo items Grep All Files in Current Directory {and Below} using stuff
+  from sample init.el
+- Debug on Error and friends now affect current session only; not saved
+- maybe-add-init-button -> init-menubar-at-startup and call explicitly from
+  startup.el
+- don't use charset-registry in msw-font-menu.el; it's only for X
+
+Process:
+
+- Move setenv from packages; synch setenv/getenv with 21.0.105
+
+Unicode support:
+
+- translation tables added in etc/unicode
+- new files unicode.c, unicode.el containing unicode coding systems and
+  support; old code ripped out of file-coding.c
+- translation tables read in at startup (NEEDS WORK TO MAKE IT MORE EFFICIENT)
+- support CF_TEXT, CF_UNICODETEXT in select.el
+- encapsulation code added so that we can support both Windows 9x and NT in
+  a single executable, determining at runtime whether to call the Unicode
+  or non-Unicode API.  encapsulated routines in intl-encap-win32.c
+  (non-auto-generated) and intl-auto-encap-win32.[ch] (auto-generated).
+  code generator in lib-src/make-mswin-unicode.pl.  changes throughout the
+  code to use the wide structures (W suffix) and call the encapsulated
+  Win32 API routines (qxe prefix).  calling code needs to do proper
+  conversion of text using new coding systems Qmswindows_tstr,
+  Qmswindows_unicode, or Qmswindows_multibyte. (the first points to one of
+  the other two.)
+
+
+File-coding rewrite:
+
+The coding system code has been substantially rewritten.  It's abstracted
+into coding systems that are defined by methods (similar to devices and
+specifiers).  The types of conversions have also been
+generalized. Formerly, decoding always converted bytes to characters and
+encoding the reverse (these are now called "text file converters"), but
+conversion can now happen either to or from bytes or characters.  This
+allows coding systems such as `gzip' and `base64' to be written.  When
+specifying such a coding system to an operation that expects a text file
+converter (such as reading in or writing out a file), the appropriate
+coding systems to convert between bytes and characters are automatically
+inserted into the conversion chain as necessary.  To facilitate creating
+such chains, a special coding system called "chain" has been created, which
+chains together two or more coding systems.
+
+Encoding detection has also been abstracted.  Detectors are logically
+separate from coding systems, and each detector defines one or more
+categories. (For example, the detector for Unicode defines categories such
+as UTF-8, UTF-16, UCS-4, and UTF-7.) When a particular detector is given a
+piece of text to detect, it determines likelihood values (seven of them,
+from 3 [most likely] to -3 [least likely]; specific criteria are defined
+for each possible value).  All detectors are run in parallel on a
+particular piece of text, and the results tabulated together to determine
+the actual encoding of the text.
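
The detection scheme above can be pictured with a small sketch. The text does not say how the per-category results are tabulated, so the rule used here (the highest value wins, ties go to the earliest entry) is purely an assumption, and the type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Scale taken from the description: 3 is most likely, -3 least. */
#define EXAMPLE_LIKELIHOOD_MAX 3
#define EXAMPLE_LIKELIHOOD_MIN (-3)

/* One row of the tabulated results: a category reported by some
   detector and the value it assigned to the text. */
struct example_category_result
{
  const char *category;
  int likelihood;
};

/* Return the index of the best-scoring category, or -1 if there
   are no results.  Assumed rule: highest value wins, and on a tie
   the earliest entry is kept. */
int
example_pick_category (const struct example_category_result *results,
                       size_t count)
{
  size_t i;
  int best = -1;

  for (i = 0; i < count; i++)
    {
      assert (results[i].likelihood >= EXAMPLE_LIKELIHOOD_MIN
              && results[i].likelihood <= EXAMPLE_LIKELIHOOD_MAX);
      if (best < 0 || results[i].likelihood > results[best].likelihood)
        best = (int) i;
    }
  return best;
}
```

Because detectors only report values and never choose an encoding themselves, the final decision rule can be changed in one place without touching any detector.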
+
+Encoding and decoding are now completely parallel operations, and the
+former "encoding" and "decoding" lstreams have been combined into a single
+"coding" lstream.  Coding system methods that were formerly split in such a
+fashion have also been combined.
+
 
 2001-09-19  Ben Wing  <ben@xemacs.org>
 
-	* configure.in (USAGE_ERROR):
-	* configure.in (CANONICALIZE_PATH):
-	* configure.in (XE_COMPUTE_RUNPATH):
-
-	The great integral types renaming.
-
-	Please see the 2001-09-19 entry in src/ChangeLog for the full details.
+	* etc\TUTORIAL.de: Translate TERMINOLOGY section for TUTORIAL.de,
+	change menu entry separator from / to ->.  Change SPC to <Space>.
+
+2001-10-07  Adrian Aichner  <adrian@xemacs.org>
+
+	* etc\TUTORIAL.de: Update of TUTORIAL.de according to Ben's
+	Updates and Syncing with Emacs 21.0.106.
 
 2001-09-17  Ben Wing  <ben@xemacs.org>
 

README.ben-mule-21-5

+oct 27, 2001:
+
+-------- proposal for better buffer-switching commands:
+
+implement what VC++ currently has.  you have a single "switch" command like
+CTRL-TAB, which as long as you hold the CTRL button down, brings successive
+buffers that are "next in line" into the current position, bumping the rest
+forward.  once you release the CTRL key, the chain is broken, and further
+CTRL-TABs will start from the beginning again.  this way, frequently used
+buffers naturally move toward the front of the chain, and you can switch
+back and forth between two buffers using CTRL-TAB.  the only drawback of
+CTRL-TAB is that it's a bit awkward.  the way to implement this is to have
+modifier-up strokes fire off a hook, like modifier-up-hook.  this is driven
+by event dispatch, so there are no synchronization issues.  when C-tab is
+pressed, the binding function does something like set a one-shot handler on
+the modifier-up-hook (perhaps separate hooks for separate modifiers?).
+
+to do this, we'd also want to change the buffer tabs so that they maintain
+their own order.  in particular, they start out synched to the regular
+order, but as you make changes, you don't want the tabs to change
+order. (in fact, they may already do this.) selecting a particular buffer
+from the buffer tabs DOES make the buffer go to the head of the line.  the
+invariant is that if the tabs are displaying X items, those X items are the
+first X items in the standard buffer list, but may be in a different
+order. (it looks like the tabs may already implement all of this.)
+
+oct 26, 2001:
+
+necessary testing/changes:
+
+- test all eol detection stuff under windows w/ and w/o mule, unix w/ and
+  w/o mule. (test configure flag, command-line flag, menu option) may need
+  a way of pretending to be unix under cygwin.
+- test under windows w/ and w/o mule, cygwin w/ and w/o mule, cygwin x
+  windows w/ and w/o mule.
+- test undecided-dos/unix/mac.
+- check ESC ESC works as isearch-quit under TTY's.
+- test coding-system-base and all its uses (grep for them).
+- menu item to revert to most recent auto save.
+- consider renaming build_string -> build_intstring and build_c_string to
+  build_string. (consistent with build_msg_string et al; many more
+  build_c_string than build_string)
+
+oct 20, 2001:
+
+fixed problem causing crash due to invalid internal-format data, fixed an
+existing bug in valid_char_p, and added checks to more quickly catch when
+invalid chars are generated.  still need to investigate why
+mswindows-multibyte is being detected.
+
+i now see why -- we only process 65536 bytes due to a constant
+MAX_BYTES_PROCESSED_FOR_DETECTION.  instead, we should have no limit as
+long as we have a seekable stream.  we also need to write
+stderr_out_lisp(), used in the debug info routines i wrote.
+
+check once more about DEBUG_XEMACS.  i think debugging info should be
+ON by default.  make sure it is.  check that nothing untoward will result
+in a production system, e.g. presumably assert()s should not really abort().
+(!! Actually, this should be runtime settable!  Use a variable for this, and
+it can be set using the same XEMACSDEBUG method.  In fact, now that I think
+of it, I'm sure that debugging info should be on always, with runtime ways
+of turning on or off any funny behavior.)
+
+oct 19, 2001:
+
+fixed various bugs preventing packages from being able to be built.  still
+another bug, with psgml/etc/cdtd/docbook, which contains some strange
+characters starting around char pos 110,000.  It gets detected as
+mswindows-multibyte (wrong! why?) and then invalid internal-format data is
+generated.  need to fix mswindows-multibyte (and possibly add something
+that signals an error as well; need to work on this error-signalling
+mechanism) and figure out why it's getting detected as such.  what i should
+do is add a debug var that outputs blow-by-blow info of the detection
+process.
+
+oct 9, 2001:
+
+the stuff with global-window-system-map doesn't appear to work.  in any
+case it needs better documentation. [DONE]
+
+M-home, M-end do work, but cause cl-macs to get loaded.  why?
+
+oct 8, 2001:
+
+finished the coding system changes and they finally work!
+
+need to implement undecided-unix/dos/mac.  they should be easy to do; it
+should be enough to specify an eol-type but not do-eol, but check this.
+
+consider making the standard naming be foo-lf/crlf/cr, with unix/dos/mac as
+aliases.
+
+print methods for coding systems should include some of the generic
+properties. (also then fix print_..._within_print_method). [DONE]
+
+in a little while, go back and delete the text-file-wrapper-coding-system
+code. (it'll be in CVS if necessary to get at it.) [DONE]
+
+need to verify at some point that non-text-file coding systems work
+properly when specified.  when gzip is working, this would be a good test
+case. (and consider creating base64 as well!)
+
+remove extra crap from coding-system-category that checks for chain coding
+systems. [DONE]
+
+perhaps make a primitive that gets at coding-system-canonical. [DONE]
+
+need to test cygwin, compiling the mule packages, get unix-eol stuff
+working.  frank from germany says he doesn't see a lisp backtrace when he
+gets an error during temacs?  verify that this actually gets output.
+
+consider putting the current language on the modeline, mousable so it can
+be switched.  also consider making the coding system be mousable and the
+line number (pick a line) and the percentage (pick a percentage).
+
+oct 6, 2001:
+
+added code so that debug_print() will output a newline to the mswindows
+debugging output, not just the console.  need to test. [DONE]
+
+working on problem where all files are being detected as binary.  the
+problem may be that the undecided coding system is getting wrapped with an
+auto-eol coding system, which it shouldn't be -- but even in this
+situation, we should get the right results!  check the
+canonicalize-after-coding methods.  also, determine_real_coding_system
+appears to be getting called even when we're not detecting encoding.  also,
+undecided needs a print method to show its params, and chain needs to be
+updated to show canonicalize_after_coding.  check others as well. [DONE]
+
+oct 5, 2001:
+
+finished up coding system changes, testing.
+
+errors byte-compiling files in iso-2022-7-bit.  perhaps it's not correctly
+detecting the encoding?
+
+noticed a problem in the dfc macros: we call
+get_coding_system_for_text_file with eol_wrap == 1, to allow for
+auto-detection of the eol type; but this defeats the check and
+short-circuit for unicode.
+
+still need to implement calling determine_real_coding_system() for
+non-seekable streams.  to implement correctly, we need to do our own
+buffering. [DONE, BUT WITHOUT BUFFERING]
+
+oct 4, 2001:
+
+implemented most stuff below.
+
+need to finish up changes to make_coding_system_1. (i changed the way
+internal coding systems were handled; i need to create subsidiaries for all
+types of coding systems, not just text ones.) there's a nasty xfree() crash
+i was hitting; perhaps it'll go away once all stuff has been rewritten.
+
+check under cygwin to make sure that when an error occurs during loadup, a
+backtrace is output.
+
+as soon as andy releases his new setup, we should put it onto various
+standard windows software repositories.
+
+oct 3, 2001:
+
+added global-tty-map and global-window-system-map.  add some stuff to the
+maps, e.g. C-x ESC for repeat vs. C-x ESC ESC on TTY's, and of course ESC
+ESC on window systems vs. ESC ESC ESC on TTY's. [TEST]
+
+was working on integrating the two help-for-tutorial versions (mule,
+non-mule). [DONE, but test under non-Mule]
+
+was working on the file-coding changes.  need to think more about
+text-file-wrapper.  conclusion i think is that
+get_coding_system_for_text_file should wrap using a special coding system
+type called a text-file-wrapper, which inherits from chain, and implements
+canonicalize-after-decoding to just return the unwrapped coding system.  We
+need to implement inheritance of coding systems, which will certainly come
+in extremely useful when coding systems get implemented in Lisp, which
+should happen at some point. (see existing docs about this.)  essentially,
+we have a way of declaring that we inherit from some system, and the
+appropriate data structures get created, perhaps just an extra inheritance
+pointer.  but when we create the coding system, the extra data needs to be
+a stretchy array of offsets, pointing to the type-specific data for the
+coding system type and all its parents.  that means that in the methods
+structure for a coding system (which perhaps should be expanded beyond
+method, it's just a "class structure") is the index in these arrays of
+offsets.  CODING_SYSTEM_DATA() can take any of the coding system classes
+(rename type to class!) that make up this class.  similarly, a coding
+system class inherits its methods from the class above unless specifying
+its own method, and can call the superclass method at any point by either
+just invoking its name, or conceivably by some macro like
+
+CALL_SUPER (method, (args))
+
+similar mods would have to be made to coding stream structures.
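+
+(a rough sketch of the method-inheritance idea above, not actual xemacs
+code.  struct cs_class, find_canonicalize and CALL_METHOD are made up for
+illustration, CALL_SUPER here takes an extra class argument that the
+version above doesn't have, and plain strings stand in for coding-system
+objects.)
+
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* hypothetical "class structure" for a coding system type */
struct cs_class
{
  const char *name;
  const struct cs_class *super;   /* the extra inheritance pointer */
  /* NULL means "inherit from the class above" */
  const char *(*canonicalize) (const struct cs_class *self, const char *cs);
};

/* nearest class, starting at C, that specifies its own canonicalize */
static const struct cs_class *
find_canonicalize (const struct cs_class *c)
{
  while (c && !c->canonicalize)
    c = c->super;
  assert (c);                     /* the root class must define it */
  return c;
}

/* invoke a method, inheriting it when the class doesn't specify one */
#define CALL_METHOD(cls, method, args) (find_##method (cls)->method args)
/* invoke the superclass version from inside an overriding method */
#define CALL_SUPER(cls, method, args) \
  (find_##method ((cls)->super)->method args)

/* chain: leaves the coding system alone */
static const char *
chain_canonicalize (const struct cs_class *self, const char *cs)
{
  (void) self;
  return cs;
}

/* text-file-wrapper: returns the unwrapped coding system; the
   "wrapped:" prefix is just a stand-in for real wrapping */
static const char *
wrapper_canonicalize (const struct cs_class *self, const char *cs)
{
  const char *prefix = "wrapped:";
  if (strncmp (cs, prefix, strlen (prefix)) == 0)
    return cs + strlen (prefix);
  return CALL_SUPER (self, canonicalize, (self->super, cs));
}

static const struct cs_class chain_class =
  { "chain", NULL, chain_canonicalize };
static const struct cs_class text_file_wrapper_class =
  { "text-file-wrapper", &chain_class, wrapper_canonicalize };
/* a subclass that specifies nothing and so inherits everything */
static const struct cs_class plain_wrapper_class =
  { "plain-wrapper", &text_file_wrapper_class, NULL };
```
+
+e.g. CALL_METHOD (&plain_wrapper_class, canonicalize, (&plain_wrapper_class,
+"wrapped:utf-8")) ends up in wrapper_canonicalize, since plain-wrapper
+specifies no method of its own.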
+
+perhaps for the immediate we can just sort of fake things like we currently
+do with undecided calling some stuff from chain.
+
+oct 2, 2001:
+
+need to implement support for iso-8859-15, i.e. iso-8859-1 + euro symbol.
+figure out how to fall back to iso-8859-1 as necessary.
+
+leave the current bindings the way they are for the moment, but bump off
+M-home and M-end (hardly used), and substitute my buffer movement stuff
+there. [DONE, but test]
+
+there's something to be said for combining block of 6 and paragraph,
+esp. if we make the definition of "paragraph" be so that it skips by 6 when
+within code.  hmm.
+
+eliminate advertised-undo crap, and similar hacks. [DONE]
+
+think about obsolete stuff to be eliminated.  think about eliminating or
+dimming obsolete items from hyper-apropos and something similar in
+completion buffers.
+
+sep 30, 2001:
+
+synched up the tutorials with FSF 21.0.105.  was rewriting them to favor
+the cursor keys over the older C-p, etc. keys.
+
+Got thinking about key bindings again.
+
+(1) I think that M-up/down and M-C-up/down should be reversed.  I use
+    scroll-up/down much more often than motion by paragraph.
+
+(2) Should we eliminate move by block (of 6) and substitute it for
+    paragraph?  This would have the advantage that I could make bindings
+    for buffer change (forward/back buffer), perhaps M-C-up/down.  with
+    shift, M-C-S-up/down only goes within the same type (C files, etc.).
+    alternatively, just bump off beginning-of-defun from C-M-home, since
+    it's on C-M-a already.
+
+need someone to go over the other tutorials (five new ones, from FSF
+21.0.105) and fix them up to correspond to the english one.
+
+shouldn't shift-motion work with C-a and such as well as arrows?
+
+sep 29, 2001:
+
+charcount_to_bytecount can also be made to scream -- as can scan_buffer,
+buffer_mule_signal_inserted_region, others?  we should start profiling
+though before going too far down this line.
+
+Debug code that causes no slowdown should in general remain in the
+executable even in the release version because it may be useful (e.g. for
+people to see the event output).  so DEBUG_XEMACS should be rethought.
+things like use of msvcrtd.dll should be controlled by error_checking on.
+maybe DEBUG_XEMACS controls general debug code (e.g. use of msvcrtd.dll,
+asserts abort, error checking), and the actual debugging code should remain
+always, or be conditionalized on something else
+(e.g. DEBUGGING_FUNS_PRESENT).
+
+doc strings in dumped files are displayed with an extra blank line between
+each line.  presumably this is recent?  i assume either the change to
+detect-coding-region or the double-wrapping mentioned below.
+
+error with coding-system-property on iso-2022-jp-dos.  problem is that that
+coding system is wrapped, so its type shows up as chain, not iso-2022.
+this is a general problem, and i think the way to fix it is to in essence
+do late canonicalization -- similar in spirit to what was done long ago,
+canonicalize_when_code, except that the new coding system (the wrapper) is
+created only once, either when the original cs is created or when first
+needed.  this way, operations on the coding system work like expected, and
+you get the same results as currently when decoding/encoding.  the only
+thing tricky is handling canonicalize-after-coding and the ever-tricky
+double-wrapping problem mentioned below.  i think the proper solution is to
+move the autodetection of eol into the main autodetect type.  it can be
+asked to autodetect eol, coding, or both.  for just coding, it does like it
+currently does.  for just eol, it does similar to what it currently does
+but runs the detection code that convert-eol currently does, and selects
+the appropriate convert-eol system.  when it does both eol and coding, it
+does something on the order of creating two more autodetect coding systems,
+one for eol only and one for coding only, and chains them together.  when
+each has detected the appropriate value, the results are combined.  this
+automatically eliminates the double-wrapping problem, removes the need for
+complicated canonicalize-after-coding stuff in chain, and fixes the problem
+of autodetect not having a seekable stream because hidden inside of a
+chain. (we presume that in the both-eol-and-coding case, the various
+autodetect coding streams can communicate with each other appropriately.)
+
+also, we should solve the problem of internal coding systems floating
+around and clogging up the list simply by having an "internal" property on
+cs's and an internal param to coding-system-list (optional; if not given,
+you don't get the internal ones). [DONE]
+
+we should try to reduce the size of the from-unicode tables (the dominant
+memory hog in the tables).  one obvious thing is to not store a whole
+emchar as the mapped-to value, but a short that encodes the octets. [DONE]
+
+sep 28, 2001:
+
+need to merge up to latest in trunk.
+
+add unicode charsets for all non-translatable unicode chars; probably want
+to extend the concept of charsets to allow for dimension 3 and dimension 4
+charsets.  for the moment we should stick with just dimension 3 charsets;
+otherwise we run past the current maximum of 4 bytes per emchar. (most code
+would work automatically since it uses MAX_EMCHAR_LEN; the trickiness is in
+certain code that has intimate knowledge of the representation.
+e.g. bufpos_to_bytind() has to multiply or divide by 1, 2, 3, or 4,
+and has special ways of handling each number.  with 5 or 6 bytes per char,
+we'd have to change that code in various ways.) 96x96x96 = 884,000 or so,
+so with two 96x96x96 charsets, we could tackle all Unicode values
+representable by UTF-16 and then some -- and only these codepoints will
+ever have assigned chars, as far as we know.  
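+
+(the arithmetic above, spelled out.  the names are made up for
+illustration; the only outside fact used is that UTF-16 reaches code
+points 0 through 0x10FFFF.)
+
```c
#include <assert.h>

enum
{
  CHARS_PER_DIM3_CHARSET = 96 * 96 * 96,  /* 884,736 */
  UTF16_CODEPOINTS = 0x110000             /* 0..0x10FFFF = 1,114,112 */
};

/* number of 96x96x96 charsets needed to hold N code points */
static int
dim3_charsets_needed (long n)
{
  return (int) ((n + CHARS_PER_DIM3_CHARSET - 1) / CHARS_PER_DIM3_CHARSET);
}
```
+
+dim3_charsets_needed (UTF16_CODEPOINTS) comes out as 2, with 655,360
+positions left over.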
+
+need an easy way of showing the current language environment.  some menus
+need to have the current one checked or whatever. [DONE]
+
+implement unicode surrogates.
+
+implement buffer-file-coding-system-when-loaded -- make sure find-file,
+revert-file, etc. set the coding system [DONE]
+
+verify all the menu stuff [DONE]
+
+implemented the entirely-ascii check in buffers.  not sure how much gain
+it'll get us as we already have a known range inside of which is constant
+time, and with pure-ascii files the known range spans the whole buffer.
+improved the comment about how bufpos-to-bytind and vice-versa work. [DONE]
+
+fix double-wrapping of convert-eol: when undecided converts itself to
+something with a non-autodetect eol, it needs to tell the adjacent
+convert-eol to reduce itself to nothing.
+
+need menu item for find file with specified encoding. [DONE]
+
+renamed coding systems mswindows-### to windows-### to follow the standard
+in rfc1345. [DONE]
+
+implemented coding-system-subsidiary-parent [DONE]
+HAVE_MULE -> MULE in files in nt/ so that depend checking works [DONE]
+
+need to take the smarter search-all-files-in-dir stuff from my sample init
+file and put it on the grep menu [DONE]
+
+added item for revert w/specified encoding; mostly works, but needs fixes.
+in particular, you get the correct results, but buffer-file-coding-system
+does not reflect things right.  also, there are too many entries.  need to
+split into submenus.  there is already split code out there; see if it's
+generalized and if not make it so.  it should only split when there's more
+than a specified number, and when splitting, split into groups of a
+specified size, not into a specified number of groups. [DONE]
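+
+(the splitting rule above as a sketch.  the real menu code is Lisp; this
+is C only to show the arithmetic, and the function and parameter names are
+invented.)
+
```c
#include <assert.h>

/* number of submenus for N entries: no split at or below THRESHOLD,
   otherwise groups of GROUP_SIZE (the last group may be short) */
static int
submenu_count (int n, int threshold, int group_size)
{
  if (n <= threshold)
    return 1;
  return (n + group_size - 1) / group_size;
}

/* index of the submenu that entry I (0-based) lands in */
static int
submenu_of_entry (int i, int n, int threshold, int group_size)
{
  return n <= threshold ? 0 : i / group_size;
}
```
+
+so 45 entries with a threshold of 20 and a group size of 8 give six
+submenus, the last holding five entries.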
+
+too many entries in the langenv menus; need to split. [DONE]
+
+sep 27, 2001:
+
+NOTE: M-x grep for make-string causes crash now.  something definitely to
+do with string changes.  check very carefully the diffs and put in those
+sledgehammer checks. [DONE]
+
+fix font-lock bug i introduced. [DONE]
+
+added optimization to strings (keeps track of # of bytes of ascii at the
+beginning of a string).  perhaps should also keep an all-ascii flag to deal
+with really large (> 2 MB) strings.  rewrite code to count ascii-begin to
+use the 4-or-8-at-a-time stuff in bytecount_to_charcount.
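+
+(a sketch of counting ascii-begin eight bytes at a time.  this is not the
+code in bytecount_to_charcount, just the same general trick;
+count_ascii_begin is a made-up name.)
+
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* number of leading bytes of STR[0..LEN) that are ascii (high bit clear) */
static size_t
count_ascii_begin (const unsigned char *str, size_t len)
{
  const uint64_t high_bits = UINT64_C (0x8080808080808080);
  size_t i = 0;

  /* test 8 bytes at once; memcpy sidesteps alignment problems */
  while (len - i >= 8)
    {
      uint64_t word;
      memcpy (&word, str + i, 8);
      if (word & high_bits)
        break;                  /* a non-ascii byte is somewhere in here */
      i += 8;
    }
  /* finish, or pin down the exact byte, one at a time */
  while (i < len && str[i] < 0x80)
    i++;
  return i;
}
```
+
+the word test only asks whether any high bit is set, so it doesn't care
+about byte order.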
+
+Error: M-q is causing Invalid Regexp error on the above paragraph.  It's
+not working.  I assume it's a side effect of the string stuff.  VERIFY!
+Write sledgehammer checks for strings. [DONE]
+
+revamped the locale/init stuff so that it tries much harder to get things
+right.  should test a bit more.  in particular, test out Describe Language
+on the various created environments and make sure everything looks right.
+
+should change the menus: move the submenus on Edit->Mule directly under
+Edit.  add a menu entry on File to say "Reload with specified encoding ->".
+[DONE]
+
+Also Find File with specified encoding -> Also entry to change the EOL
+settings for Unix, and implement it.
+
+decode-coding-region isn't working because it needs to insert a binary
+(char->byte) converter. [DONE]
+
+chain should be rearranged to be in decoding order; similar for
+source/sink-type, other things?
+
+the detector should check for a magic cookie even without a seekable input.
+(currently its input is not seekable, because it's hidden within a chain.
+#### See what we can do about this.)
+
+provide a way to display various settings, e.g. the current category
+mappings and priority (see mule-diag; get this working so it's in the
+path); also a way to print out the likeliness results from a detection,
+perhaps a debug flag.
+
+problem with `env', which causes path issues due to `env' in packages.
+move env code to process, sync with fsf 21.0.105, check that the autoloads
+in `env' don't cause problems. [DONE]
+
+8-bit iso2022 detection appears broken; or at least, mule-canna.c is not so
+detected.
+
+sep 25, 2001:
+
+something else to do is review the font selection and fix it so that (e.g.) 
+JISX-0212 can be displayed.
+
+also, text in widgets needs to be drawn by us so that the correct fonts
+will be displayed even in multi-lingual text.
+
+sep 24, 2001:
+
+the detection system is now properly abstracted.  the detectors have been
+rewritten to include multiple levels of abstraction.  now we just need
+detectors for ascii, binary, and latin-x, as well as more sophisticated
+detectors in general and further review of the general algorithm for doing
+detection. (#### Is this written up anywhere?) after that, consider adding
+error-checking to decoding (VERY IMPORTANT) and verifying the binary
+correctness of things under unix no-mule.
+
+sep 23, 2001:
+
+began to fix the detection system -- adding multiple levels of likelihood
+and properly abstracting the detectors.  the system is in place except for
+the abstraction of the detector-specific data out of the struct
+detection_state.  we should get things working first before tackling that
+(which should not be too hard).  i'm rewriting algorithms here rather than
+just converting code, so it's harder.  mostly done with everything, but i
+need to review all detectors except iso2022 and make them properly follow
+the new way.  also write a no-conversion detector.  also need to look into
+the `recode' package and see how (if?) they handle detection, and maybe
+copy some of the algorithms.  also look at recent FSF 21.0 and see if their
+algorithms have improved.
+
+sep 22, 2001:
+
+fixed gc bugs from yesterday.
+fixed truename bug.
+close/finalize stuff works.
+eliminated notyet stuff in syswindows.h.
+eliminated special code in tstr_to_c_string.
+fixed pdump problems. (many of them, mostly latent bugs, ugh)
+fixed cygwin sscanf problems in parse-unicode-translation-table. (NOT a
+sscanf bug, but subtly different behavior w.r.t. whitespace in the format
+string, combined with a debugger that sucks ROCKS!! and consistently
+outputs garbage for variable values.)
+main stuff to test is the handling of EOF recognition vs. binary
+(i.e. check what the default settings are under Unix).  then we may have
+something that WORKS on all platforms!!!  (Also need to test Windows
+non-Mule)
+
+sep 21, 2001:
+
+finished redoing the close/finalize stuff in the lstream code.  but i
+encountered again the nasty bug mentioned on sep 15 that disappeared on its
+own then.  the problem seems to be that the finalize method of some of the
+lstreams is calling Lstream_delete(), which calls free_managed_lcrecord(),
+which is a no-no when we're inside of garbage-collection and the object
+passed to free_managed_lcrecord() is unmarked, and about to be released by
+the gc mechanism -- the free lists will end up with xfree()d objects on
+them, which is very bad.  we need to modify free_managed_lcrecord() to
+check if we're in gc and the object is unmarked, and ignore it rather than
+move it to the free list. [DONE]
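+
+(roughly what that check looks like, using stand-in types rather than the
+real lcrecord structures.  struct record, in_gc, free_list and the return
+value are all invented for the sketch.)
+
```c
#include <assert.h>
#include <stddef.h>

struct record                 /* stand-in for an lcrecord */
{
  int marked;
  struct record *next_free;
};

static int in_gc;             /* nonzero while the collector is running */
static struct record *free_list;

/* returns 1 if REC went onto the free list, 0 if it was left alone for
   the sweeper to release */
static int
free_managed_record (struct record *rec)
{
  if (in_gc && !rec->marked)
    return 0;                 /* gc is about to free it; keep it off the list */
  rec->next_free = free_list;
  free_list = rec;
  return 1;
}
```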
+
+(#### What we really need to do is do what Java and C# do w.r.t. their
+finalize methods: For objects with finalizers, when they're about to be
+freed, leave them marked, run the finalizer, and set another bit on them
+indicating that the finalizer has run.  Next GC cycle, the objects will
+again come up for freeing, and this time the sweeper notices that the
+finalize method has already been called, and frees them for good (provided
+that a finalize method didn't do something to make the object alive
+again).)
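+
+(a toy version of that two-cycle scheme.  struct obj and sweep_one are
+hypothetical, and instead of literally leaving the object marked, the
+sketch just declines to free it on the first pass, which has the same
+effect.)
+
```c
#include <assert.h>

struct obj
{
  int marked;          /* set by the mark phase when the object is reachable */
  int has_finalizer;
  int finalizer_ran;   /* the extra bit */
  int freed;
};

static int finalizer_runs;

/* sample finalizer that only counts its invocations */
static void
note_finalized (struct obj *o)
{
  (void) o;
  finalizer_runs++;
}

/* sweep step for a single object */
static void
sweep_one (struct obj *o, void (*finalize) (struct obj *))
{
  if (o->marked)
    {
      o->marked = 0;   /* alive: clear the mark for the next cycle */
      return;
    }
  if (o->has_finalizer && !o->finalizer_ran)
    {
      o->finalizer_ran = 1;
      finalize (o);    /* survives this cycle; may get resurrected */
      return;
    }
  o->freed = 1;        /* no finalizer, or it already ran: free for good */
}
```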
+
+sep 20, 2001:
+
+redid the lstream code so there is only one coding stream.  combined the
+various doubled coding stream methods into one; i'm a little bit unsure of
+this last part, though, as the results of combining the two together seem
+unclean.  got it to compile, but it crashes in loadup.  need to go through
+and rehash the close vs. finalize stuff, as the problem was stuff getting
+freed too quickly, before the canonicalize-after-decoding was run.  should
+eliminate entirely CODING_STATE_END and use a different method (close
+coding stream).  rewrite to use these two.  make sure they're called in the
+right places.  Lstream_close on a stream should *NOT* do finalizing.
+finalize only on delete. [DONE]
+
+in general i'd like to see the flags eliminated and converted to
+bit-fields.  also, rewriting the methods to take advantage of rejecting
+should make it possible to eliminate much of the state in the various
+methods, esp. including the flags.  need to test this is working, though --
+reduce the buffer size down very low and try files with only CRLF's in
+them, with one offset by a byte from the other, and see if we correctly
+handle rejection.
+
+still have the problem with incorrectly truenaming files.
+
+
+sep 19, 2001:
+
+bug reported: crash while closing lstreams.
+
+the lstream/coding system close code needs revamping.  we need to document
+that order of closing lstreams is very important, and make sure we're
+consistent.  furthermore, chain and undecided lstreams need to close their
+underneath lstreams when they receive the EOF signal (there may be data in
+the underneath streams waiting to come out), not when they themselves are
+closed. [DONE]
+
+(if only we had proper inheritance.  i think in any case we should
+simulate it for the chain coding stream -- write things in such a way that
+undecided can use the chain coding stream and not have to duplicate
+anything itself.)
+
+in general we need to carefully think through the closing process to make
+sure everything always works correctly and in the right order.  also check
+very carefully to make sure there are no dangling pointers to deleted
+objects floating around.
+
+move the docs for the lstream functions to the functions themselves, not
+the header files.  document more carefully what exactly Lstream_delete()
+means and how it's used, what the connections are between Lstream_close(),
+Lstream_delete(), Lstream_flush(), lstream_finalize, etc. [DONE]
+
+additional error-checking: consider deadbeefing the memory in objects
+stored in lcrecord free lists; furthermore, consider whether lifo or fifo
+is correct; under error-checking, we should perhaps be doing fifo, and
+setting a minimum number of objects on the lists that's quite large so that
+it's highly likely that any erroneous accesses to freed objects will go
+into such deadbeefed memory and cause crashes.  also, at the earliest
+available opportunity, go through all freed memory and check for any
+consistency failures (overwrites of the deadbeef), crashing if so.  perhaps
+we could have some sort of id for each block, to more easily trace where the
+offending block came from. (all of these ideas are present in the debug
+system malloc from VC++, plus more stuff.) there's similar code i wrote
+sitting somewhere (in free-hook.c? doesn't appear so. we need to delete the
+blocking stuff out of there!).  also look into using the debug system
+malloc from VC++, which has lots of cool stuff in it. we even have the
+sources.  that means compiling under pdump, which would be a good idea
+anyway.  set it as the default. (but then, we need to remove the
+requirement that Xpm be a DLL, which is extremely annoying.  look into
+this.)
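+
+(a small sketch of the deadbeef-plus-fifo idea, with fixed-size blocks and
+a tiny queue.  none of these names exist in xemacs, and where the notes
+say to crash on a failed check this just returns 0 so the caller can
+decide.)
+
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEADBEEF UINT32_C (0xDEADBEEF)
#define BLOCK_WORDS 8
#define QUEUE_LEN 4     /* minimum number of blocks held before any reuse */

struct block { uint32_t words[BLOCK_WORDS]; };

static struct block *queue[QUEUE_LEN];   /* fifo free list */
static size_t head, count;

static void
poison (struct block *b)
{
  for (size_t i = 0; i < BLOCK_WORDS; i++)
    b->words[i] = DEADBEEF;
}

static int
still_poisoned (const struct block *b)
{
  for (size_t i = 0; i < BLOCK_WORDS; i++)
    if (b->words[i] != DEADBEEF)
      return 0;
  return 1;
}

/* poison B and queue it; once the queue is full, hand back the oldest
   block for reuse, otherwise NULL */
static struct block *
release_block (struct block *b)
{
  struct block *oldest = NULL;
  poison (b);
  if (count == QUEUE_LEN)
    {
      oldest = queue[head];
      head = (head + 1) % QUEUE_LEN;
      count--;
    }
  queue[(head + count) % QUEUE_LEN] = b;
  count++;
  return oldest;
}

/* 1 if no queued block has been written to since it was freed */
static int
free_list_intact (void)
{
  for (size_t i = 0; i < count; i++)
    if (!still_poisoned (queue[(head + i) % QUEUE_LEN]))
      return 0;
  return 1;
}
```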
+
+test the windows code page coding systems recently created.
+
+problems reading my mail files -- 1personal appears to hang, others come up
+with lots of ^M's.  investigate.
+
+test the enum functions i just wrote, and finish them.
+
+still pdump problems.
+
+sep 18, 2001:
+
+critical-quit broken sometime after aug 25.
+
+-- fixed critical quit.
+-- fixed process problems.
+-- print routines work. (no routine for ccl, though)
+-- can read and write unicode files, and they can still be read by some
+   other program
+-- defaults should come up correctly -- mswindows-multibyte is general.
+
+still need to test matej's stuff.
+seems ok with multibyte stuff but needs more testing.
+
+sep 17, 2001:
+
+!!!!! something broken with processes !!!!! cannot send mail anymore.  must
+investigate.
+
+sep 17, 2001:
+
+on mon/wed nights, stop *BEFORE* 11pm.  Otherwise i just start getting
+woozy and can't concentrate.
+
+just finished getting assorted fixups to the main branch committed, so it
+will compile under C++ (Andy committed some code that broke C++ builds).
+cup'd the code into the fixtypes workspace, updated the tags appropriately.
+i've created the appropriate log message, sitting in fixtypes.txt in
+/src/xemacs; perhaps it should go into a README.  now i just have to build
+on everything (it's currently building), verify it's ok, run patcher-mail,
+commit, send.
+
+my mule ws is also very close.  need to:
+
+-- test the new print routines.
+-- test it can read and write unicode files, and they can still be read by
+   some other program.
+-- try to see if unicode can be auto-detected properly.
+-- test it can read and write multibyte files in a few different formats.
+   currently can't recognize them, but if you set the cs right, it should
+   work.
+-- examine the test files sent by matej and see if we can handle them.
+
+sep 15, 2001:
+
+more eol fixing.  this stuff is utter crap.
+
+currently we wrap coding systems with convert-eol-autodetect when we create
+them in make_coding_system_1.  i had a feeling that this would be a
+problem, and indeed it is -- when autodetecting with `undecided', for
+example, we end up with multiple layers of eol conversion.  to avoid this,
+we need to do the eol wrapping *ONLY* when we actually retrieve a coding
+system in places such as insert-file-contents.  these places are
+insert-file-contents, load, process input, call-process-internal,
+encode/decode/detect-coding-region, database input, ...
+
+(later) it's fixed, and things basically work.  NOTE: for some reason,
+adding code to wrap coding systems with convert-eol-lf when eol-type == lf
+results in crashing during garbage collection in some pretty obscure place
+-- an lstream is free when it shouldn't be.  this is a bad sign.  i guess
+something might be getting initialized too early?
+
+we still need to fix the canonicalization-after-decoding code to avoid
+problems with coding systems like `internal-7' showing up.  basically, when
+eol==lf is detected, nil should be returned, and the callers should handle
+it appropriately, eliding when necessary.  chain needs to recognize when
+it's got only one (or even 0) items in the chain, and elide out the chain.
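+
+(sketch of the eliding rule, with strings standing in for coding systems
+and NULL standing in for nil.  drop_nil_items and canonicalize_chain are
+invented names, and returning "chain" is a placeholder for building a real
+chain object.)
+
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* compact ITEMS in place, dropping NULL entries (what a member returns
   when, e.g., eol==lf was detected and no conversion is needed).
   returns the new length. */
static size_t
drop_nil_items (const char **items, size_t n)
{
  size_t out = 0;
  for (size_t i = 0; i < n; i++)
    if (items[i])
      items[out++] = items[i];
  return out;
}

/* what the chain reduces to: NULL when nothing is left, the lone item
   when one is left, otherwise a chain */
static const char *
canonicalize_chain (const char **items, size_t n)
{
  n = drop_nil_items (items, n);
  if (n == 0)
    return NULL;
  if (n == 1)
    return items[0];
  return "chain";
}
```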
+
+sep 11, 2001: the day that will live in infamy.
+
+rewrite of sep 9 entry about formats:
+
+when calling make-coding-system, the name can be a cons of (format1 .
+format2), specifying that it decodes format1->format2 and encodes the other
+way.  if only one name is given, that is assumed to be format1, and the
+other is either `external' or `internal' depending on the end type.
+normally the user when decoding gives the decoding order in formats, but
+can leave off the last one, `internal', which is assumed.  a multichain
+might look like gzip|multibyte|unicode, using the coding systems named
+`gzip', `(unicode . multibyte)' and `unicode'.  the way this actually works
+is by searching for gzip->multibyte; if not found, look for gzip->external
+or gzip->internal. (In general we automatically do conversion between
+internal and external as necessary: thus gzip|crlf does the expected, and
+maps to gzip->external, external->internal, crlf->internal, which when
+fully specified would be gzip|external:external|internal:crlf|internal --
+see below.)  To forcibly fit together two converters that have explicitly
+specified and incompatible names (say you have unicode->multibyte and
+iso8859-1->ebcdic and you know that the multibyte and iso8859-1 in this
+case are compatible), you can force-cast using :, like this:
+ebcdic|iso8859-1:multibyte|unicode. (again, if you force-cast between
+internal and external formats, the conversion happens automatically.)
+
+
+sep 10, 2001:
+
+moved the autodetection stuff (both codesys and eol) into particular coding
+systems -- `undecided' and `convert-eol' (type == `autodetect').  needs
+lots of work.  still need to search through the rest of the code and find
+any remaining auto-detect code and move it into the undecided coding
+system.  need to modify make-coding-system so that it spits out
+auto-detecting versions of all text-file coding systems unless we say not
+to.  need to eliminate entirely the EOF flag from both the stream info and the
+coding system; have only the original-eof flag.  in
+coding_system_from_mask, need to check that the returned value is not of
+type `undecided', falling back to no-conversion if so.  also need to make
+sure we wrap everything appropriate for text-files -- i removed the
+wrapping on set-coding-category-list or whatever (need to check all those
+files to make sure all wrapping is removed).  need to review carefully the
+new code in `undecided' to make sure it works and preserves the same logic
+as previously.  need to review the closing and rewinding behavior of chain
+and undecided (same -- should really consolidate into helper routines, so
+that any coding system can embed a chain in it) -- make sure the dynarr's
+are getting their data flushed out as necessary, rewound/closed in the
+right order, no missing steps, etc.
+
+also split out mule stuff into mule-coding.c.  work done on
+configure/xemacs.mak/Makefiles not done yet.  work on emacs.c/symsinit.h to
+interface with the new init functions not done yet.
+
+also put in a few declarations of the way i think the abstracted detection
+stuff ought to go.  DON'T WORK ON THIS MORE UNTIL THE REST IS DEALT WITH
+AND WE HAVE A WORKING XEMACS AGAIN WITH ALL EOL ISSUES NAILED.
+
+really need a version of cvs-mods that reports only the current directory.
+WRITE THIS!  use it to implement a better cvs-checkin.
+
+sep 9, 2001:
+
+implemented a gzip coding system.  unfortunately, doesn't quite work right
+because it doesn't handle the gzip headers -- it just reads and writes raw
+zlib data.  there's no function in the library to skip past the header, but
+we do have some code out of the library that we can snarf that implements
+header parsing.  we need to snarf that, store it, and output it again at
+the beginning when encoding.  in the process, we should create a "get next
+byte" macro that bails out when there are no more.  using this, we set up a
+nice way of doing most stuff statelessly -- if we have to bail, we reject
+everything back to the sync point.  also need to fix up the autodetection
+of zlib in configure.in.
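+
+(a sketch of the "get next byte" macro with rejection back to the sync
+point.  struct reader, GET_NEXT_BYTE and read_le16 are hypothetical, and
+the two-byte little-endian field is only an example of something to parse,
+not the actual header code.)
+
```c
#include <assert.h>
#include <stddef.h>

struct reader
{
  const unsigned char *data;
  size_t len;    /* bytes available so far */
  size_t pos;    /* next byte to read */
  size_t sync;   /* last point up to which everything was fully parsed */
};

/* fetch the next byte into VAR; if there is none, reject everything back
   to the sync point and bail out of the calling function with -1 */
#define GET_NEXT_BYTE(r, var)           \
  do                                    \
    {                                   \
      if ((r)->pos >= (r)->len)         \
        {                               \
          (r)->pos = (r)->sync;         \
          return -1;                    \
        }                               \
      (var) = (r)->data[(r)->pos++];    \
    }                                   \
  while (0)

/* parse a two-byte little-endian field.  returns its value, or -1 if
   more data is needed, in which case nothing has been consumed. */
static long
read_le16 (struct reader *r)
{
  unsigned lo, hi;
  r->sync = r->pos;
  GET_NEXT_BYTE (r, lo);
  GET_NEXT_BYTE (r, hi);
  r->sync = r->pos;             /* field complete: advance the sync point */
  return (long) (lo | (hi << 8));
}
```
+
+the parser keeps no state of its own between calls; a call that runs out
+of data is simply repeated from the sync point once more bytes arrive.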
+
+BIG problems with eol.  finished up everything i thought i would need to
+get eol stuff working, but no -- when you have mswindows-unicode, with its
+eol set to autodetect, the detection routines themselves do the autodetect
+(first), and fail (they report CR on CRLF because of the NULL byte between
+the CR and the LF) since they're not looking at ascii data.  with a chain
+it's similarly bad. for mswindows-multibyte, for example, which is a chain
+unicode->unicode-to-multibyte, autodetection happens inside of the chain,
+both when unicode and unicode-to-multibyte are active.  we could twiddle
+around with the eol flags to try to deal with this, but it's gonna be a big
+mess, which is exactly what we're trying to avoid.  what we basically want
+is to entirely rip out all EOL settings from either the coding system or
+the stream (yes, there are two!  one might say autodetect, and then the
+stream contains the actual detected value).  instead, we simply create an
+eol-autodetect coding system -- or rather, it's part of the convert-eol
+coding system.  convert-eol, type = autodetect, does autodetection the
+first time it gets data sent to it to decode, and thereafter sets a stream
+parameter indicating the actual eol type for this stream.  this means that
+all autodetect coding systems, as created by `make-coding-system', really
+are chains with a convert-eol at the beginning.  only subsidiary xxx-unix
+has no wrapping at all.  this should allow eof detection of gzip, unicode,
+etc.  for that matter, general autodetection should be entirely
+encapsulated inside of the `autodetect' coding system, with no
+eol-autodetection -- the chain becomes convert-eol (autodetect) ->
+autodetect or perhaps backwards.  the generic autodetect similarly has a
+coding-system in its stream methods, and needs somehow or other to insert
+the detected coding-system into the chain.  either it contains a chain
+inside of it (perhaps it *IS* a chain), or there's some magic involving
+canonicalization-type switcherooing in the middle of a decode.  either way,
+once everything is good and done and we want to save the coding system so
+it can be used later, we need to do another sort of canonicalization --
+converting auto-detect-type coding systems into the detected systems.
+again, a coding-system method, with some magic currently so that
+subsidiaries get properly used rather than something that's new but
+equivalent to subsidiaries. (#### perhaps we could use a hash table to
+avoid recreating coding systems when not necessary.  but that would require
+that coding systems be immutable from external, and i'm not sure that's the
+case.)
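+
+(to make the CR-on-CRLF failure concrete: a naive byte-level eol detector,
+which is an illustration and not the actual detection routine, run over
+the same text as ascii and as little-endian 16-bit unicode.)
+
```c
#include <assert.h>
#include <stddef.h>

enum eol_type { EOL_NONE, EOL_LF, EOL_CRLF, EOL_CR };

/* classify the first line ending found by looking at raw bytes */
static enum eol_type
detect_eol_bytes (const unsigned char *p, size_t len)
{
  for (size_t i = 0; i < len; i++)
    {
      if (p[i] == '\n')
        return EOL_LF;
      if (p[i] == '\r')
        return (i + 1 < len && p[i + 1] == '\n') ? EOL_CRLF : EOL_CR;
    }
  return EOL_NONE;
}
```
+
+on the bytes 61 0D 0A this reports CRLF, but on 61 00 0D 00 0A 00 the byte
+after the CR is the null, so it reports CR.  doing eol detection after the
+unicode decoding avoids this.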
+
+i really think, after all, that i should reverse the naming of everything
+in chain and source-sink-type -- they should be decoding-centric.  later
+on, if/when we come up with the proper way to make it totally symmetrical,
+we'll be fine whether before then we were encoding or decoding centric.
+
+
+sep 9, 2001:
+
+investigated eol parameter.
+implemented handling in make-coding-system of eol-cr and eol-crlf.
+fixed calls everywhere to Fget_coding_system / Ffind_coding_system to
+reject non-char->byte coding systems.
+
+still need to handle "query eol type using coding-system-property" so it
+magically returns the right type by parsing the chain.
+
+no work done on formats, as mentioned below.  we should consider using :
+instead of || to indicate casting.
+
+early sep 9, 2001:
+
+renamed some codesys properties: `list' in chain -> chain; `subtype' in
+unicode -> type.  everything compiles again and sort of works; some CRLF
+problems that may resolve themselves when i finish the convert-eol stuff.
+the stuff to create subsidiaries has been rewritten to use chains; but i
+still need to investigate how the EOL type parameter is used.  also, still
+need to implement this: when a coding system is created, and its eol type
+is not autodetect or lf, a chain needs to be created and returned.  i think
+that what needs to happen is that the eol type can only be set to
+autodetect or lf; later on this should be changed to simply be either
+autodetect or not (but that would require ripping out the eol converting
+stuff in the various coding systems), and eventually we will do the work on
+the detection mechanism so it can do chain detection; then we won't need an
+eol autodetect setting at all.  i think there's a way to query the eol type
+of a coding system; this should check to see if the coding system is a
+chain and there's a convert-eol at the front; if so, the eol type comes
+from the type of the convert-eol.
+
+also check out everywhere that Fget_coding_system or Ffind_coding_system is
+called, and see whether anything but a char->byte system can be tolerated.
+create a new function for all the places that only want char->byte,
+something like get_coding_system_char_to_byte_only.
+
+think about specifying formats in make-coding-system.  perhaps the name can
+be a cons of (format1, format2), specifying that it encodes
+format1->format2 and decodes the other way.  if only one name is given,
+that is assumed to be format2, and the other is either `byte' or `char'
+depending on the end type.  normally the user when decoding gives the
+decoding order in formats, but can leave off the last one, `char', which is
+assumed.  perhaps we should say `internal' instead of `char' and `external'
+instead of byte.  a multichain might look like gzip|multibyte|unicode,
+using the coding systems named `gzip', `(unicode . multibyte)' and
+`unicode'.  we would have to allow something where one format is given only
+as generic byte/char or internal/external to fit with any of the same
+byte/char type.  when forcibly fitting together two converters that have
+explicitly specified and incompatible names (say you have
+unicode->multibyte and iso8859-1->ebcdic and you know that the multibyte
+and iso8859-1 in this case are compatible), you can force-cast using ||,
+like this: ebcdic|iso8859-1||multibyte|unicode.  this will also force
+external->internal translation as necessary:
+unicode|multibyte||crlf|internal does unicode->multibyte,
+external->internal, crlf->internal.  perhaps you'd need to put in the
+internal translation, like this: unicode|multibyte|internal||crlf|internal,
+which means unicode->multibyte, external->internal (multibyte is compatible
+with external); force-cast to crlf format and convert crlf->internal.
+
+even later: Sep 8, 2001:
+
+chain doesn't need to set character mode, that happens automatically when
+the coding systems are created.  fixed chain to return correct source/sink
+type for itself and to check the compatibility of source/sink types in its
+chain.  fixed decode/encode-coding-region to check the source and sink
+types of the coding system performing the conversion and insert appropriate
+byte->char/char->byte converters (aka "binary" coding system).  fixed
+set-coding-category-system to only accept the traditional
+encode-char-to-byte types of coding systems.
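+
+The source/sink checking described above can be sketched roughly as
+follows, in Python rather than C.  All names here (Coder, chain_ends,
+wrap_for_region) are hypothetical, and the types are encoding-centric,
+matching the convention in these notes; this is not the actual
+implementation.
+
```python
from collections import namedtuple

BYTE, CHAR = "byte", "char"

# source/sink are encoding-centric: the types seen when *encoding*.
Coder = namedtuple("Coder", "name source sink")

# Stand-ins for the "binary" converters mentioned in the notes.
CHAR_TO_BYTE = Coder("char->byte", CHAR, BYTE)
BYTE_TO_CHAR = Coder("byte->char", BYTE, CHAR)


def chain_ends(coders):
    """Check each join in the chain and return its overall (source, sink)."""
    for left, right in zip(coders, coders[1:]):
        if left.sink != right.source:
            raise ValueError(
                "%s produces %s but %s expects %s"
                % (left.name, left.sink, right.name, right.source))
    return coders[0].source, coders[-1].sink


def wrap_for_region(coders):
    """Pad the chain so both ends are `char`, as a buffer region needs."""
    source, sink = chain_ends(coders)
    wrapped = list(coders)
    if source == BYTE:
        wrapped.insert(0, CHAR_TO_BYTE)
    if sink == BYTE:
        wrapped.append(BYTE_TO_CHAR)
    return wrapped
```
+
+A byte->byte coder such as a hypothetical gzip stage ends up wrapped on
+both sides, while a char->byte coder only needs a converter on its
+sink end.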
+
+still need to extend chain to specify the parameters mentioned below,
+esp. "reverse".  also need to extend the print mechanism for chain so it
+prints out the chain.  probably this should be general: have a new method
+to return all properties, and output those properties.  you could also
+implement a read syntax for coding systems this way.
+
+still need to implement convert-eol and finish up the rest of the eol stuff
+mentioned below.
+
+later September 7, 2001: (more like Sep 8)
+
+moved many Lisp_Coding_System * params to Lisp_Object.  In general this is
+the way to go, and if we ever implement a copying GC, we will never want to
+be passing direct pointers around.  With no error-checking, we lose no
+cycles using Lisp_Objects in place of pointers -- the Lisp_Object itself is
+nothing but a pointer, and so all the casts and "dereferences" boil down to
+nothing.
+
+Clarified and cleaned up the "character mode" on streams, and documented
+who (caller or object itself) has the right to be setting character mode on
+a stream, depending on whether it's a read or write stream.  changed
+conversion_end_type method and enum source_sink_type to return
+encoding-centric values, rather than decoding-centric.  for the moment,
+we're going to be entirely encoding-centric in everything; we can rethink
+later.  fixed coding systems so that the decode and encode methods are
+guaranteed to receive only full characters, if that's the source type of
+the data, as per conversion_end_type.
+
+still need to fix the chain method so that it correctly sets the character
+mode on all the lstreams in it and checks the source/sink types to be
+compatible.  also fix decode-coding-string and friends to put the
+appropriate byte->character (i.e. no-conversion) coding systems on the ends
+as necessary so that the final ends are both character.  also add to chain
+a parameter giving the ability to switch the direction of conversion of any
+particular item in the chain (i.e. swap encoding and decoding).  i think
+what we really want to do is allow for arbitrary parameters to be put onto
+a particular coding system in the chain, of which the only one so far is
+swap-encode-decode.  don't need too much codage here for that, but make the
+design extendable.
+
+
+
+September 7, 2001:
+
+just added a return value from the decode and encode methods of a coding
+system, so that some of the data can get rejected.  fixed the calling
+routines to handle this.  need to investigate when and whether the coding
+lstream is set to character mode, so that the decode/encode methods only
+get whole characters.  if not, we should do so, according to the source
+type of these methods.  also need to implement the convert_eol coding
+system, and fix the subsidiary coding systems (and in general, any coding
+system where the eol type is specified and is not LF) to be chains
+involving convert_eol.
+
+after everything is working, need to remove eol handling from encode/decode
+methods and eventually consider rewriting (simplifying) them given the
+reject ability.
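+
+To illustrate the "reject" idea, here is a small Python sketch in which
+a decode step reports how many bytes it consumed and the caller keeps
+the rest for the next call.  It uses Python's incremental UTF-8 decoder
+as a stand-in; the function names are hypothetical and the real
+decode/encode methods are C code with a different interface.
+
```python
import codecs


def decode_utf8_chunk(data):
    """Decode as many complete UTF-8 characters as possible.

    Returns (text, consumed); bytes past `consumed` are "rejected" and
    must be offered again together with the next chunk.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    text = decoder.decode(data, final=False)
    pending, _ = decoder.getstate()   # bytes of an incomplete character
    return text, len(data) - len(pending)


def decode_stream(chunks):
    """Caller side: carry rejected bytes over to the next chunk."""
    leftover = b""
    pieces = []
    for chunk in chunks:
        data = leftover + chunk
        text, consumed = decode_utf8_chunk(data)
        pieces.append(text)
        leftover = data[consumed:]
    return "".join(pieces), leftover
```
+
+With this split the decoder itself never has to buffer partial
+characters, which is the simplification the reject ability is meant to
+allow.  Invalid input still raises UnicodeDecodeError in this sketch.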
+
+September 5, 2001:
+
+-- need to organize this.  get everything below into the TODO list.
+   CVS the TODO list frequently so i can delete old stuff.  prioritize
+   it!!!!!!!!!
+
+-- move README.ben-mule... to STATUS.ben-mule...; use README for
+   intro, overview of what's new, what's broken, how to use the
+   features, etc.
+
+-- need a global and local coding-category-precedence list, which get
+   merged.
+
+-- finished the BOM support.  also finished something not listed
+   below, expansion to the auto-generator of Unicode-encapsulation to
+   support bracketing code with #if ... #endif (e.g. to work around
+   Cygwin and MINGW problems).  This is tested; appears to work.
+
+-- need to add more multibyte coding systems now that we have various
+   properties to specify them.  need to add DEFUN's for mac-code-page
+   and ebcdic-code-page for completeness.  need to rethink the whole
+   way that the priority list works.  it will continue to be total
+   junk until multiple levels of likeliness get implemented.
+
+-- need to finish up the stuff about the various defaults. [need to
+   investigate more generally where all the different default values
+   are that control encoding. (there are six places or so.) need to
+   list them in make-coding-system docs and put pointers
+   elsewhere. [[[[#### what interface to specify that this default
+   should be unicode?  a "Unicode" language environment seems too
+   drastic, as the language environment controls much more.]]]] even
+   skipping the Unicode stuff here, we need to survey and list the
+   variables that control coding page behavior and determine how they
+   need to be set for various possible scenarios:
+
+  -- total binary: no detection at all.
+  -- raw-text only: wants only autodetection of line endings, nothing else.
+  -- "standard Windows environment": tries for Unicode, falls back on
+     code page encoding.
+  -- some sort of East European environment, and Russian.
+  -- some sort of standard Japanese Windows environment.
+  -- standard Chinese Windows environments (traditional and simplified)
+  -- various Unix environments (European, Japanese, Russian, etc.)
+  -- Unicode support in all of these when it's reasonable
+
+These really require multiple likelihood levels to be fully
+implementable.  We should see what can be done ("gracefully fall
+back") with single likelihood level.  need lots of testing.
+
+-- need to fix the truename problem.
+
+-- lots of testing: need to test all of the stuff above and below
+   that's recently been implemented.
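+
+One possible way to merge the global and local
+coding-category-precedence lists mentioned above, sketched in Python.
+The notes do not say how the merge should work; this assumes local
+entries take priority and the global list fills in the rest.  The
+function name and the category names in the usage example are
+illustrative only.
+
```python
def merge_precedence(local, global_):
    """Local categories first, then any global ones not already listed."""
    seen = set()
    merged = []
    for category in list(local) + list(global_):
        if category not in seen:
            seen.add(category)
            merged.append(category)
    return merged


if __name__ == "__main__":
    print(merge_precedence(["utf-8"], ["iso-8-1", "utf-8", "no-conversion"]))
```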
+
+
+
+September 4, 2001:
+
+mostly everything compiles.  currently there is a crash in
+parse-unicode-translation-table, and Cygwin/Mule won't run.  it may
+well be a bug in the sscanf() in Cygwin.
+
+working on today:
+
+-- adding BOM support for Unicode coding systems.  mostly there, but
+   need to finish adding BOM support to the detection routines.  then test.
+-- adding properties to unicode-to-multibyte to specify the coding
+   system in various flexible ways, e.g. directly specified code page
+   or ansi or oem code page of specified locale, current locale,
+   user-default or system-default locale.  need to test.
+-- creating a `multibyte' coding system, with the same parameters as
+   unicode-to-multibyte and which resolves at coding-system-creation
+   time to the appropriate chain.  creating the underlying mechanism
+   to allow such behind-the-scenes switcheroo.  need to test.
+-- set default-value of buffer-file-coding-system to
+   mswindows-multibyte, as Matej said it should be.  need to test.
+   need to investigate more generally where all the different default
+   values are that control encoding. (there are six places or so.) 
+   need to list them in make-coding-system docs and put pointers
+   elsewhere. #### what interface to specify that this default should
+   be unicode?  a "Unicode" language environment seems too drastic, as
+   the language environment controls much more.
+-- thinking about adding multiple levels of certainty to the detection
+   schemes, instead of just a mask.  eventually, we need to totally
+   abstract things, but that can more easily be done in many steps. (we
+   need multiple levels of likelihood to more reasonably support a
+   Windows environment with code-page type files.  currently, in order
+   to get them detected, we have to put them first, because they can
+   look like lots of other things; but then, other encodings don't get
+   detected.  with multiple levels of likelihood, we still put the
+   code-page categories first, but they will return low levels of
+   likelihood.  Lower-down encodings may be able to return higher
+   levels of likelihood, and will get taken preferentially.)
+-- making it so you cannot disable file-coding, but you get an
+   equivalent default on Unix non-Mule systems where all defaults are
+   `binary'.  need to test!!!!!!!!!
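+
+A minimal Python sketch of the multiple-likelihood idea described
+above: categories are still tried in precedence order, but each
+detector returns a level, and a lower-down category wins if it reports
+a higher one.  The three levels, the detector functions and the
+category names are all assumptions made for illustration.
+
```python
IMPOSSIBLE, LOW, HIGH = 0, 1, 2


def codepage_likelihood(data):
    # Any byte string is valid in a single-byte code page, so the
    # detector can never be confident.
    return LOW


def utf8_likelihood(data):
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return IMPOSSIBLE
    # Valid multi-byte sequences are unlikely to occur by accident.
    return HIGH if any(b >= 0x80 for b in data) else LOW


def detect(data, detectors):
    """detectors: (category, function) pairs in precedence order."""
    best_category, best_level = None, IMPOSSIBLE
    for category, likelihood in detectors:
        level = likelihood(data)
        if level > best_level:   # strict >: ties go to the earlier entry
            best_category, best_level = category, level
    return best_category


DETECTORS = [("code-page", codepage_likelihood), ("utf-8", utf8_likelihood)]
```
+
+Here the code-page category stays first in the list, yet text that is
+clearly UTF-8 is still detected as such, which a plain mask cannot
+express.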
+
+Matej (mostly, + some others) notes the following problems, and here
+are possible solutions:
+
+-- he wants the defaults to work right. [figure out what those
+   defaults are.  i presume they are auto-detection of data in current
+   code page and in unicode, and new files have current code page set
+   as their output encoding.]
+
+-- too easy to lose data with incorrect encodings. [need to set up an
+   error system for encoding/decoding.  extremely important but a
+   little tricky to implement so let's deal with other issues now.]
+
+-- EOL isn't always detected correctly. [#### ?? need examples]
+
+-- truename isn't working: c:\t.txt and c:\tmp.txt have the same truename.
+   [should be easy to fix]
+
+-- unicode files lose the BOM. [working on this]
+
+-- command-line utilities use OEM. [actually it seems more
+   complicated.  it seems they use the codepage of the console.  we
+   may be able to set that, e.g. to UTF8, before we invoke a command.
+   need to investigate.]
+
+-- no way to handle unicode characters not recognized as charsets. [we
+   need to create something like 8 private 2-dimensional charsets to
+   handle all BMP Unicode chars.  Obviously this is a stopgap
+   solution.  Switching to Unicode internal will ultimately make life
+   far easier and remove the BMP limitation.  but for now it will
+   work.  we translate all characters where we have charsets into
+   chars in those charsets, and the remainder in a unicode charset.
+   that way we can save them out again and guarantee no data loss with
+   unicode.  this creates font problems, though ...]
+
+-- problems with xemacs font handling. [xemacs font handling is not
+   sophisticated enough.  it goes on a charset granularity basis and
+   only looks for a font whose name contains the corresponding windows
+   charset in it.  with unicode this fails in various ways.  for one
+   the granularity needs to be single character, so that those unicode
+   charsets mentioned above work; and it needs to query the font to
+   see what unicode ranges it supports, rather than just looking at
+   the charset ending.]
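+
+For the BOM item in the list above, a small Python sketch of one way to
+avoid losing it: note on read whether a BOM was present and write it
+back on save.  The function names and the UTF-8 fallback for files
+without a BOM are assumptions; only the BOM constants come from
+Python's standard codecs module.
+
```python
import codecs

# UTF-32 must be tested before UTF-16: the UTF-32-LE BOM starts with
# the same two bytes as the UTF-16-LE BOM.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]
BOM_FOR = {encoding: bom for bom, encoding in BOMS}


def read_text(data):
    """Return (text, encoding, had_bom) for a byte string."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(encoding), encoding, True
    return data.decode("utf-8"), "utf-8", False   # assumed default


def write_text(text, encoding, had_bom):
    """Encode text, restoring the BOM if the file had one when read."""
    prefix = BOM_FOR[encoding] if had_bom else b""
    return prefix + text.encode(encoding)
```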
+
+
+
+August 28, 2001:
+
+working on getting everything to compile again: Cygwin, non-MULE,
+pdump.  not there yet.
+
+mswindows-multibyte is now defined using chain, and works.  removed
+most vestiges of the mswindows-multibyte coding system type.
+
+file-coding is on by default; should default to binary only on Unix.
+Need to test. (Needs to compile first :-)
+
+August 26, 2001:
+
+I've fixed the issue of inputting non-ASCII text under -nuni, and done
+some of the work on the Russian C-x problem -- we now compute the
+other possibilities.  We still need to fix the key-lookup code,
+though, and that code is unfortunately a bit ugly.  the best way, it
+seems, is to expand the command-builder structure so you can specify
+different interpretations for keys. (if we do find an alternative
+binding, though, we need to mess with both the command builder and
+this-command-keys, as does the function-key stuff.  probably need to
+abstract that munging code.)
+
+high-priority:
+
+[currently doing]
+
+-- support for WM_IME_CHAR.  IME input can work under -nuni if we use
+   WM_IME_CHAR.  probably we should always be using this, instead of
+   snarfing input using WM_IME_COMPOSITION.  i'll check this out.
+-- Russian C-x problem.  see above.
+
+[clean-up]
+
+-- make sure it compiles and runs under non-mule.  remember that some
+   code needs the unicode support, or at least a simple version of it.
+-- make sure it compiles and runs under pdump.  see below.
+-- clean up mswindows-multibyte, TSTR_TO_C_STRING.  see below. [DONE]
+-- eliminate last vestiges of codepage<->charset conversion and similar stuff.
+
+[other]
+-- cut and paste.  see below.
+-- misc issues with handling lang environments.  see also August 25,
+   "finally: working on the C-x in ...".
+   -- when switching lang env, needs to set keyboard layout.
+   -- user var to control whether, when moving into text of a
+      particular language, we set the appropriate keyboard layout.  we
+      would need to have a lisp api for retrieving and setting the
+      keyboard layout, set text properties to indicate the layout of
+      text, and have a way of dealing with text with no property on
+      it. (e.g. saved text has no text properties on it.) basically,
+      we need to get a keyboard layout from a charset; getting a
+      language would do.  Perhaps we need a table that maps charsets
+      to language environments.
+   -- test that the lang env is properly set at startup.  test that
+      switching the lang env properly sets the C locale (call
+      setlocale(), set LANG, etc.) -- a spawned subprogram should have
+      the new locale in its environment.
+-- look through everything below and see if anything is missed in this
+   priority list, and if so add it.  create a separate file for the
+   priority list, so it can be updated as appropriate.
+
+
+mid-priority:
+
+-- clean up the chain coding system.  its list should specify decode
+   order, not encode; i now think this way is more logical.  it should
+   check the endpoints to make sure they make sense.  it should also
+   allow for the specification of "reverse-direction coding systems":
+   use the specified coding system, but invert the sense of decode and
+   encode.
+
+-- along with that, places that take an arbitrary coding system and
+   expect the ends to be anything specific need to check this, and add
+   the appropriate conversions from byte->char or char->byte.
+
+-- get some support for arabic, thai, vietnamese, japanese jisx 0212:
+   at least get the unicode information in place and make sure we have
+   things tied together so that we can display them.  worry about r2l
+   some other time.
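+
+A Python sketch of the chain clean-up described in the first
+mid-priority item: the list is given in decode order, encoding walks it
+backwards, and a "reverse" flag inverts the sense of decode and encode
+for one stage.  The Stage class and both functions are hypothetical;
+zlib and UTF-8 merely stand in for real coding systems.
+
```python
import zlib


class Stage:
    def __init__(self, name, decode, encode, reverse=False):
        self.name = name
        self.decode = decode
        self.encode = encode
        self.reverse = reverse

    def run(self, data, decoding):
        # A reversed stage swaps the sense of decode and encode.
        use_decode = decoding != self.reverse
        return (self.decode if use_decode else self.encode)(data)


def chain_decode(stages, data):
    for stage in stages:              # the list is in decode order
        data = stage.run(data, decoding=True)
    return data


def chain_encode(stages, data):
    for stage in reversed(stages):    # encoding walks the list backwards
        data = stage.run(data, decoding=False)
    return data


GZIP = Stage("gzip", zlib.decompress, zlib.compress)
UTF8 = Stage("utf-8", lambda b: b.decode("utf-8"), lambda s: s.encode("utf-8"))
UTF8_REVERSED = Stage("utf-8", UTF8.decode, UTF8.encode, reverse=True)
```
+
+Decoding with [GZIP, UTF8] decompresses and then decodes; decoding with
+[UTF8_REVERSED] turns characters into UTF-8 bytes instead.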
+
+August 25, 2001:
+
+There is actually more non-Unicode-ized stuff, but it's basically
+inconsequential. (See previous note.) You can check using the file
+nmkun.txt (#### RENAME), which is just a list of all the routines that
+have been split. (It was generated from the output of `nmake
+unicode-encapsulate', after removing everything from the output but
+the function names.) Use something like
+
+fgrep -f ../nmkun.txt -w [a-hj-z]*.[ch]  |m
+
+in the source directory, which does a word match and skips
+intl-unicode-win32.[ch] and intl-win32.[ch], which have a whole lot of
+references to these, unavoidably.  It effectively detects what needs
+to be changed because changed versions either begin qxe... or end with
+A or W, and in each case there's no whole-word match.
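+
+The whole-word reasoning can be checked with a few lines of Python that
+mimic what fgrep -w does.  The helper name is hypothetical, and
+CreateFile is used only as a familiar example of a split Win32 routine;
+it is not claimed to be in nmkun.txt.
+
```python
import re


def word_matches(line, names):
    """Return the names that occur in `line` as whole words."""
    return [
        name for name in names
        if re.search(r"(?<!\w)" + re.escape(name) + r"(?!\w)", line)
    ]


if __name__ == "__main__":
    names = ["CreateFile"]
    for line in ["h = CreateFile (path);",      # still needs converting
                 "h = qxeCreateFile (path);",   # encapsulated version
                 "h = CreateFileW (path);"]:    # explicit W version
        print(bool(word_matches(line, names)), line)
```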
+
+The nasty bug has been fixed below.  The -nuni option now works -- all
+specially-written code to handle the encapsulation has been tested by
+some operation (fonts by loadup and checking the output of (list-fonts
+""); devmode by printing; dragdrop tests other stuff).
+
+NOTE: for -nuni (Win 95), the following areas need work:
+
+-- cut and paste.  we should be able to receive Unicode text if it's
+   there, and we should be able to receive it even in Win 95 or -nuni.
+   we should just check in all circumstances.  also, under 95, when we
+   put some text in the clipboard, it may or may not also be
+   automatically enumerated as unicode.  we need to test this out
+   and/or just go ahead and manually do the unicode enumeration.
+
+-- receiving keyboard input.  we get only a single byte, but we should
+   be able to correlate the language of the keyboard layout to a
+   particular code page, so we can then decode it correctly.
+
+-- mswindows-multibyte.  still implemented as its own thing.  should
+   be done as a chain of (encoding) unicode | unicode-to-multibyte.
+   need to turn this on, get it working, and look into optimizations
+   in the dfc stuff. (#### perhaps there's a general way to do these
+   optimizations???  something like having a method on a coding system
+   that can specify whether a pure-ASCII string gets rendered as
+   pure-ASCII bytes and vice-versa.)
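+
+For the keyboard-input item above, a Python sketch of correlating the
+keyboard layout's language with a code page in order to decode a
+single input byte.  The table is a tiny hand-written sample and the
+function name is hypothetical; real code would presumably ask Windows
+for the locale's ANSI code page rather than hard-coding it.
+
```python
# Primary language ID -> Python codec for the matching ANSI code page.
# Illustrative subset only.
LANG_TO_CODEC = {
    0x08: "cp1253",   # Greek
    0x09: "cp1252",   # English
    0x19: "cp1251",   # Russian
}


def decode_key_byte(byte_value, layout_id):
    """Decode one input byte according to the keyboard layout's language.

    The low 10 bits of the layout's language identifier give the
    primary language (what the PRIMARYLANGID macro extracts).
    """
    primary_lang = layout_id & 0x3FF
    codec = LANG_TO_CODEC.get(primary_lang, "cp1252")   # assumed fallback
    return bytes([byte_value]).decode(codec)
```
+
+The same byte 0xE0 comes out as Cyrillic "а" under a Russian layout
+and as "à" under an English one, which is why the layout has to be
+consulted at all.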
+
+
+ALSO:
+
+-- we have special macros TSTR_TO_C_STRING and such because formerly
+   the DFC macros didn't know about external stuff that was Unicode
+   encoded and would call strlen() on them.  this is fixed, so now we
+   should undo the special macros, make them normal, remove the
+   comments about this, and make sure it works. [DONE]
+
+
+-- finally: working on the C-x in Russian key layout problem.  in the
+   process will probably end up doing work on cleaning up the handling
+   of keyboard layouts, integrating or deleting the FSF stuff, adding
+   code to change the keyboard layout as we move in and out of text in
+   different languages (implemented as a post-command-hook; we need
+   something like internal-post-command-hook if not already there, for
+   internal stuff that doesn't want to get mixed up with the regular
+   post-command-hook; similar for pre-command-hook).  also, when
+   langenv changes, ways to set the keyboard layout appropriately.
+
+-- i think the stuff above is higher priority than the other stuff
+   mentioned below.  what i'm aiming for is to be able to input and
+   work with multiple languages without weird glitches, both under 95
+   and NT.  the problems above are all basic impediments to such work.
+   we assume for the moment that the user can make use of the existing
+   file i/o conversion stuff, and put that lower in priority, after
+   the basic input is working.
+
+-- i should get my modem connected and write up what's going on and
+   send it to the lists; also cvs commit my workspaces and get more
+   testers.
+