
@node Coding-system
@section Coding-system

@noindent
A `coding-system' is a method of encoding text made up of several
character sets.  It is represented by a symbol that has the properties
'coding-system and 'eol-type.

You can specify a different coding-system for each of the following:
file I/O, process I/O, output to the terminal (when not running under
X), and input from the keyboard (when not running under X).


@menu
* Structure::   Structure of coding-system
	  o Property 'coding-system
	  o Property 'eol-type
	  o Property 'post-read-conversion
	  o Property 'pre-write-conversion
* Creation::   How to create a coding-system?
* Predefined coding-system::
* Automatic conversion::
	  o Category of coding-system
	  o How automatic conversion works?
	  o Priority of category
* Mode-line::   How is a coding-system shown in the mode-line?
* ISO2022 restriction::
* Big5::        Special treatment of Big5
@end menu

@node Structure
@subsection Structure of coding-system

@subsubsection Property 'coding-system

The value of the property 'coding-system is either a vector:
@quotation
  [ TYPE MNEMONIC DOCUMENT DUMMY FLAGS ]
@end quotation
or another coding-system.  The elements of the vector are:
@example
  TYPE: nil: no conversion, t: automatic conversion,
        0: Internal, 1: Shift-JIS, 2: ISO2022, 3: Big5, 4: CCL.
  MNEMONIC: a character shown in the mode-line to indicate the coding-system.
  DOCUMENT: a documentation string describing the coding-system.
  DUMMY: always nil (for backward compatibility).
  FLAGS (optional): more precise information about the coding-system.
    If TYPE is 2 (ISO2022), FLAGS should be a list of:
      LB-G0, LB-G1, LB-G2, LB-G3:
        Leading character of the charset initially designated to the
        G? graphic set;
        nil means G? is not designated initially,
        lb-invalid means G? can never be designated to,
        if (- leading-char) is specified, it is designated on output.
      SHORT: non-nil - allow short forms such as "ESC $ B",
             nil - always "ESC $ ( B".
      ASCII-EOL: non-nil - designate ASCII to G0 at end of line on output.
      ASCII-CNTL: non-nil - designate ASCII to G0 at control codes on output.
      SEVEN: non-nil - use a 7-bit environment on output.
      LOCK-SHIFT: non-nil - use locking-shift (SO/SI) instead of single-shift
        or designation by escape sequence.
      USE-ROMAN: non-nil - designate JIS0201-1976-Roman instead of ASCII.
      USE-OLDJIS: non-nil - designate JIS0208-1976 instead of JIS0208-1983.
      NO-ISO6429: non-nil - don't use ISO6429's direction specification.
    If TYPE is 3 (Big5), FLAGS `t' means Big5-ETen, `nil' means Big5-HKU.
    If TYPE is 4 (CCL), FLAGS should be a cons of CCL programs
      for encoding and decoding.  See the documentation of CCL for details.
@end example
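Because the ISO2022 FLAGS list is positional, it is easy to misread.  The Python sketch below is only a model for illustration, not Mule code: it gives each entry a name and explains the LB-G0 to LB-G3 values following the description above.  The Python names, the assumption that omitted trailing entries mean nil, and the leading-character numbers are all hypothetical.

```python
# Illustrative model, not Mule source: give names to the positional
# FLAGS list of an ISO2022 (TYPE 2) coding-system.
from collections import namedtuple

ISO2022Flags = namedtuple("ISO2022Flags", [
    "lb_g0", "lb_g1", "lb_g2", "lb_g3",
    "short", "ascii_eol", "ascii_cntl", "seven",
    "lock_shift", "use_roman", "use_oldjis", "no_iso6429",
])

LB_INVALID = "lb-invalid"   # stands in for the lb-invalid marker


def parse_iso2022_flags(flags):
    """Name each FLAGS entry; missing trailing entries are taken as nil (None)."""
    flags = list(flags)
    size = len(ISO2022Flags._fields)
    if len(flags) > size:
        raise ValueError("too many ISO2022 flags")
    return ISO2022Flags(*(flags + [None] * (size - len(flags))))


def describe_designation(lb):
    """Explain one LB-G0..LB-G3 entry as the table above does."""
    if lb is None:
        return "not designated initially"
    if lb == LB_INVALID:
        return "can never be designated"
    if lb < 0:
        return "charset %d designated on output" % -lb
    return "charset %d designated initially" % lb


# Hypothetical leading-character numbers, for illustration only.
LC_ASCII, LC_JISX0208 = 0, 0x92
flags = parse_iso2022_flags([LC_ASCII, -LC_JISX0208, LB_INVALID, LB_INVALID,
                             True, True, True, True])
print([describe_designation(lb) for lb in flags[:4]])
print(flags.seven, flags.lock_shift)   # True None
```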

@subsubsection Property 'eol-type

The value of the property 'eol-type is one of:
@example
  nil: no conversion of end-of-line
  1:   LF
  2:   CRLF
  3:   CR
  vector of length 3: automatic detection of end-of-line type
    1st element: coding-system whose eol-type is LF
    2nd element: coding-system whose eol-type is CRLF
    3rd element: coding-system whose eol-type is CR
@end example
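The numeric eol-type values can be illustrated with a small Python model.  It is a sketch, not Mule's C implementation, and it assumes (as in Emacs) that a buffer always uses LF internally, so decoding turns CRLF or CR into LF and encoding does the reverse.  A vector eol-type has to be resolved by detection first, so the sketch rejects it.

```python
# Sketch of applying a numeric 'eol-type, assuming (as in Emacs) that a
# buffer always uses LF internally.  Not Mule's actual C code.
EOL_LF, EOL_CRLF, EOL_CR = 1, 2, 3


def decode_eol(data, eol_type):
    """Convert line ends in DATA (bytes) read from outside into LF."""
    if eol_type is None or eol_type == EOL_LF:
        return data                                  # nothing to do
    if eol_type == EOL_CRLF:
        return data.replace(b"\r\n", b"\n")
    if eol_type == EOL_CR:
        return data.replace(b"\r", b"\n")
    raise ValueError("a vector eol-type must first be resolved by detection")


def encode_eol(data, eol_type):
    """Convert LF line ends in DATA back to the external form."""
    if eol_type == EOL_CRLF:
        return data.replace(b"\n", b"\r\n")
    if eol_type == EOL_CR:
        return data.replace(b"\n", b"\r")
    return data                                      # nil or LF


print(decode_eol(b"one\r\ntwo\r\n", EOL_CRLF))   # b'one\ntwo\n'
print(encode_eol(b"one\ntwo\n", EOL_CR))         # b'one\rtwo\r'
```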

@subsubsection Property 'post-read-conversion

The value of the property 'post-read-conversion is a function that
converts text just read into a buffer.  When the function is called,
the text has already been decoded according to the 'coding-system and
'eol-type properties of the coding-system.  The function receives two
arguments, START and END, which delimit the region of the inserted
text.

@subsubsection Property 'pre-write-conversion

The value of the property 'pre-write-conversion is a function that
converts text just before it is written out.  After the function
returns, the text is encoded according to the 'coding-system and
'eol-type properties of the coding-system.  The function receives two
arguments, START and END, which delimit the region of the text.
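The order of operations described for the two hooks can be summarized in a few lines of Python.  This is a hypothetical model: a real hook edits the buffer region START..END in place, whereas here a hook takes a string plus START and END and returns the new string.  The helper @code{upcase_region} is made up for the example.

```python
# Hypothetical model of the hook order.  Real hooks edit the buffer
# region START..END in place; here they take and return a string.
def read_into_buffer(raw, decode, post_read_conversion=None):
    text = decode(raw)                   # 'coding-system and 'eol-type first
    if post_read_conversion is not None:
        text = post_read_conversion(text, 0, len(text))   # then the hook
    return text


def write_out(text, encode, pre_write_conversion=None):
    if pre_write_conversion is not None:
        text = pre_write_conversion(text, 0, len(text))   # hook first
    return encode(text)                  # then 'coding-system and 'eol-type


def upcase_region(text, start, end):
    return text[:start] + text[start:end].upper() + text[end:]


print(read_into_buffer(b"abc", lambda b: b.decode("latin-1"), upcase_region))  # ABC
print(write_out("abc", lambda s: s.encode("ascii"), upcase_region))            # b'ABC'
```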

@node Creation
@subsection How to create coding-system?

Mule provides the function `make-coding-system' to create a
coding-system.

FUNCTION make-coding-system: NAME TYPE MNEMONIC DOC &optional EOL-TYPE FLAGS

Register the symbol NAME as a coding-system whose 'coding-system
property is the vector [ TYPE MNEMONIC DOC nil FLAGS ] and whose
'eol-type property is EOL-TYPE.  If EOL-TYPE is `t', the 'eol-type
property becomes a vector of three generated coding-systems whose
'eol-type properties are 1 (LF), 2 (CRLF), and 3 (CR).  The generated
coding-systems are named NAMEunix, NAMEdos, and NAMEmac respectively.

To make just an alias of an existing coding-system, call the function
`copy-coding-system'.

FUNCTION copy-coding-system: ORIGINAL ALIAS

Create a coding-system identical to ORIGINAL and name it ALIAS.  If
the 'eol-type property of ORIGINAL is a vector, the coding-systems
ALIASunix, ALIASdos, and ALIASmac are also generated, and the
'eol-type property of ALIAS becomes a vector of them.
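The naming scheme of the two functions can be modelled with a toy registry in Python.  This is a sketch for illustration only: the registry, the Python names, and the use of plain dicts as property lists are assumptions, not Mule internals, and the alias `*jis*' is a made-up example.

```python
# Toy model of make-coding-system and copy-coding-system.  The registry,
# the Python names and the plain-dict property lists are illustrative only.
EOL_VARIANTS = (("unix", 1), ("dos", 2), ("mac", 3))   # suffix, eol-type

REGISTRY = dict()        # coding-system name -> its properties


def _register(name, vector, eol_type):
    REGISTRY[name] = dict(coding_system=vector, eol_type=eol_type)


def make_coding_system(name, type_, mnemonic, doc, eol_type=None, flags=None):
    vector = (type_, mnemonic, doc, None, flags)   # [ TYPE MNEMONIC DOC nil FLAGS ]
    if eol_type is True:                           # EOL-TYPE t
        for suffix, eol in EOL_VARIANTS:
            _register(name + suffix, vector, eol)
        eol_type = tuple(name + suffix for suffix, _ in EOL_VARIANTS)
    _register(name, vector, eol_type)


def copy_coding_system(original, alias):
    props = REGISTRY[original]
    eol_type = props["eol_type"]
    if isinstance(eol_type, tuple):                # vector: copy the variants too
        for (suffix, _), variant in zip(EOL_VARIANTS, eol_type):
            _register(alias + suffix, props["coding_system"],
                      REGISTRY[variant]["eol_type"])
        eol_type = tuple(alias + suffix for suffix, _ in EOL_VARIANTS)
    _register(alias, props["coding_system"], eol_type)


make_coding_system("*junet*", 2, "J", "sample doc string", eol_type=True)
copy_coding_system("*junet*", "*jis*")
print(REGISTRY["*jis*"]["eol_type"])       # ('*jis*unix', '*jis*dos', '*jis*mac')
print(REGISTRY["*jis*dos"]["eol_type"])    # 2
```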

@node Predefined coding-system
@subsection Predefined coding-system

See lisp/mule.el.

@node Automatic conversion
@subsection Automatic conversion

@subsubsection Category of coding-system

Mule can detect the coding-system of text automatically.  Strictly
speaking, however, what Mule detects is not a coding-system itself but
a category of coding-system.  A category is also represented by a
symbol, whose value should be an actual coding-system.

There are eight categories:
@table @asis
@item *coding-category-internal*
The coding-system used inside a buffer.
@item *coding-category-sjis*
Shift-JIS.
@item *coding-category-iso-7*
An ISO2022 variant with the following features:
@itemize @bullet
@item
no locking shift or single shift
@item
only G0 is used
@end itemize
@item *coding-category-iso-8-1*
An ISO2022 variant with the following features:
@itemize @bullet
@item
no locking shift
@item
designation sequences are allowed only for G0 and G1
@item
G1 is used only for 1-byte character sets
@end itemize
@item *coding-category-iso-8-2*
An ISO2022 variant with the following features:
@itemize @bullet
@item
no locking shift
@item
designation sequences are allowed only for G0 and G1
@item
G1 is used only for 2-byte character sets
@end itemize
@item *coding-category-iso-else*
An ISO2022 variant that satisfies none of the above.
@item *coding-category-big5*
Big5 (ETen or HKU).
@item *coding-category-bin*
Any other coding-system that uses the MSB (8th bit).
@end table

The values of these symbols are pre-defined as follows:

@example
----- lisp/mule.el -----------------------------------------
(defvar *coding-category-internal* '*internal*)
(defvar *coding-category-sjis* '*sjis*)
(defvar *coding-category-iso-7* '*junet*)
(defvar *coding-category-iso-8-1* '*ctext*)
(defvar *coding-category-iso-8-2* '*euc-japan*)
(defvar *coding-category-iso-else* '*iso-2022-ss2-7*)
(defvar *coding-category-big5* '*big5-eten*)
(defvar *coding-category-bin* '*noconv*)
------------------------------------------------------------
@end example

However, some of them are overridden in language-specific files such
as japanese.el and chinese.el.

@subsubsection How automatic conversion works?

When the coding-system `*autoconv*' is specified for reading text
(this is the default), Mule tries to detect the category of
coding-system in which the text is encoded.  If an appropriate
category is found, Mule decodes the text with the coding-system bound
to that category.  If the 'eol-type property of that coding-system is
a vector of coding-systems and Mule detects the end-of-line type (LF,
CRLF, or CR) of the text, the corresponding element of the vector is
used.

Automatic conversion occurs both when reading from files and when
receiving input from a process.  In the latter case, if a
coding-system is detected, the output-coding-system of the process is
also set to that coding-system.
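How the end-of-line type might pick one element of an 'eol-type vector can be sketched in Python.  The heuristic here (look only at the first line end) is an assumption for illustration; Mule's real detector may work differently.  Note that a lone CR at the very end of a chunk would be taken as CR even if an LF follows in the next chunk.

```python
# Illustrative heuristic only (Mule's real detector may differ): decide
# the eol-type from the first line end in DATA and pick the matching
# element of an 'eol-type vector.
def detect_eol(data):
    """Return 1 (LF), 2 (CRLF), 3 (CR), or None if DATA has no line end."""
    for i, byte in enumerate(data):
        if byte == 0x0A:
            return 1
        if byte == 0x0D:
            if i + 1 < len(data) and data[i + 1] == 0x0A:
                return 2
            return 3
    return None


def choose_variant(eol_vector, data):
    """Pick NAMEunix, NAMEdos or NAMEmac; keep the vector if undecided."""
    eol = detect_eol(data)
    return eol_vector if eol is None else eol_vector[eol - 1]


autoconv = ("*autoconv*unix", "*autoconv*dos", "*autoconv*mac")
print(choose_variant(autoconv, b"ls -l\r\n"))   # *autoconv*dos
print(choose_variant(autoconv, b"no newline"))  # the whole vector: undecided
```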

@subsubsection Priority of category

If more than one category is found, the category with the highest
priority is selected.

The priority of the categories is predefined as follows:

@example
----- lisp/mule.el -----------------------------------------
(set-coding-priority
 '(*coding-category-iso-8-2*
   *coding-category-sjis*
   *coding-category-iso-8-1*
   *coding-category-big5*
   *coding-category-iso-7*
   *coding-category-iso-else*
   *coding-category-bin*
   *coding-category-internal*))
------------------------------------------------------------
@end example

The function `set-coding-priority' puts the property 'priority on
each element of its argument, numbered from 0 to 7 (a smaller number
means a higher priority).  Some language-specific files may override
this priority.
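Selecting among several matching categories can be modelled in Python as taking the category with the smallest 'priority number and then looking up the coding-system bound to it.  This sketch does not model detection itself; the set of matching categories is passed in directly, and the Python names are hypothetical.  The priority order and the two category values are copied from the lisp/mule.el excerpts above.

```python
# Sketch of choosing among matching categories by 'priority.  Detecting
# which categories match is not modelled; the candidates are given.
PRIORITY = dict()


def set_coding_priority(categories):
    """Number the categories 0, 1, 2, ... in order (smaller = preferred)."""
    for rank, category in enumerate(categories):
        PRIORITY[category] = rank


def select_category(candidates):
    """Return the matching category with the highest priority."""
    return min(candidates, key=lambda category: PRIORITY[category])


set_coding_priority([
    "*coding-category-iso-8-2*", "*coding-category-sjis*",
    "*coding-category-iso-8-1*", "*coding-category-big5*",
    "*coding-category-iso-7*", "*coding-category-iso-else*",
    "*coding-category-bin*", "*coding-category-internal*",
])
# Default values of two category symbols, copied from lisp/mule.el above.
CATEGORY_VALUE = dict([("*coding-category-sjis*", "*sjis*"),
                       ("*coding-category-iso-8-2*", "*euc-japan*")])
best = select_category(["*coding-category-sjis*", "*coding-category-iso-8-2*"])
print(best, CATEGORY_VALUE[best])   # *coding-category-iso-8-2* *euc-japan*
```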

@node Mode-line
@subsection How coding-system is shown in mode-line?

Each coding-system has a one-character mnemonic.  By default, the
mnemonic of the buffer's `file-coding-system' is shown at the left of
the buffer's mode-line.  It is followed by a second mnemonic that
shows the eol-type of the coding-system, defined as follows:

@example
	".": LF
	":": CRLF
	"'": CR
	"_": not yet decided
	"-": nil (for the coding-system nil, *noconv*, or *internal*)
@end example

So the usual appearance of the mode-line for a buffer visiting a file
(*junet* encoding on a Unix system) is:

@example
	    +-- mnemonic of file-coding-system
	    |+-- mnemonic of eol-type
	    VV
	[--]J.:----Mule: filename
@end example

The leftmost bracketed field is the indicator for the input method.

When a buffer is attached to a process, the coding-systems for input
and output of the process are also shown, as follows:

@example
	    +-- mnemonic of file-coding-system
	    |+-- mnemonic of eol-type of file-coding-system
	    ||+-- mnemonic of input-coding-system of a process
	    |||+-- mnemonic of eol-type of input-coding-system
	    ||||+-- mnemonic of output-coding-system of a process
	    |||||+-- mnemonic of eol-type of output-coding-system
	    VVVVVV
	[--]+_+.--:--**-Mule: *shell*
@end example

This means that Mule is currently communicating with the shell using
the coding-system *autoconv*unix ("+.") for input and nil ("--") for
output.
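The two-characters-per-coding-system rule can be checked with a toy Python renderer.  It is a sketch, not Mule's mode-line code: a coding-system is modelled as a tuple of its mnemonic and eol-type, None stands for the coding-system nil, and a tuple eol-type plays the role of an undecided vector.  With the shell example's three coding-systems it reproduces the indicator string shown above.

```python
# Toy rendering of the mode-line coding indicators.  A coding-system is
# modelled as a tuple (mnemonic, eol_type); None stands for coding-system nil.
EOL_MNEMONIC = dict([(1, "."), (2, ":"), (3, "'")])


def eol_mnemonic(eol_type):
    if eol_type is None:
        return "-"                     # no eol conversion
    if isinstance(eol_type, tuple):
        return "_"                     # still a vector: not yet decided
    return EOL_MNEMONIC[eol_type]


def coding_indicator(*systems):
    """Two characters per coding-system, in the order given."""
    out = []
    for cs in systems:
        if cs is None:
            out.append("--")
        else:
            mnemonic, eol_type = cs
            out.append(mnemonic + eol_mnemonic(eol_type))
    return "".join(out)


autoconv = ("+", ("*autoconv*unix", "*autoconv*dos", "*autoconv*mac"))
autoconv_unix = ("+", 1)
print(coding_indicator(autoconv, autoconv_unix, None))   # +_+.--
```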

@node ISO2022 restriction
@subsection ISO2022 restriction

For type 2 (ISO2022) coding-systems, Mule has the following
restrictions:

@table @asis
@item Locking-Shift:
Use SI and SO only with a coding-system whose LOCK-SHIFT and SEVEN
are both t.

@item Single-Shift:
Use SS2 and SS3 (if SEVEN is nil) or ESC N and ESC O (if
SEVEN is t).

@item Invocation:
G0 is always invoked to GL, G1 to GR (but only if SEVEN is
nil).  G2 and G3 are invoked to GL by Single-Shift of SS2
and SS3.

@item Unofficial use of ESC sequences for designation:
If SEVEN is t, LOCK-SHIFT is nil, and designation to G2 and G3 is
prohibited, all character sets must be designated to G0 (and hence
invoked to GL).  To designate a 96-character set to G0, we use
"ESC , <F>".  For instance, to designate ISO8859-1 to G0, we use
"ESC , A".

@item Unofficial use of ESC sequences for composite characters:
To indicate the start and end of a composite character, we use ESC 0
(start) and ESC 1 (end).

@item Text direction specifier of ISO6429
We use the ISO6429 escape sequence "ESC [ 2 ]" to change the text
direction to right-to-left, and "ESC [ 0 ]" to revert it to
left-to-right.
@end table
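The unofficial "ESC , F" convention can be illustrated with a small Python encoder for 7-bit output without locking shift, where every character set is designated to G0.  This is a sketch, not Mule's encoder: it handles only ASCII and ISO8859-1, maps the right half of ISO8859-1 (0xA0 to 0xFF) into GL by subtracting 0x80, and designates ASCII back with the standard "ESC ( B" at the end of the text.

```python
# Illustrative encoder for the unofficial "ESC , F" convention above:
# 7-bit output, no locking shift, every charset designated to G0.
# Only ASCII and ISO8859-1 are handled; this is not Mule's encoder.
ESC = b"\x1b"
DESIGNATE_ASCII_G0 = ESC + b"(B"     # 94-charset ASCII to G0
DESIGNATE_LATIN1_G0 = ESC + b",A"    # 96-charset ISO8859-1 to G0 (unofficial)


def encode_latin1_7bit(text):
    out = bytearray()
    g0 = "ascii"
    for ch in text:
        code = ord(ch)
        if code < 0x80:
            if g0 != "ascii":
                out += DESIGNATE_ASCII_G0
                g0 = "ascii"
            out.append(code)
        elif 0xA0 <= code <= 0xFF:
            if g0 != "latin1":
                out += DESIGNATE_LATIN1_G0
                g0 = "latin1"
            out.append(code - 0x80)   # right half of ISO8859-1 moved to GL
        else:
            raise ValueError("character not in ISO8859-1: %r" % ch)
    if g0 != "ascii":
        out += DESIGNATE_ASCII_G0     # return to ASCII at the end of the text
    return bytes(out)


print(encode_latin1_7bit("caf\u00e9"))   # b'caf\x1b,Ai\x1b(B'
```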

@node Big5
@subsection Special treatment of Big5

As far as I know, there are several different codes called Big5.  The
best known are Big5-ETen and Big5-HKU-form2.  Both use the code range
0xa140 - 0xfefe (in each row, the columns (second bytes) 0x7f - 0xa0
are skipped) and contain more than 13000 characters, so it is
impossible to treat either of them as a single character set in the
current Mule system.  Mule therefore treats them in a rather irregular
manner, as described below:

@enumerate
@item
Mule does not treat them as different character sets, but as a single
character set called Big5.  Caution: Big5 is a different character set
from GB.

@item
Mule divides Big5 into two sub-character-sets,
@example
0xa140 - 0xc67e (Level 1)
0xc6a1 - 0xfefe (Level 2)
@end example
@noindent
and allocates the two leading-chars lc-big5-1 and lc-big5-2 to them.
(See character.txt.)

@item
Usually, each leading-char (or character set) has a unique character
category.  But lc-big5-1 and lc-big5-2 share the same character
category, whose mnemonic is 't'.  So the regular expression "\\ct"
matches any Big5 character (Level 1 or Level 2).  (See syntax.txt.)

@item
If you specify an ISO2022-type coding-system for output, Mule encodes
Big5 characters using the unofficial final characters '0' (for Level
1) and '1' (for Level 2).

@item
You can use either ETen or HKU fonts to display Big5 text.  Mule
determines which kind of font is in use by checking whether the font
has a character at code point 0xC6A1.  If it does, the font is HKU;
otherwise, it is ETen.
@end enumerate
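The split of Big5 into two sub-character-sets can be expressed as a short Python function.  It is a sketch built only from the ranges given above (first byte 0xa1 to 0xfe, second byte outside the skipped 0x7f to 0xa0 columns); the function name is hypothetical.

```python
# Sketch of the Big5 split described above (ranges taken from the text);
# returns which of the two Mule sub-charsets a 2-byte code falls into.
def big5_level(code):
    """Return 1, 2, or None for a 16-bit Big5 code point such as 0xA140."""
    first, second = code >> 8, code & 0xFF
    if not (0xA1 <= first <= 0xFE):
        return None
    if not (0x40 <= second <= 0x7E or 0xA1 <= second <= 0xFE):
        return None                       # columns 0x7F-0xA0 are skipped
    if 0xA140 <= code <= 0xC67E:
        return 1                          # lc-big5-1
    if 0xC6A1 <= code <= 0xFEFE:
        return 2                          # lc-big5-2
    return None


print(big5_level(0xA440), big5_level(0xC6A1), big5_level(0xA17F))  # 1 2 None
```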

@node Syntax
@section Syntax and Category of character

@subsection Syntax

Mule can define the syntax of any multi-byte character with
@code{modify-syntax-entry}.

The first argument of @code{modify-syntax-entry} should be one of the following:
@enumerate
@item
ASCII character
@item
multi-byte character
@item
leading character of multi-byte character
@item
partially defined character returned by:

@quotation
@code{(make-character leading-char arg)}
@end quotation
@end enumerate

There is a restriction on the matching character given in the second
argument.  If the first argument specifies a multi-byte character or
the leading char of a multi-byte character, the matching character
must have the same leading character.  If the character is a 2-byte
code, its first byte must also be the same as the first byte of the
first argument.
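The restriction can be written as a small Python check.  In this hypothetical model a character is a tuple of its leading character and its code bytes, a bare leading char has an empty byte tuple, and ASCII has no leading char.  The leading-char number 0x92 is an assumption; the codes 0x2156 and 0x2157 (the JIS X 0208 corner brackets) serve only as sample 2-byte codes.

```python
# Hypothetical model of the restriction above.  A character is a tuple
# (leading_char, code_bytes); the first argument may be a bare leading
# char, written with an empty code_bytes tuple.
def matching_char_allowed(first, matching):
    lc1, code1 = first
    lc2, code2 = matching
    if lc1 is None:                    # ASCII first argument: not restricted here
        return True
    if lc1 != lc2:                     # must share the leading character
        return False
    if len(code1) == 2:                # 2-byte code: first bytes must agree too
        return len(code2) == 2 and code1[0] == code2[0]
    return True


LC_JP = 0x92                           # hypothetical leading char for JIS X 0208
open_bracket = (LC_JP, (0x21, 0x56))   # JIS X 0208 0x2156
close_bracket = (LC_JP, (0x21, 0x57))  # JIS X 0208 0x2157
print(matching_char_allowed(open_bracket, close_bracket))             # True
print(matching_char_allowed(open_bracket, (LC_JP, (0x22, 0x57))))     # False
```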

@subsection Category

Like syntax, category also defines characteristics of
characters.  The differences are:
@enumerate
@item
Each character can have more than one category.
@item
Users can define new categories as they wish (for an example, see
japanese.el).
@item
@code{char-category} returns all mnemonics of a character as a string.
@item
In regular expressions, you can use \cm and \Cm (where m is any
category mnemonic) in the same way as \sm and \Sm are used for syntax.
@end enumerate
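The differences from syntax listed above can be modelled with a toy category table in Python, where each character maps to a set of one-letter mnemonics.  This is a sketch, not Mule's implementation; the mnemonics 'j' and 'H' used below are made up for the example.

```python
# Toy category table, not Mule's: each character maps to a set of
# one-letter mnemonics.  The mnemonics used below are made up.
CATEGORY_TABLE = dict()


def define_category(chars, mnemonic):
    """Add MNEMONIC to every character in CHARS (a character may have many)."""
    for ch in chars:
        CATEGORY_TABLE.setdefault(ch, set()).add(mnemonic)


def char_category(ch):
    """All mnemonics of CH as one string, in the spirit of char-category."""
    return "".join(sorted(CATEGORY_TABLE.get(ch, ())))


def matches_category(ch, mnemonic, negate=False):
    """What \\cM tests (or \\CM when NEGATE is true) for a single character."""
    return (mnemonic in CATEGORY_TABLE.get(ch, ())) != negate


define_category("\u3042\u3044", "j")   # hiragana A and I: "Japanese"
define_category("\u3042\u3044", "H")   # ... and also "hiragana"
print(char_category("\u3042"))                    # Hj
print(matches_category("a", "j", negate=True))    # True
```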

@node Font
@section Font

A fontset is a set of fonts that share the same height and style.
Ideally, a fontset contains enough fonts to display characters of
various character sets.

Mule uses fontsets instead of fonts.  You can specify a fontset
anywhere you can specify a font.  You can still specify a font, in
which case a fontset that includes the font is searched for and used.

Like a font, a fontset is specified by a string giving its name.

@menu
* Initial fontsets::	Fontsets which Mule has at startup time.
* Specify fontset::     How to specify a fontset?
* Manage fontset::      How to create or modify a fontset?
@end menu

@node Initial fontsets
@subsection Initial fontsets

@subsubsection "default-fontset"

Mule automatically creates a fontset named "default-fontset" at startup
time.  Each font in this fontset is specified by a very generic name such
as "-*-fixed-medium-r-*--16-*-iso8859-1" for ASCII and
"-*-fixed-medium-r-*--*-jisx0208.1983-*" for JISX0208 (Kanji).
These values are defined in @file{lisp/term/x-win.el}.

If no other fontsets are specified by X resources, "default-fontset"
is used for the first frame of Mule.

In most cases, this is enough, and you probably don't need any other
fontsets.

@subsubsection X resources

Mule also creates the fontsets listed in the X resource "fontSetList"
(class "FontSetList").  Its value is a comma-separated list of fontset
names.

@example
*FontSetList: 16,24
@end example

The actual contents of each fontset are specified by the resource
"fontSet-xxx" (class "FontSet-xxx"), where "xxx" is the name of the
fontset.  The value of this resource is a comma-separated list of font
names.

@example
*FontSet-16: -etl-fixed-medium-r-*--16-*-iso8859-1
@end example

A font name must not contain the wildcard `*' or `?' in its
CHARSET_REGISTRY field, because Mule recognizes the character set of
the font from this field.  As a result, you don't have to care about
the order of the font names.

For instance,

@example
*FontSet-16:\
        -etl-fixed-medium-r-*--16-*-iso8859-1,\
        -ming-fixed-medium-r-*--*-*-jisx0208.1983-*
@end example

is enough to tell Mule that the fontset "16" contains an ASCII font and
a JISX0208 font.  Note that the second name has a wildcard in its
PIXEL_SIZE field.  Since Mule tries to open fonts with the same
PIXEL_SIZE as the ASCII font of the same fontset, you had better not
specify an actual value in the PIXEL_SIZE field except for the ASCII
font.

For character sets with no font listed in a fontset's specification,
the corresponding font names in "default-fontset" are used.
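The way a fontSet-xxx value is interpreted can be sketched in Python: each font is keyed by its CHARSET_REGISTRY field (the second-to-last hyphen-separated field), so listing order does not matter, and character sets with no listed font fall back to the default fontset.  This is a model under stated assumptions, not Mule's parser.  Following the text, it looks only at CHARSET_REGISTRY, so it cannot tell apart fonts that differ only in CHARSET_ENCODING.  The gb2312.1980 default entry is a made-up example.

```python
# Sketch, not Mule's parser: read a FontSet-xxx value, key each font by
# its CHARSET_REGISTRY field, and take unlisted charsets from a default.
def charset_registry(font_name):
    """CHARSET_REGISTRY is the second-to-last hyphen-separated field."""
    fields = font_name.split("-")
    if len(fields) < 3:
        raise ValueError("not an XLFD-style font name: %r" % font_name)
    registry = fields[-2]
    if "*" in registry or "?" in registry:
        raise ValueError("wildcard in CHARSET_REGISTRY: %r" % font_name)
    return registry


def parse_fontset(value, default_fonts):
    """VALUE is the comma-separated resource value; returns registry -> font."""
    fonts = dict(default_fonts)                 # start from default-fontset
    for name in value.split(","):
        name = name.strip()
        if name:
            fonts[charset_registry(name)] = name  # listing order is irrelevant
    return fonts


default = parse_fontset("-*-fixed-medium-r-*--16-*-iso8859-1,"
                        "-*-fixed-medium-r-*--*-jisx0208.1983-*,"
                        "-*-fixed-medium-r-*--*-gb2312.1980-*", dict())
fs16 = parse_fontset("-ming-fixed-medium-r-*--*-*-jisx0208.1983-*,"
                     "-etl-fixed-medium-r-*--16-*-iso8859-1", default)
print(fs16["gb2312.1980"])   # taken from the default fontset
```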

The first fontset in FontSetList is used for the first frame of Mule.
If you want to use "default-fontset" while specifying other fontsets in
the resource, put "default-fontset" first in the value.

@example
*FontSetList: default-fontset,16,24
@end example

In this case, you don't need the resource
"FontSet-default-fontset".

@node Specify fontset
@subsection How to specify a fontset?

You can specify a fontset anywhere you can specify a font.

To change the fontset used for the first frame of Mule:

@enumerate
@item
The command-line argument "-fn xxx" or "-font xxx".

If this argument exists, a fontset is searched for in the following order:
@enumerate
@item
A fontset whose name is "xxx".
@item
A fontset which contains ASCII font "xxx".
@item
Create a new fontset "xxx" which contains ASCII font "xxx".
@end enumerate

@item
In your @file{~/.emacs} file:

@example
;; Put the font entry first; assoc finds the first match.
(setq default-frame-alist (cons '(font . "xxx") default-frame-alist))
@end example

@end enumerate
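The lookup order for "-fn xxx" can be modelled in a few lines of Python.  This is a toy model: a fontset is represented as a dict from character set to font name, and keying the ASCII font under "ascii" is an assumption of the sketch.  When several fontsets contain the same ASCII font, the sketch returns the first one in insertion order; Mule's own choice in that case is not described here.

```python
# Toy model of the -fn / -font lookup order above.  A fontset is a dict
# from charset to font name; the "ascii" key is an assumption of this sketch.
def fontset_for_font_arg(xxx, fontsets):
    """Return the name of the fontset to use, creating one if needed."""
    if xxx in fontsets:                        # 1. a fontset named xxx
        return xxx
    for name, fonts in fontsets.items():       # 2. a fontset whose ASCII font is xxx
        if fonts.get("ascii") == xxx:
            return name
    fontsets[xxx] = dict(ascii=xxx)            # 3. a new fontset xxx
    return xxx


fontsets = dict()
fontsets["16"] = dict(ascii="-etl-fixed-medium-r-*--16-*-iso8859-1")
print(fontset_for_font_arg("16", fontsets))                                    # 16
print(fontset_for_font_arg("-etl-fixed-medium-r-*--16-*-iso8859-1", fontsets))  # 16
print(fontset_for_font_arg("9x15", fontsets), "9x15" in fontsets)               # 9x15 True
```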

To change the fontset after Mule has started:

@enumerate
@item
By the command

@example
M-x set-default-fontset<CR>xxx<CR>
@end example

@item
By @key{Ctl-Mouse-3}

@end enumerate

@node Manage fontset
@subsection How to create or modify a fontset?

You can create a new fontset with `new-fontset' and modify an
existing fontset with `set-fontset-font'.

You can get a list of the fontsets created so far with
`fontset-list'.

You can check whether a fontset has already been created with
`fontsetp'.