semantic / doc / lang-support-guide.texi

\input texinfo  @c -*-texinfo-*-
@c %**start of header
@setfilename semantic-langdev.info
@set TITLE  Language Support Developer's Guide
@set AUTHOR Eric M. Ludlam, David Ponce, and Richard Y. Kim
@settitle @value{TITLE}

@c *************************************************************************
@c @ Header
@c *************************************************************************

@c Merge all indexes into a single index for now.
@c We can always separate them later into two or more as needed.
@syncodeindex vr cp
@syncodeindex fn cp
@syncodeindex ky cp
@syncodeindex pg cp
@syncodeindex tp cp

@c @footnotestyle separate
@c @paragraphindent 2
@c @@smallbook
@c %**end of header

@copying
This manual documents Language Support Development with Semantic.

Copyright @copyright{} 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2007 Eric M. Ludlam
Copyright @copyright{} 2001, 2002, 2003, 2004 David Ponce
Copyright @copyright{} 2002, 2003 Richard Y. Kim

@quotation
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.1 or
any later version published by the Free Software Foundation; with no
Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
A copy of the
license is included in the section entitled ``GNU Free Documentation
License''.
@end quotation
@end copying

@ifinfo
@dircategory Emacs
@direntry
* Semantic Language Writer's guide: (semantic-langdev).
@end direntry
@end ifinfo

@iftex
@finalout
@end iftex

@c @setchapternewpage odd
@c @setchapternewpage off

@ifinfo
This file documents Language Support Development with Semantic.
@emph{Infrastructure for parser based text analysis in Emacs}

Copyright @copyright{} 1999, 2000, 2001, 2002, 2003, 2004 @value{AUTHOR}
@end ifinfo

@titlepage
@sp 10
@title @value{TITLE}
@author by @value{AUTHOR}
@vskip 0pt plus 1 fill
Copyright @copyright{} 1999, 2000, 2001, 2002, 2003, 2004 @value{AUTHOR}
@page
@vskip 0pt plus 1 fill
@insertcopying
@end titlepage
@page

@c MACRO inclusion
@include semanticheader.texi


@c *************************************************************************
@c @ Document
@c *************************************************************************
@contents

@node top
@top @value{TITLE}

Semantic is bundled with support for several languages, such as
C, C++, Java, and Python.
However, one of the primary goals of @semantic{} is to provide a
framework in which anyone can easily add support for other languages.
In order to support a new language, one typically has to provide a
lexer and a parser, along with appropriate semantic actions that
produce the end result of the parser: the semantic tags.

This chapter first discusses the semantic tag data structure, to
familiarize the reader with the goal.  Then all the components
necessary for supporting a language are discussed, starting with
writing the lexer, writing the parser, writing semantic rules, and so
on.  Finally, several parsers bundled with @semantic{} are discussed as
case studies.

@menu
* Tag Structure::               
* Language Support Overview::   
* Writing Lexers::              
* Writing Parsers::             
* Parsing a language file::     
* Debugging::                   
* Parser Error Handling::       
* GNU Free Documentation License::  
* Index::                       
@end menu

@node Tag Structure
@chapter Tag Structure
@cindex Tag Structure

The end result of the parser for a buffer is a list of @i{tags}.
Currently each tag is a list with up to five elements:
@example
("NAME" CLASS ATTRIBUTES PROPERTIES OVERLAY)
@end example
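For instance (an illustrative sketch, not exact parser output), the C
declaration @code{int i_1;} might produce a tag resembling:

@example
("i_1" variable (:type "int") PROPERTIES OVERLAY)
@end example

@noindent
where the @var{PROPERTIES} and @var{OVERLAY} slots are filled in by
the parser harness, as described below.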

@var{CLASS} represents what kind of tag this is.  Common @var{CLASS}
values include @code{variable}, @code{function}, or @code{type}.
@inforef{Tag Basics, , semantic-appdev.info}.

@var{ATTRIBUTES} is a slot filled with language specific options for
the tag.  Function arguments, return type, and other flags are all
stored in attributes.  A language author fills in the @var{ATTRIBUTES}
with the tag constructor, which is parser style dependent.

@var{PROPERTIES} is a slot generated by the semantic parser harness,
and need not be provided by a language author.  Programmatically
access tag properties with @code{semantic--tag-put-property},
@code{semantic--tag-put-property-no-side-effect} and
@code{semantic--tag-get-property}.

@var{OVERLAY} represents positional information for this tag.  It is
automatically generated by the semantic parser harness, and need not
be provided by the language author, unless they provide a tag
expansion function via @code{semantic-tag-expand-function}.

The @var{OVERLAY} property is accessed via several functions returning
the beginning, end, and buffer of a tag.  Use these functions unless
the overlay is really needed (see @inforef{Tag Query, , semantic-appdev}).
Depending on the
overlay in a program can be dangerous because sometimes the overlay is
replaced with an integer pair
@example
[ START END ]
@end example
when the buffer the tag belongs to is not in memory.  This happens
when a user has activated the Semantic Database 
@inforef{semanticdb, , semantic-appdev}.

To create tags for a functional or object oriented language, you can
use a series of tag creation functions.  @inforef{Creating Tags, , semantic-appdev}.
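For example, a hedged sketch of using these constructors to build a
tag for a C function @code{main} might look like the following; the
exact signatures are assumptions, so consult the application
development guide for the real API.

@example
;; Sketch: create a function tag named "main" returning "int",
;; taking two arguments.  Constructor signatures are assumptions;
;; see the application development guide for the real API.
(semantic-tag-new-function
 "main" "int"
 (list (semantic-tag-new-variable "argc" "int")
       (semantic-tag-new-variable "argv" "char**")))
@end example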

@node Language Support Overview
@chapter Language Support Overview
@cindex Language Support Overview

Starting with version 2.0, @semantic{} provides many ways to add support
for a language into the @semantic{} framework.

The primary means to customize how @semantic{} works is to implement
language specific versions of @i{overloadable} functions.  Semantic
has a specialized mode bound way to do this.
@ref{Semantic Overload Mechanism}.

The parser has several parts, which are all also overloadable.  The
primary entry point into the parser is
@code{semantic-fetch-tags}, which calls
@code{semantic-parse-region}, which returns a list of semantic tags
that is stored in @code{semantic--buffer-cache}.

@code{semantic-parse-region} is the first ``overloadable'' function.
The default behavior of this is to simply call @code{semantic-lex},
then pass the lexical token list to
@code{semantic-repeat-parse-whole-stream}.  At each stage, another
more focused layer provides a means of overloading.

The parser is not the only layer that provides overloadable methods.
Application APIs @inforef{top, , semantic-appdev} provide many
overload functions as well.

@menu
* Semantic Overload Mechanism::  
* Semantic Parser Structure::   
* Application API Structure::   
@end menu

@node Semantic Overload Mechanism
@section Semantic Overload Mechanism

One of @semantic{}'s goals is to provide a framework for supporting a
wide range of languages.
Writing a parser for some languages is very simple: for example, any
dialect of the Lisp family, such as Emacs Lisp or Scheme.
Parsers for many other languages, such as C, Java, and Python, can be
written with context free grammars.
On the other hand, it is impossible to specify a context free grammar
for languages such as Texinfo.
Yet @semantic{} already provides parsers for all these languages.

In order to support such a wide range of languages, a mechanism was
needed that maximizes code reuse, yet gives each programmer the
flexibility of customizing the parser engine at many levels of
granularity.
@cindex function overloading
@cindex overloading, function
The solution that @semantic{} provides is the
@i{function overloading} mechanism which
allows one to intercept and customize the behavior
of many of the functions in the parser engine.
First the parser engine breaks down the task of parsing a language into
several steps.
Each step is represented by an Emacs-Lisp function.
Some of these are
@code{semantic-parse-region},
@code{semantic-lex},
@code{semantic-parse-stream},
@code{semantic-parse-changes},
etc.

Many built-in @semantic{} functions are declared
as being @i{over-loadable} functions, i.e., functions that do
reasonable things for most languages, but can be
customized to suit the particular needs of a given language.
All @i{over-loadable} functions then can easily be @i{over-ridden}
if necessary.
The rest of this section provides details on this @i{overloading mechanism}.

Over-loadable functions are created by defining functions
with the @code{define-overload} macro rather than the usual @code{defun}.
@code{define-overload} is a thin wrapper around @code{defun}
that sets up the function so that it can be overloaded.
An @i{over-loadable} function then can be @i{over-ridden}
in one of two ways:
@code{define-mode-overload-implementation}
and
@code{semantic-install-function-overrides}.

Let's look at a couple of examples.
@code{semantic-parse-region} is one of the top level functions
in the parser engine defined via @code{define-overload}:

@example
(define-overload semantic-parse-region
  (start end &optional nonterminal depth returnonerror)
  "Parse the area between START and END, and return any tokens found.

...
  
tokens.")
@end example

The documentation string above was truncated in the middle, since it
is not relevant here.
The macro invocation above defines the @code{semantic-parse-region}
Emacs-Lisp function, which first checks whether there is a mode
specific overloaded implementation.
If one is found, that implementation is called.
If not, the default implementation is called, which in this case is
@code{semantic-parse-region-default}, i.e.,
a function with the same name but with the trailing @i{-default}
suffix.
That function needs to be written separately, and must take the same
arguments as the entry created with @code{define-overload}.
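As a sketch using hypothetical names (@code{my-language-feature} is
not a real @semantic{} function), an overloadable function and its
default implementation pair up like this:

@example
;; Hypothetical example; `my-language-feature' is not part of
;; the Semantic codebase.
(define-overload my-language-feature (start end)
  "Compute a language specific feature between START and END.")

(defun my-language-feature-default (start end)
  "Default implementation used when no mode specific override exists."
  (buffer-substring-no-properties start end))
@end example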

One way to overload @code{semantic-parse-region} is via
@code{semantic-install-function-overrides}.
An example from @file{semantic-texi.el} file is shown below:

@example
(defun semantic-default-texi-setup ()
  "Set up a buffer for parsing of Texinfo files."
  ;; This will use our parser.
  (semantic-install-function-overrides
   '((parse-region . semantic-texi-parse-region)
     (parse-changes . semantic-texi-parse-changes)))
  ...
  )

(add-hook 'texinfo-mode-hook 'semantic-default-texi-setup)
@end example

The function above is called whenever a buffer is set up in Texinfo
mode.
The call to @code{semantic-install-function-overrides} indicates that
@code{semantic-texi-parse-region} is to over-ride the default
implementation of @code{semantic-parse-region}.
Note the use of the @code{parse-region} symbol, which is
@code{semantic-parse-region} without the leading @i{semantic-} prefix.

Another way to over-ride a built-in @semantic{} function is via
@code{define-mode-overload-implementation}.
An example from @file{wisent-python.el} file is shown below.

@example
(define-mode-overload-implementation
  semantic-parse-region python-mode
  (start end &optional nonterminal depth returnonerror)
  "Over-ride in order to initialize some variables."
  (let ((wisent-python-lexer-indent-stack '(0))
        (wisent-python-explicit-line-continuation nil))
    (semantic-parse-region-default
     start end nonterminal depth returnonerror)))
@end example

This over-rides @code{semantic-parse-region} so that for
buffers whose major mode is @code{python-mode},
the code specified above is executed rather than the
default implementation.

@subsection Why not use advice

One may wonder why @semantic{} defines an overload mechanism when
Emacs already has advice.  @xref{Advising Functions,,,elisp}.

Advising is generally considered a mechanism of last resort when
modifying or hooking into an existing package without modifying
its source file.  Overloadable functions advertise that they @i{should}
be overloaded, and define syntactic sugar to do so.

@node Semantic Parser Structure
@section Semantic Parser Structure

NOTE: describe the functions that do parsing, and how to overload each.

@ignore
semantic-fetch-tags is the top level function that parses the current buffer.
  semantic-parse-changes
    semantic-parse-changes-default
      semantic-edits-incremental-parser
  semantic-parse-region (overloadable)
    semantic-parse-region-default
      semantic-lex (overloadable)
        *semantic-lex-analyzer
          semantic-flex
      semantic-repeat-parse-whole-stream
        semantic-parse-stream (overloadable)
          semantic-parse-stream-default
            semantic-bovinate-stream (bovine)
          wisent-parse-stream      (wisent)
    semantic-texi-parse-region
@end ignore

@example

@ignore
semantic-post-change-major-mode-function
semantic-parser-name

semantic-toplevel-bovine-table (see semantic-active-p)
  semantic-bovinate-stream
    semantic-toplevel-bovine-table
      semantic-parse-region
        semantic-parse-region-default
          semantic-lex
          semantic-repeat-parse-whole-stream

semantic-init-db-hooks semanticdb-semantic-init-hook-fcn
semantic-init-hooks semantic-auto-parse-mode

semantic-flex-keywords-obarray (see semantic-bnf-keyword-table)
  Used by semantic-lex-keyword-symbol, semantic-lex-keyword-set,
  semantic-lex-map-keywords, semantic-flex
semantic-lex-types-obarray

* To support a new language, one must write a set of Emacs-Lisp
  functions that converts any valid text written in that language
  into a list of semantic tokens.  Typically this task is divided into two
  areas: a lexer and a parser.
* There are many ways of doing this.  However in almost all cases, two
* Parser converts
wisent parsers
bovine parsers
custom parsers
@end ignore


@end example

@node Application API Structure
@section Application API Structure

NOTE: improve this:

How to program against the data structures created by @semantic{}
using the application programming API is covered in the Application
Development Guide.  Read that guide to get a feel for the specifics of
what you can customize.  @inforef{top, , semantic-appdev}

Here is a list of applications, and the specific APIs that you will
need to overload to make them work properly with your language.

@table @code
@item imenu
@itemx speedbar
@itemx ecb
These tools require that the @code{semantic-format} methods create
correct strings.
@inforef{Format Tag, , semantic-appdev}
@item semantic-analyze
The analysis tool requires that the @code{semanticdb} tool is active,
and that the searching methods are overloaded.  In addition, a
@code{semanticdb} system database could be written to provide symbols
from the global environment of your language.
@inforef{System Databases, , semantic-appdev}

In addition, the analyzer requires that the @code{semantic-ctxt}
methods are overloaded.  These methods allow the analyzer to look at
the context of the cursor in your language, and predict the type of
location of the cursor. @inforef{Derived Context, , semantic-appdev}.
@item semantic-idle-summary-mode
@itemx semantic-idle-completions-mode
These tools use the semantic analysis tool.
@inforef{Context Analysis, , semantic-appdev}
@end table

@menu
* Semantic Analyzer Support::   
@end menu

@node Semantic Analyzer Support
@subsection Semantic Analyzer Support

@ignore
>> From and Email I sent to get David started on supporting the analyzer

  First, context parsing needs to work.  This includes
`semantic-ctxt-current-symbol', `-function', `-assignment'.  You also
need `semantic-get-local-arguments' and -local-variables'.

  The next most critical piece is to provide implementations of the
semanticdb-find search path calculation API.
`semanticdb-find-table-for-include' is a good start.  That really
should use `semantic-dependency-tag-file', but that doesn't use
semanticdb-project-root when looking for files.  Java could be
trouble here since you can import a *.

  A couple more good ones is `semanticdb-find-translate-path-brutish'
and `semanticdb-find-translate-path-includes'.  Brutish searches look
at everything in the current project.  The include path will scan
only those items explicitly included into your file.

  Last but not least, for Java, we need a semanticdb back end that
will provide tags out of a jar file.  Since most objects inherit from
a system library (like Object), you will need this to get the tag
list including `clone' and the like.
@end ignore

@node Writing Lexers
@chapter Writing Lexers
@cindex Writing Lexers

@ignore
Are we going to support semantic-flex as well as the new lexer?

Not in the doc - Eric
@end ignore

In order to reduce a source file into a tag table, it must first be
converted into a token stream.  Tokens are syntactic elements such as
whitespace, symbols, strings, lists, and punctuation.

The lexer uses the major mode's syntax table for conversion.
@xref{Syntax Tables,,,elisp}.
As long as that is set up correctly (along with the important
@code{comment-start} and @code{comment-start-skip} variables), the
lexer should already work for your language.

The primary entry point of the lexer is the @dfn{semantic-lex} function
shown below.
Normally, you do not need to call this function.
It is usually called by @code{semantic-fetch-tags} for you.

@anchor{semantic-lex}
@defun semantic-lex start end &optional depth length
Lexically analyze text in the current buffer between @var{START} and @var{END}.
Optional argument @var{DEPTH} indicates at what level to scan over entire
lists.  The last argument, @var{LENGTH} specifies that @dfn{semantic-lex}
should only return @var{LENGTH} tokens.  The return value is a token stream.
Each element is a list of the form
  (symbol start-expression .  end-expression)
where @var{SYMBOL} denotes the token type.
See the variable @code{semantic-lex-tokens} for details on token types.  @var{END}
does not mark the end of the text scanned, only the end of the
beginning of text scanned.  Thus, if a string extends past @var{END}, the
end of the return token will be larger than @var{END}.  To truly restrict
scanning, use @code{narrow-to-region}.
@end defun

@menu
* Lexer Overview::              What is a Lexer?
* Lexer Output::                Output of a Lexical Analyzer
* Lexer Construction::          Constructing your own lexer
* Lexer Built In Analyzers::    Built in analyzers you can use
* Lexer Analyzer Construction::  Constructing your own analyzers
* Keywords::                    Specialized lexical tokens.
* Keyword Properties::          
@end menu

@node Lexer Overview
@section Lexer Overview

The @semantic{} lexer breaks up the content of an Emacs buffer into a
stream of tokens.  This process is based mostly on regular
expressions, which in turn depend on the syntax table of the buffer's
major mode being set up properly.
@xref{Major Modes,,,emacs}.
@xref{Syntax Tables,,,elisp}.
@xref{Regexps,,,emacs}.

The top level lexical function, @code{semantic-lex}, calls the
function stored in the variable @code{semantic-lex-analyzer}.  The
default value is the function @code{semantic-flex} from version 1.4 of
@semantic{}.  This will eventually be deprecated.

In the default lexer, the following regular expressions which rely on syntax
tables are used:

@table @code
@item @code{\\s-}
whitespace characters
@item @code{\\sw}
word constituent
@item @code{\\s_}
symbol constituent
@item @code{\\s.}
punctuation character
@item @code{\\s<}
comment starter
@item @code{\\s>}
comment ender
@item @code{\\s\\}
escape character
@item @code{\\s)}
close parenthesis character
@item @code{\\s$}
paired delimiter
@item @code{\\s\"}
string quote
@item @code{\\s\'}
expression prefix
@end table

In addition, Emacs' built-in features such as
@code{comment-start-skip},
@code{forward-comment},
@code{forward-list},
and
@code{forward-sexp}
are employed.

@node Lexer Output
@section Lexer Output

The lexer, @ref{semantic-lex}, scans the content of a buffer and
returns a token list.
Let's illustrate this using this simple example.

@example
00: /*
01:  * Simple program to demonstrate semantic.
02:  */
03:
04: #include <stdio.h>
05:
06: int i_1;
07:
08: int
09: main(int argc, char** argv)
10: @{
11:     printf("Hello world.\n");
12: @}
@end example

Evaluating @code{(semantic-lex (point-min) (point-max))}
within the buffer with the code above returns the following token list.
The input line and string that produced each token is shown after
each semi-colon.

@example
((punctuation     52 .  53)     ; 04: #
 (INCLUDE         53 .  60)     ; 04: include
 (punctuation     61 .  62)     ; 04: <
 (symbol          62 .  67)     ; 04: stdio
 (punctuation     67 .  68)     ; 04: .
 (symbol          68 .  69)     ; 04: h
 (punctuation     69 .  70)     ; 04: >
 (INT             72 .  75)     ; 06: int
 (symbol          76 .  79)     ; 06: i_1
 (punctuation     79 .  80)     ; 06: ;
 (INT             82 .  85)     ; 08: int
 (symbol          86 .  90)     ; 08: main
 (semantic-list   90 . 113)     ; 08: (int argc, char** argv)
 (semantic-list  114 . 147)     ; 09-12: body of main function
 )
@end example

As shown above, the token list is a list of ``tokens''.
Each token in turn is a list of the form

@example
(TOKEN-TYPE BEGINNING-POSITION . ENDING-POSITION)
@end example

@noindent
where TOKEN-TYPE is a symbol, and the other two are integers indicating
the buffer position that delimit the token such that

@lisp
(buffer-substring BEGINNING-POSITION ENDING-POSITION)
@end lisp

@noindent
would return the string form of the token.

Note that one line (line 4 above) can produce seven tokens while
the whole body of the function produces a single token.
This is because the @var{depth} parameter of @code{semantic-lex} was
not specified.
Let's see the output when @var{depth} is set to 1.
Evaluate @code{(semantic-lex (point-min) (point-max) 1)} in the same buffer.
Note the third argument of @code{1}.

@example
((punctuation    52 .  53)     ; 04: #
 (INCLUDE        53 .  60)     ; 04: include
 (punctuation    61 .  62)     ; 04: <
 (symbol         62 .  67)     ; 04: stdio
 (punctuation    67 .  68)     ; 04: .
 (symbol         68 .  69)     ; 04: h
 (punctuation    69 .  70)     ; 04: >
 (INT            72 .  75)     ; 06: int
 (symbol         76 .  79)     ; 06: i_1
 (punctuation    79 .  80)     ; 06: ;
 (INT            82 .  85)     ; 08: int
 (symbol         86 .  90)     ; 08: main

 (open-paren     90 .  91)     ; 08: (
 (INT            91 .  94)     ; 08: int
 (symbol         95 .  99)     ; 08: argc
 (punctuation    99 . 100)     ; 08: ,
 (CHAR          101 . 105)     ; 08: char
 (punctuation   105 . 106)     ; 08: *
 (punctuation   106 . 107)     ; 08: *
 (symbol        108 . 112)     ; 08: argv
 (close-paren   112 . 113)     ; 08: )

 (open-paren    114 . 115)     ; 10: @{
 (symbol        120 . 126)     ; 11: printf
 (semantic-list 126 . 144)     ; 11: ("Hello world.\n")
 (punctuation   144 . 145)     ; 11: ;
 (close-paren   146 . 147)     ; 12: @}
 )
@end example

The @var{depth} parameter ``peeled away'' one more level of ``list''
delimited by matching parentheses or braces.
The depth parameter can be specified to be any number.
However, the parser needs to be able to handle the extra tokens.

This is an interesting benefit of the lexer having the full
resources of Emacs at its disposal.
Skipping over matched parentheses is achieved by simply calling
the built-in functions @code{forward-list} and @code{forward-sexp}.

@node Lexer Construction
@section Lexer Construction

While using the default lexer is certainly an option, particularly
for grammars written in semantic 1.4 style, it is usually more
efficient to create a custom lexer for your language.

You can create a new lexer with @dfn{define-lex}.

@defun define-lex name doc &rest analyzers
@anchor{define-lex}
Create a new lexical analyzer with @var{NAME}.
@var{DOC} is a documentation string describing this analyzer.
@var{ANALYZERS} are small code snippets of analyzers to use when
building the new @var{NAMED} analyzer.  Only use analyzers which
are written to be used in @dfn{define-lex}.
Each analyzer should be an analyzer created with @dfn{define-lex-analyzer}.
Note: The order in which analyzers are listed is important.
If two analyzers can match the same text, it is important to order the
analyzers so that the one you want to match first occurs first.  For
example, it is good to put a number analyzer in front of a symbol
analyzer which might mistake a number for a symbol.
@end defun
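
For example, a lexer for a hypothetical FOO language might combine
several of the built-in analyzers described in the next section.  The
name @code{foo-lexer} and the particular selection of analyzers here
are illustrative:

@example
;; Order matters: the number analyzer precedes the symbol analyzer
;; so that numbers are not mistaken for symbols.
(define-lex foo-lexer
  "Lexical analyzer for the hypothetical FOO language."
  semantic-lex-ignore-whitespace
  semantic-lex-ignore-newline
  semantic-lex-number
  semantic-lex-symbol-or-keyword
  semantic-lex-string
  semantic-lex-ignore-comments
  semantic-lex-paren-or-list
  semantic-lex-close-paren
  semantic-lex-punctuation
  semantic-lex-default-action)
@end example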

The list of @var{analyzers} needed here can consist of several
built-in analyzers, or ones of your own construction.  The
built-in analyzers are:

@node Lexer Built In Analyzers
@section Lexer Built In Analyzers

@defspec semantic-lex-default-action
The default action when no other lexical actions match text.
This action will just throw an error.
@end defspec

@defspec semantic-lex-beginning-of-line
Detect and create a beginning of line token (BOL).
@end defspec

@defspec semantic-lex-newline
Detect and create newline tokens.
@end defspec

@defspec semantic-lex-newline-as-whitespace
Detect and create newline tokens.
Use this ONLY if newlines are not whitespace characters (such as when
they are comment end characters) AND when you want whitespace tokens.
@end defspec

@defspec semantic-lex-ignore-newline
Detect and create newline tokens.
Use this ONLY if newlines are not whitespace characters (such as when
they are comment end characters).
@end defspec

@defspec semantic-lex-whitespace
Detect and create whitespace tokens.
@end defspec

@defspec semantic-lex-ignore-whitespace
Detect and skip over whitespace tokens.
@end defspec

@defspec semantic-lex-number
Detect and create number tokens.
Number tokens are matched via this variable:

@defvar semantic-lex-number-expression
Regular expression for matching a number.
If this value is @code{nil}, no number extraction is done during lex.
This expression tries to match C- and Java-like numbers.

@example
DECIMAL_LITERAL:
    [1-9][0-9]*
  ;
HEX_LITERAL:
    0[xX][0-9a-fA-F]+
  ;
OCTAL_LITERAL:
    0[0-7]*
  ;
INTEGER_LITERAL:
    <DECIMAL_LITERAL>[lL]?
  | <HEX_LITERAL>[lL]?
  | <OCTAL_LITERAL>[lL]?
  ;
EXPONENT:
    [eE][+-]?[0-9]+
  ;
FLOATING_POINT_LITERAL:
    [0-9]+[.][0-9]*<EXPONENT>?[fFdD]?
  | [.][0-9]+<EXPONENT>?[fFdD]?
  | [0-9]+<EXPONENT>[fFdD]?
  | [0-9]+<EXPONENT>?[fFdD]
  ;
@end example
@end defvar

@end defspec

@defspec semantic-lex-symbol-or-keyword
Detect and create symbol and keyword tokens.
@end defspec

@defspec semantic-lex-charquote
Detect and create charquote tokens.
@end defspec

@defspec semantic-lex-punctuation
Detect and create punctuation tokens.
@end defspec

@defspec semantic-lex-punctuation-type
Detect and create a punctuation type token.
Recognized punctuation is defined in the current table of lexical
types, as the value of the @code{punctuation} token type.
@end defspec

@defspec semantic-lex-paren-or-list
Detect open parenthesis.
Return either a paren token or a semantic list token depending on
@code{semantic-lex-current-depth}.
@end defspec

@defspec semantic-lex-open-paren
Detect and create an open parenthesis token.
@end defspec

@defspec semantic-lex-close-paren
Detect and create a close paren token.
@end defspec

@defspec semantic-lex-string
Detect and create a string token.
@end defspec

@defspec semantic-lex-comments
Detect and create a comment token.
@end defspec

@defspec semantic-lex-comments-as-whitespace
Detect comments and create a whitespace token.
@end defspec

@defspec semantic-lex-ignore-comments
Detect comments and skip over them.
@end defspec

@node Lexer Analyzer Construction
@section Lexer Analyzer Construction

Each of the preceding built-in analyzers is constructed using a set
of analyzer construction macros.  The root construction macro is:

@defun define-lex-analyzer name doc condition &rest forms
Create a single lexical analyzer @var{NAME} with @var{DOC}.
When an analyzer is called, the current buffer and point are
positioned in a buffer at the location to be analyzed.
@var{CONDITION} is an expression which returns @code{t} if @var{FORMS} should be run.
Within the bounds of @var{CONDITION} and @var{FORMS}, backquote
can be used to evaluate expressions at compile time.
While forms are running, the following variables will be locally bound:

@table @code
@item semantic-lex-analysis-bounds
The bounds of the current analysis, of the form (@var{START} . @var{END}).
@item semantic-lex-maximum-depth
The maximum depth of semantic-list for the current analysis.
@item semantic-lex-current-depth
The current depth of @code{semantic-list} that has been descended.
@item semantic-lex-end-point
End point after match.  Analyzers should set this to a buffer location
if their match string does not represent the end of the matched text.
@item semantic-lex-token-stream
The token list being collected.  Add new lexical tokens to this list.
@end table
Proper action in @var{FORMS} is to move the value of @code{semantic-lex-end-point} to
after the location of the analyzed entry, and to add any discovered tokens
at the beginning of @code{semantic-lex-token-stream}.
This can be done by using @dfn{semantic-lex-push-token}.
@end defun
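
As a sketch, a custom analyzer for a backslash escape sequence might
look like the following.  The analyzer name and the token symbol
@code{backslash} are illustrative, not part of @semantic{}:

@example
(define-lex-analyzer foo-lex-backslash
  "Detect and create a token for a backslash escape."
  ;; CONDITION: a backslash followed by any character.
  (looking-at "\\\\.")
  ;; FORMS: push the new token onto the stream; per the description
  ;; above, `semantic-lex-push-token' is the proper way to record
  ;; the token and advance `semantic-lex-end-point'.
  (semantic-lex-push-token
   (semantic-lex-token 'backslash
                       (match-beginning 0) (match-end 0))))
@end example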

Additionally, a simple regular expression based analyzer can be built
with:

@defun define-lex-regex-analyzer name doc regexp &rest forms
Create a lexical analyzer with @var{NAME} and @var{DOC} that will match @var{REGEXP}.
@var{FORMS} are evaluated upon a successful match.
See @dfn{define-lex-analyzer} for more about analyzers.
@end defun

@defun define-lex-simple-regex-analyzer name doc regexp toksym &optional index &rest forms
Create a lexical analyzer with @var{NAME} and @var{DOC} that matches @var{REGEXP}.
@var{TOKSYM} is the symbol to use when creating a semantic lexical token.
@var{INDEX} is the index into the match that defines the bounds of the token.
@var{INDEX} should be a plain integer, and not specified in the macro as an
expression.
@var{FORMS} are evaluated upon a successful match, @emph{before} the new token is
created.  It is valid to ignore @var{FORMS}.
See @dfn{define-lex-analyzer} for more about analyzers.
@end defun

Regular expression analyzers are the simplest to create and manage.
Often, a majority of your lexer can be built this way.  The analyzer
for matching punctuation looks like this:

@example
(define-lex-simple-regex-analyzer semantic-lex-punctuation
  "Detect and create punctuation tokens."
  "\\(\\s.\\|\\s$\\|\\s'\\)" 'punctuation)
@end example

More complex analyzers, which match larger units of text to optimize
the speed of parsing and analysis, are created by matching blocks.

@defun define-lex-block-analyzer name doc spec1 &rest specs
Create a lexical analyzer @var{NAME} for paired delimiters blocks.
It detects a paired delimiters block or the corresponding open or
close delimiter depending on the value of the variable
@code{semantic-lex-current-depth}.  @var{DOC} is the documentation string of the lexical
analyzer.  @var{SPEC1} and @var{SPECS} specify the token symbols and open, close
delimiters used.  Each @var{SPEC} has the form:

@example
(@var{BLOCK-SYM} (@var{OPEN-DELIM} @var{OPEN-SYM}) (@var{CLOSE-DELIM} @var{CLOSE-SYM}))
@end example

where @var{BLOCK-SYM} is the symbol returned in a block token.  @var{OPEN-DELIM}
and @var{CLOSE-DELIM} are respectively the open and close delimiters
identifying a block.  @var{OPEN-SYM} and @var{CLOSE-SYM} are respectively the
symbols returned in open and close tokens.
@end defun

These blocks are what make @semantic{}'s Emacs Lisp based parsers
fast.  For example, by defining all text inside @{ braces @} as a
block, the parser does not need to know the contents of those braces
while parsing, and can skip them altogether.
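
A sketch of such a definition for parentheses and braces might look
like this; the analyzer name is illustrative, while the token symbols
follow the conventions used elsewhere in this chapter:

@example
(define-lex-block-analyzer foo-lex-blocks
  "Detect paired delimiter blocks."
  (semantic-list ("(" open-paren) (")" close-paren))
  (semantic-list ("@{" open-paren) ("@}" close-paren)))
@end example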

@node Keywords
@section Keywords

Another important piece of the lexer is the keyword table (see
@ref{Writing Parsers}).  Your language will want to set up a keyword table for
fast conversion of symbol strings into language terminals.

The keywords table can also be used to store additional information
about those keywords.  The following programming functions can be useful
when examining text in a language buffer.

@defun semantic-lex-keyword-p name
Return non-@code{nil} if a keyword with @var{NAME} exists in the keyword table.
Return @code{nil} otherwise.
@end defun

@defun semantic-lex-keyword-put name property value
For keyword with @var{NAME}, set its @var{PROPERTY} to @var{VALUE}.
@end defun

@defun semantic-lex-keyword-get name property
For keyword with @var{NAME}, return its @var{PROPERTY} value.
@end defun

@defun semantic-lex-map-keywords fun &optional property
Call function @var{FUN} on every semantic keyword.
If optional @var{PROPERTY} is non-@code{nil}, call @var{FUN} only on keywords which
have a @var{PROPERTY} value.  @var{FUN} receives a semantic keyword as argument.
@end defun

@defun semantic-lex-keywords &optional property
Return a list of semantic keywords.
If optional @var{PROPERTY} is non-@code{nil}, return only keywords which have a
@var{PROPERTY} set.
@end defun

Keyword properties can be set up in a grammar file for ease of maintenance.
While examining the text in a language buffer, this can provide an easy
and quick way of storing details about text in the buffer.
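
For example, assuming @code{"if"} is already defined in the keyword
table, a property can be stored and retrieved like this:

@example
;; Assumes "if" is already in the current keyword table.
(semantic-lex-keyword-put "if" 'summary
                          "if (<condition>) <statement>")
(semantic-lex-keyword-get "if" 'summary)
     @result{} "if (<condition>) <statement>"
@end example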

@node Keyword Properties
@section Standard Keyword Properties

Keywords in a language can have multiple properties.  These
properties can be used to associate the string that is the keyword
with additional information.

Currently available properties are:

@table @b
@item summary
The summary property is used by @code{semantic-summary-mode} as a help
string for the specified keyword.
@end table

Notes:

Possible future properties.  This is just me musing:

@table @b
@item face
Face used for highlighting this keyword, differentiating it from the
keyword face.
@item template
@itemx skeleton
Some sort of tempo/skel template for inserting the programmatic
structure associated with this keyword.
@item abbrev
As with template.
@item action
@itemx menu
Perhaps the keyword is clickable and some action would be useful.
@end table


@node Writing Parsers
@chapter Writing Parsers
@cindex Writing Parsers

@ignore
For the parser developer, I can think of two extra sections.  One for
semanticdb extensions,  (If a system database is needed.)  A second
for the `semantic-ctxt' extensions.  Many of the most interesting
tools will completely fail to work without local context parsing
support.

Perhaps even a section on foreign tokens.  For example, putting a
Java token into a C++ file could auto-gen a native method, just as
putting a token into a Texinfo file converts it into documentation.

In addition, in the "writing grammars" section should have
subsections as listed in the examples of the overview section.  It
might be useful to have a fourth section describing the similarities
between the two file types (by and wy) and how to use the grammar
mode.  (I'm not sure if that should be covered elsewhere.)
@end ignore

When converting a source file into a tag table it is important to
specify rules to accomplish this.  The rules are stored in the buffer
local variable @code{semantic--buffer-cache}.

While it is certainly possible to write this table yourself, it is
most likely you will want to use the @ref{Grammar Programming
Environment}.

There are three choices for parsing your language.

@table @b
@item Bovine Parser
The @dfn{bovine} parser is the original @semantic{} parser, and is an
implementation of an @acronym{LL} parser.  For more information,
@inforef{top, the Bovine Parser Manual, bovine}.

@item Wisent Parser
The @dfn{wisent} parser is a port of the GNU Compiler Compiler Bison
to Emacs Lisp.  Wisent includes the iterative error handler of the
bovine parser, and has the same error correction as traditional
@acronym{LALR} parsers.  For more information,
@inforef{top, the Wisent Parser Manual, wisent}.

@item External Parser
External parsers, such as the texinfo parser, can be implemented using
any means.  This allows the use of a regular expression parser for
non-regular languages, or external programs for speed.
@end table

@menu
* External Parsers::            Writing an external parser
* Grammar Programming Environment::  Using the grammar writing environment
* Parser Backend Support::             Lisp needed to support a grammar.
@end menu

@node External Parsers
@section External Parsers

The texinfo parser in @file{semantic-texi.el} is an example of an
external parser.  To make your parser work, you need to have a setup
function.

Note: Finish this.

@node Grammar Programming Environment
@section Grammar Programming Environment

Semantic grammar files in @file{.by} or @file{.wy} format have their
own programming mode.  This mode provides indentation and coloring
services in those languages.  In addition, the grammar languages are
also supported by @semantic{} so tagging information is available to
tools such as imenu or speedbar.

For more information,
@inforef{top, the Grammar Framework Manual, grammar-fw}.

@node Parsing a language file
@chapter Parsing a language file

The best way to call the parser from programs is via
@code{semantic-fetch-tags}.  This, in turn, uses other internal
@acronym{API} functions which plug-in parsers can take advantage of.

@defun semantic-fetch-tags
@anchor{semantic-fetch-tags}
Fetch semantic tags from the current buffer.
If the buffer cache is up to date, return that.
If the buffer cache is out of date, attempt an incremental reparse.
If the buffer has not been parsed before, or if the incremental reparse
fails, then parse the entire buffer.
If a lexical error had been previously discovered and the buffer
was marked unparseable, then do nothing, and return the cache.
@end defun

Another approach is to let Emacs call the parser on idle time, when
needed, then use @code{semantic-fetch-available-tags} to retrieve and
process only the available tags, provided that the
@code{semantic-after-*-hook} hooks have been setup to synchronize with
new tags when they become available.

@defun semantic-fetch-available-tags
@anchor{semantic-fetch-available-tags}
Fetch available semantic tags from the current buffer.
That is, return tags currently in the cache without parsing the
current buffer.

Parse operations happen asynchronously when needed on Emacs idle time.
Use the @code{semantic-after-toplevel-cache-change-hook} and
@code{semantic-after-partial-cache-change-hook} hooks to synchronize with
new tags when they become available.
@end defun
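
A sketch of this idle-time setup follows; the hook function shown is
illustrative:

@example
;; Run after a full reparse; TAGS is the new toplevel tag table.
(add-hook 'semantic-after-toplevel-cache-change-hook
          (lambda (tags)
            (message "Reparsed: %d toplevel tags" (length tags))))

;; Later, retrieve whatever is currently cached without parsing.
(semantic-fetch-available-tags)
@end example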

@deffn Command semantic-clear-toplevel-cache
@anchor{semantic-clear-toplevel-cache}
Clear the toplevel tag cache for the current buffer.
Clearing the cache will force a complete reparse next time a token
stream is requested.
@end deffn

@node Parser Backend Support
@section Parser Backend Support

Once you have written a grammar file that has been compiled into
Emacs Lisp code, additional glue needs to be written to finish
connecting the generated parser into the Emacs framework.

Large portions of this glue are automatically generated, but will
probably need additional modification to get things to work properly.

Typically, a grammar file @file{foo.wy} will create the file
@file{foo-wy.el}.  It is then useful to also create a file
@file{wisent-foo.el} (or @file{semantic-foo.el}) to contain the parser
back end, or the glue that completes the semantic support for the
language.

@menu
* Example Backend File::
* Tag Expansion::
@end menu

@node Example Backend File
@subsection Example Backend File

Typical structure for this file is:

@example
;;; semantic-foo.el -- parser support for FOO.

;;; Your copyright Notice

(require 'foo-wy)     ;; The parser
(require 'foo) ;; major mode definition for FOO

;;; Code:

;;; Lexical Analyzer
;;
;; OPTIONAL
;; It is possible to define your lexical analyzer completely in your
;; grammar file.

(define-lex foo-lexical-analyzer
  "Create a lexical analyzer."
  ...)

;;; Expand Function
;;
;; OPTIONAL
;; Not all languages are so complex as to need this function.
;; See `semantic-tag-expand-function' for more details.
(defun foo-tag-expand-function (tag)
  "Expand TAG into multiple tags if needed."
  ...)

;;; Parser Support
;;
;; OPTIONAL
;; If you need some specialty routines inside your grammar file
;; you can add some here.   The process may be to take diverse info
;; and reorganize it.
;;
;; It is also appropriate to write these functions in the prologue
;; of the grammar function.
(defun foo-do-something-hard (...)
  "...")

;;; Overload methods
;;
;; OPTIONAL
;; To allow your language to be fully supported by all the
;; applications that use semantic, it is important, though not
;; strictly necessary, to create implementations of overload methods.
(define-mode-overload-implementation some-semantic-function foo-mode (tag)
  "Implement some-semantic-function for FOO."
  )

;;;###autoload
(defun semantic-default-foo-setup ()
  "Set up a buffer for semantic parsing of the FOO language."
  (semantic-foo-wy--install-parser)
  (setq semantic-tag-expand-function #'foo-tag-expand-function
        ;; Many other language specific settings can be done here
        ;; as well.
        )
  ;; This may be optional
  (setq semantic-lex-analyzer #'foo-lexical-analyzer)
  )

;;;###autoload
(add-hook 'foo-mode-hook 'semantic-default-foo-setup)

(provide 'semantic-foo)

;;; semantic-foo.el ends here
@end example

@node Tag Expansion
@subsection Tag Expansion

In any language with compound tag types, you will need to implement
an @emph{expand function}.  Once written, assign it to this variable.

@defvar semantic-tag-expand-function
@anchor{semantic-tag-expand-function}
Function used to expand a tag.
It is passed each tag production, and must return a list of tags
derived from it, or @code{nil} if it does not need to be expanded.

Languages with compound definitions should use this function to expand
from one compound symbol into several.  For example, in @var{C} or Java the
following definition is easily parsed into one tag:

  int a, b;

This function should take this compound tag and turn it into two tags,
one for @code{a}, and the other for @code{b}.
@end defvar
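
As a sketch, an expand function for the example above might look like
this, assuming the grammar stored the extra declared names in a
hypothetical @code{:names} attribute; adapt it to however your grammar
represents the compound declaration:

@example
(defun foo-tag-expand-function (tag)
  "Expand TAG into multiple tags if needed."
  ;; The :names attribute is an assumption about how the grammar
  ;; recorded "int a, b;"; your representation may differ.
  (let ((names (semantic-tag-get-attribute tag :names)))
    (when names
      ;; `semantic-tag-clone' copies TAG, optionally giving the
      ;; copy a new name.
      (mapcar (lambda (name) (semantic-tag-clone tag name))
              names))))
@end example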

Additionally, you can use the expand function in conjunction with your
language for other types of compound statements.  For example, in
Common Lisp Object System, you can have a definition:

@example
(defclass classname nil
  (slots ...) ...)
@end example

This will create both the datatype @code{classname} and the functional
constructor @code{classname}.  Each slot may have a @code{:accessor}
method as well.

You can create a special compounded tag in your rule, for example:

@example
classdef: LPAREN DEFCLASS name semantic-list semantic-list RPAREN
          (TAG "custom" 'compound-class
               :value (list
                        (TYPE-TAG $3 "class" ...)
                        (FUNCTION-TAG $3 ...)
                        ))
        ;
@end example

and in your expand function, you would write:

@example
(defun my-tag-expand (tag)
  "Expand tags for my language."
  (when (semantic-tag-of-class-p tag 'compound-class)
     (remq nil
        (semantic-tag-get-attribute tag :value))))
@end example

This will cause the custom tag to be replaced by the tags created in
the @code{:value} attribute of the specially constructed tag.

@node Debugging
@chapter Debugging

Grammars can be tricky to debug.  There are several types of
debugging tools in @semantic{}, and different types of problems call
for different tools.

@menu
* Lexical Debugging::           
* Parser Output tools::         
* Bovine Parser Debugging::     
* Wisent Parser Debugging::     
* Overlay Debugging::           
* Incremental Parser Debugging::  
* Debugging Analysis::          
* Semantic 1.4 Doc::            
@end menu

@node Lexical Debugging
@section Lexical Debugging

The first major problem you may encounter is with lexical analysis.
If the text is not transformed into the expected token stream, no
parser will understand it.

You can step through the lexical analyzer with the following command:

@deffn Command semantic-lex-debug arg
@anchor{semantic-lex-debug}
Debug the semantic lexer in the current buffer.
Argument @var{ARG} specifies whether to analyze the whole buffer, or start at point.
While engaged, each token identified by the lexer will be highlighted
in the target buffer.  A description of the current token will be
displayed in the minibuffer.  Press @kbd{SPC} to move to the next lexical token.
@end deffn

For an example of what the output of the @code{semantic-lex} function
should return, see @ref{Lexer Output}.

@node Parser Output tools
@section Parser Output tools

There are several tools which can be used to see what the parser
output is.  These will work for any type of parser, including the
bovine parser and the wisent parser.

The first and easiest is a minor mode which highlights text the
parser did not understand.

@deffn Command semantic-show-unmatched-syntax-mode &optional arg
@anchor{semantic-show-unmatched-syntax-mode}
Minor mode to highlight unmatched lexical syntax tokens.
When a parser executes, some elements in the buffer may not match any
parser rules.  These text characters are considered unmatched syntax.
Often, the display of unmatched syntax can expose coding
problems before the compiler is run.

With prefix argument @var{ARG}, turn on if positive, otherwise off.  The
minor mode can be turned on only if semantic feature is available and
the current buffer was set up for parsing.  Return non-@code{nil} if the
minor mode is enabled.

@table @kbd
@item key
binding
@item C-c ,
Prefix Command
@item C-c , `
semantic-show-unmatched-syntax-next
@end table
@end deffn

Another interesting mode will display a line between all the tags in
the current buffer to make it more obvious where boundaries lie.  You
can enable this as a minor mode.

@deffn Command semantic-show-tag-boundaries-mode &optional arg
@anchor{semantic-show-tag-boundaries-mode}
Minor mode to display a boundary in front of tags.
The boundary is displayed using an overline in Emacs 21.
With prefix argument @var{ARG}, turn on if positive, otherwise off.  The
minor mode can be turned on only if semantic feature is available and
the current buffer was set up for parsing.  Return non-@code{nil} if the
minor mode is enabled.
@end deffn

Another interesting mode helps if you are worried about specific
attributes: you can use this minor mode to highlight tokens in
different ways based on the attributes you are most concerned with.

@deffn Command semantic-highlight-by-attribute-mode &optional arg
@anchor{semantic-highlight-by-attribute-mode}
Minor mode to highlight tags based on some attribute.
By default, the protection of a tag will give it a different
background color.

With prefix argument @var{ARG}, turn on if positive, otherwise off.  The
minor mode can be turned on only if semantic feature is available and
the current buffer was set up for parsing.  Return non-@code{nil} if the
minor mode is enabled.
@end deffn

Another tool that can be used is a dump of the current list of tags.
This shows the actual Lisp representation of the tags generated in a
rather bland dump.  This can be useful if text was successfully
parsed, and you want to be sure that the correct information was
captured.

@deffn Command bovinate &optional clear
@anchor{bovinate}
Bovinate the current buffer.  Show output in a temp buffer.
Optional argument @var{CLEAR} will clear the cache before bovinating.
If @var{CLEAR} is negative, it will do a full reparse, and also not display
the output buffer.
@end deffn

@node Bovine Parser Debugging
@section Bovine Parser Debugging

The bovine parser is described in @inforef{top, ,bovine}.

Aside from using a traditional Emacs Lisp debugger on functions you
provide for token expansion, there is one other means of debugging,
which interactively steps over the rules in your grammar file.

@deffn Command semantic-debug
@anchor{semantic-debug}
Parse the current buffer and run in debug mode.
@end deffn

Once the parser is activated in this mode, the current tag cache is
flushed, and the parser started.  At each stage of the parse,
the current rule and match step are highlighted in your parser source
buffer.  In a second window, the text being parsed is shown, and the
lexical token found is highlighted.  A clue of the current stack of
saved data is displayed in the minibuffer.

There is a wide range of keybindings that can be used to execute code
in your buffer.  (Not all are implemented.)

@table @kbd
@item n
@itemx SPC
Next.
@item s
Step.
@item u
Up.  (Not implemented yet.)
@item d
Down.  (Not implemented yet.)
@item f
Fail Match.  Pretend the current match element and the token in the
buffer are a failed match, even if they are not.
@item h
Print information about the current parser state.
@item s
Jump to the source buffer.
@item p
Jump to the parser buffer.
@item q
Quit.  Exits this debug session and the parser.
@item a
Abort.  Aborts one level of the parser, possibly exiting the debugger.
@item g
Go.  Stop debugging, and just start parsing.
@item b
Set Breakpoint.  (Not implemented yet.)
@item e
@code{eval-expression}.  Lets you execute some random Emacs Lisp command.
@end table

@b{Note:} While the core of @code{semantic-debug} is a generic
debugger interface for rule based grammars, only the bovine parser has
a specific backend implementation.  If someone wants to implement a
debugger backend for wisent, that would be spiffy.

@node Wisent Parser Debugging
@section Wisent Parser Debugging

While wisent does not implement a backend for @code{semantic-debug}, it
does have some debugging commands for rule actions.  You can read
about them in the wisent manual.

@inforef{Grammar Debugging, , wisent}

@node Overlay Debugging
@section Overlay Debugging

Once a buffer has been parsed into a tag table, the next most
important step is getting those tags activated for a buffer, and
storable in a @code{semanticdb} backend.
@inforef{semanticdb, , semantic-appdev}.

These two activities depend on the ability of every tag in the table
to be linked and unlinked to the current buffer with an overlay.
@inforef{Tag Overlay, , semantic-appdev}
@inforef{Tag Hooks, , semantic-appdev}

In this case, the most important function that must be written is:

@defun semantic-tag-components-with-overlays tag
@anchor{semantic-tag-components-with-overlays}
Return the list of top level components belonging to @var{TAG}.
Children are any sub-tags which contain overlays.

Default behavior is to get @dfn{semantic-tag-components} in addition
to the components of anonymous types (if applicable).

Note for language authors:
If a mode defines a language tag that has tags with overlays in it,
you should still return them with this function.
Ignoring this step will prevent several features from working correctly.
This function can be overridden in semantic using the
symbol @code{tag-components-with-overlays}.
@end defun

If you are successfully building a tag table, and errors occur when
saving or restoring tags from semanticdb, this is the most likely
cause of the problem.
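
As a sketch, an implementation for a hypothetical FOO mode whose type
tags keep member tags in a @code{:members} attribute might look like
this (the mode name and attribute are assumptions for illustration):

@example
(define-mode-overload-implementation
  semantic-tag-components-with-overlays foo-mode (tag)
  "Return the components of TAG that have overlays, for FOO."
  ;; The :members attribute is an assumption about FOO's tag layout.
  (semantic-tag-get-attribute tag :members))
@end example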

@node Incremental Parser Debugging
@section Incremental Parser Debugging

The incremental parser is a highly complex engine for quickly
refreshing the tag table of a buffer after some set of changes have
been made to that buffer by a user.

There is no debugger or interface to the incremental parser, however
there are a few minor modes which can help you identify issues if you
think there are problems while incrementally parsing a buffer.

The first stage of the incremental parser is in tracking the changes
the user makes to a buffer.  You can visibly track these changes too.

@deffn Command semantic-highlight-edits-mode &optional arg
@anchor{semantic-highlight-edits-mode}
Minor mode for highlighting changes made in a buffer.
Changes are tracked by semantic so that the incremental parser can work
properly.
This mode will highlight those changes as they are made, and clear them
when the incremental parser accounts for those edits.
With prefix argument @var{ARG}, turn on if positive, otherwise off.  The
minor mode can be turned on only if semantic feature is available and
the current buffer was set up for parsing.  Return non-@code{nil} if the
minor mode is enabled.
@end deffn

Another important aspect of the incremental parser involves tracking
the current parser state of the buffer.  You can track this state
also.

@deffn Command semantic-show-parser-state-mode &optional arg
@anchor{semantic-show-parser-state-mode}
Minor mode for displaying parser cache state in the modeline.
The cache can be in one of three states.  They are
Up to date, Partial reparse needed, and Full reparse needed.
The state is indicated in the modeline with the following characters:
@table @kbd
@item -
The cache is up to date.
@item !
The cache requires a full update.
@item ^
The cache needs to be incrementally parsed.
@item %
The cache is not currently parseable.
@item @@
Auto-parse in progress (not set here.)
@end table

With prefix argument @var{ARG}, turn on if positive, otherwise off.  The
minor mode can be turned on only if semantic feature is available and
the current buffer was set up for parsing.  Return non-@code{nil} if the
minor mode is enabled.
@end deffn

When the incremental parser starts updating the tags buffer, you can
also enable a set of messages to help identify how the incremental
parser is merging changes with the main buffer.

@defvar semantic-edits-verbose-flag
@anchor{semantic-edits-verbose-flag}
Non-@code{nil} means the incremental parser is verbose.
If @code{nil}, errors are still displayed, but informative messages are not.
@end defvar

@node Debugging Analysis
@section Debugging Analysis

The semantic analyzer is at the top of the food chain when it comes
to @semantic{} service functionality.  The semantic support for a
language must be complete before analysis will work properly.

A good way to test analysis is by placing the cursor in different
places, and requesting a dump of the context.

@deffn Command semantic-analyze-current-context position
@anchor{semantic-analyze-current-context}
Analyze the current context at @var{POSITION}.
If called interactively, display interesting information about @var{POSITION}
in a separate buffer.
Returns an object based on symbol @dfn{semantic-analyze-context}.
@end deffn

@ref{Semantic Analyzer Support}
@inforef{Analyzer, , semantic-user}

@node Semantic 1.4 Doc
@section Semantic 1.4 Doc

@i{
In semantic 1.4 the following documentation was written for debugging.
I'm leaving in here until better doc for 2.0 is done.
}

Writing language files using BY is significantly easier than writing
them using regular expressions in a functional manner.  Debugging
them, however, can still prove challenging.

There are two ways to debug a language definition if it is not
behaving as expected.  One way is to debug against the source @file{.by}
file.

If your language definition was written in BNF notation, debugging is
quite easy.  The command @code{semantic-debug} will start you off.

@deffn Command semantic-debug
Reparse the current buffer and run in parser debug mode.
@end deffn

While debugging, two windows are visible.  One window shows the file
being parsed, and the syntactic token being tested is highlighted.
The second window shows the table being used (in the BY source) with
the current rule highlighted.  The cursor will sit on the specific
match rule being tested against.

In the minibuffer, a brief summary of the current situation is
listed.  The first element is the syntactic token which is a list of
the form:

@example
(TYPE START . END)
@end example

The rest of the display is a list of all strings collected for the
currently tested rule.  Each time a new rule is entered, the list is
restarted.  Upon returning from a rule into a previous match list, the
previous match list is restored, with the production of the dependent
rule in the list.

Use @kbd{C-g} to stop debugging.  There are no commands for any
fancier types of debugging.

NOTE: Semantic 2.0 has more debugging commands.  Use:
@kbd{C-h m semantic-debug-mode} to view.

@node Parser Error Handling
@chapter Parser Error Handling
@cindex Parser Error Handling

NOTE: Write Me

@node GNU Free Documentation License
@appendix GNU Free Documentation License

@include fdl.texi

@node Index
@unnumbered Index
@printindex cp

@iftex
@contents
@summarycontents
@end iftex

@bye
