Source

pypy / pypy / doc / rlib.rst

David Malcolm 1e46012 



















Carl Friedrich B… 07acca3 
David Malcolm 1e46012 










Carl Friedrich B… 07acca3 
David Malcolm 1e46012 












Carl Friedrich B… 07acca3 
David Malcolm 1e46012 
















































Carl Friedrich B… 07acca3 
David Malcolm 1e46012 








Carl Friedrich B… 07acca3 
David Malcolm 1e46012 












Carl Friedrich B… 07acca3 
David Malcolm 1e46012 






Carl Friedrich B… 07acca3 
David Malcolm 1e46012 









Carl Friedrich B… 07acca3 
David Malcolm 1e46012 







Carl Friedrich B… 07acca3 
David Malcolm 1e46012 






Carl Friedrich B… 07acca3 
David Malcolm 1e46012 




















































Dario Bertini 5d78885 
David Malcolm 1e46012 


















































Carl Friedrich B… 3523202 




























David Malcolm 1e46012 





Carl Friedrich B… 07acca3 
David Malcolm 1e46012 













Carl Friedrich B… 07acca3 
David Malcolm 1e46012 

































Carl Friedrich B… 3523202 


















David Malcolm 1e46012 


Carl Friedrich B… 3523202 














David Malcolm 1e46012 















Carl Friedrich B… 3523202 




























David Malcolm 1e46012 


Carl Friedrich B… 3523202 
David Malcolm 1e46012 
Carl Friedrich B… 3523202 




















David Malcolm 1e46012 













Carl Friedrich B… 3523202 











David Malcolm 1e46012 

Carl Friedrich B… 3523202 








David Malcolm 1e46012 




Carl Friedrich B… 3523202 






















David Malcolm 1e46012 



Carl Friedrich B… 3523202 
David Malcolm 1e46012 
Carl Friedrich B… 3523202 












David Malcolm 1e46012 



































































Carl Friedrich B… 3523202 
David Malcolm 1e46012 
Carl Friedrich B… 3523202 




































David Malcolm 1e46012 

Dario Bertini 74b4a53 
David Malcolm 1e46012 
Carl Friedrich B… 07acca3 
Carl Friedrich B… 541308f 
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
=================================================
Generally Useful RPython Modules 
=================================================

.. _Python: http://www.python.org/dev/doc/maint24/ref/ref.html

.. contents::


This page lists some of the modules in `pypy/rlib`_ together with some hints
for what they can be used for. The modules here will make up some general
library useful for RPython programs (since most of the standard library modules
are not RPython). Most of these modules are somewhat rough still and are likely
to change at some point.  Usually it is useful to look at the tests in
`pypy/rlib/test`_ to get an impression of how to use a module.


``listsort``
============

The `pypy/rlib/listsort.py`_ module contains an implementation of the timsort sorting algorithm
(the sort method of lists is not RPython). To use it, subclass from the
``listsort.TimSort`` class and override the ``lt`` method to change the
comparison behaviour. The constructor of ``TimSort`` takes a list as an
argument, which will be sorted in place when the ``sort`` method of the
``TimSort`` instance is called. **Warning:** currently only one type of list can
be sorted using the ``listsort`` module in one program, otherwise the annotator
will be confused.

``nonconst``
============

The `pypy/rlib/nonconst.py`_ module is useful mostly for tests. The `flow object space`_ and
the `annotator`_ do quite some constant folding, which is sometimes not desired
in a test. To prevent constant folding on a certain value, use the ``NonConst``
class. The constructor of ``NonConst`` takes an arbitrary value. The instance of
``NonConst`` will behave during annotation like that value, but no constant
folding will happen.

.. _`flow object space`: objspace.html#the-flow-object-space
.. _`annotator`: translation.html#the-annotation-pass


``objectmodel``
===============

The `pypy/rlib/objectmodel.py`_ module is a mixed bag of various functionality. Some of the
more useful ones are:

``ComputedIntSymbolic``:
    Instances of ``ComputedIntSymbolic`` are treated like integers of unknown
    value by the annotator. The value is determined by a no-argument function
    (which needs to be passed into the constructor of the class). When the
    backend emits code, the function is called to determine the value.

``CDefinedIntSymbolic``:
    Instances of ``ComputedIntSymbolic`` are also treated like integers of
    unknown value by the annotator. When C code is emitted they will be
    represented by the attribute ``expr`` of the symbolic (which is also the
    first argument of the constructor).

``r_dict``:
    An RPython dict-like object. The constructor of r_dict takes two functions:
    ``key_eq`` and ``key_hash`` which are used for comparing and hashing the
    entries in the dictionary.

``instantiate(cls)``:
    Instantiate class ``cls`` without calling ``__init__``.

``we_are_translated()``:
    This function returns ``False`` when run on top of CPython, but the
    annotator thinks its return value is ``True``. Therefore it can be used to
    do different things on top of CPython than after translation. This should be
    used extremely sparingly (mostly for optimizations or debug code).

``cast_object_to_weakaddress(obj)``:
    Returns a sort of "weak reference" to obj, just without any convenience. The
    weak address that it returns is not invalidated if the object dies, so you
    need to take care yourself to know when the object dies. Use with extreme
    care.

``cast_weakadress_to_object(obj)``:
    Inverse of the previous function. If the object died then a segfault will
    ensue.

``UnboxedValue``:
    This is a class which should be used as a base class for a class which
    carries exactly one integer field. The class should have ``__slots__``
    with exactly one entry defined. After translation, instances of this class
    won't be allocated but represented by *tagged pointers**, that is pointers
    that have the lowest bit set.


``rarithmetic``
===============

The `pypy/rlib/rarithmetic.py`_ module contains functionality to handle the small differences
in the behaviour of arithmetic code in regular Python and RPython code. Most of
them are already described in the `coding guide`_

.. _`coding guide`: coding-guide.html


``rbigint``
===========

The `pypy/rlib/rbigint.py`_ module contains a full RPython implementation of the Python ``long``
type (which itself is not supported in RPython). The ``rbigint`` class contains
that implementation. To construct ``rbigint`` instances use the static methods
``fromint``, ``frombool``, ``fromfloat`` and ``fromdecimalstr``. To convert back
to other types use the methods ``toint``, ``tobool``, ``touint`` and
``tofloat``. Since RPython does not support operator overloading, all the
special methods of ``rbigint`` that would normally start and end with "__" have
these underscores left out for better readability (so ``a.add(b)`` can be used
to add two rbigint instances).


``rrandom``
===========

The `pypy/rlib/rrandom.py`_ module contains an implementation of the mersenne twister random
number generator. It contains one class ``Random`` which most importantly has a
``random`` method which returns a pseudo-random floating point number between
0.0 and 1.0.

``rsocket``
===========

The `pypy/rlib/rsocket.py`_ module contains an RPython implementation of the functionality of
the socket standard library with a slightly different interface.  The
difficulty with the Python socket API is that addresses are not "well-typed"
objects: depending on the address family they are tuples, or strings, and
so on, which is not suitable for RPython.  Instead, ``rsocket`` contains
a hierarchy of Address classes, in a typical static-OO-programming style.


``streamio``
============

The `pypy/rlib/streamio.py`_ contains an RPython stream I/O implementation (which was started
by Guido van Rossum as `sio.py`_ in the CPython sandbox as a prototype for the
upcoming new file implementation in Python 3000).

.. _`sio.py`: http://svn.python.org/view/sandbox/trunk/sio/sio.py

``unroll``
==========

The `pypy/rlib/unroll.py`_ module most importantly contains the function ``unrolling_iterable``
which wraps an iterator. Looping over the iterator in RPython code will not
produce a loop in the resulting flow graph but will unroll the loop instead.


``parsing``
===========

The `pypy/rlib/parsing/`_ module is a still in-development module to generate tokenizers and
parsers in RPython. It is still highly experimental and only really used by the
`Prolog interpreter`_ (although in slightly non-standard ways). The easiest way
to specify a tokenizer/grammar is to write it down using regular expressions and
simple EBNF format. 

The regular expressions are implemented using finite automatons. The parsing
engine uses `packrat parsing`_, which has O(n) parsing time but is more
powerful than LL(n) and LR(n) grammars.

.. _`packrat parsing`: http://pdos.csail.mit.edu/~baford/packrat/

Regular Expressions
-------------------

The regular expression syntax is mostly a subset of the syntax of the `re`_
module. By default, non-special characters match themselves. If you concatenate
regular expressions the result will match the concatenation of strings matched
by the single regular expressions.

``|``
    ``R|S`` matches any string that *either* matches R or matches S.

``*``
    ``R*`` matches 0 or more repetitions of R.

``+``
    ``R+`` matches 1 or more repetitions of R.

``?``
    ``R?`` matches 0 or 1 repetition of R.

``(...)``
    Parenthesis can be used to group regular expressions (note that in contrast
    to Python's re module you cannot later match the content of this group).

``{m}``
    ``R{m}`` matches exactly m repetitions of R.

``{m, n}``
    ``R{m, n}`` matches between m and n repetitions of R (including m and n).

``[]``
    Matches a set of characters. The characters to be matched can be listed
    sequentially. A range of characters can be specified using ``-``. For
    examples ``[ac-eg]`` matches the characters a, c, d, e and g.
    The whole set can be inverted by starting it with ``^``. So [^a] matches
    anything except a.

To parse a regular expression and to get a matcher for it, you can use the
function ``make_runner(s)`` in the ``pypy.rlib.parsing.regexparse`` module.  It
returns a object with a ``recognize(input)`` method that returns True or False
depending on whether ``input`` matches the string or not.

.. _`re`: http://docs.python.org/library/re.html

EBNF
----

To describe a tokenizer and a grammar the ``pypy.rlib.parsing.ebnfparse``
defines a syntax for doing that.

The syntax file contains a sequence or rules. Every rule either describes a
regular expression or a grammar rule.

Regular expressions rules have the form::

    NAME: "regex";

NAME is the name of the token that the regular expression
produces (it has to consist of upper-case letters), ``regex`` is a regular
expression with the syntax described above. One token name is special-cased: a
token called ``IGNORE`` will be filtered out of the token stream before being
passed on to the parser and can thus be used to match comments or
non-significant whitespace.

Grammar rules have the form::
    
    name: expansion_1 | expansion_2 | ... | expansion_n;

Where ``expansion_i`` is a sequence of nonterminal or token names::

    symbol_1 symbol_2 symbol_3 ... symbol_n

This means that the nonterminal symbol ``name`` (which has to consist of
lower-case letters) can be expanded into any of the expansions. The expansions
can consist of a sequence of token names, nonterminal names or literals, which
are strings in quotes that are matched literally.

An example to make this clearer::
    
    IGNORE: " ";
    DECIMAL: "0|[1-9][0-9]*";
    additive: multitive "+" additive |
              multitive;
    multitive: primary "*" multitive |
               primary;
    primary: "(" additive ")" | DECIMAL;

This grammar describes the syntax of arithmetic impressions involving addition
and multiplication. The tokenizer
produces a stream of either DECIMAL tokens or tokens that have matched one of
the literals "+", "*", "(" or ")". Any space will be ignored. The grammar
produces a syntax tree that follows the precedence of the operators. For example
the expression ``12 + 4 * 5`` is parsed into the following tree:

.. graphviz::

    digraph G{
    "-1213931828" [label="additive"];
    "-1213931828" -> "-1213951956";
    "-1213951956" [label="multitive"];
    "-1213951956" -> "-1213949172";
    "-1213949172" [label="primary"];
    "-1213949172" -> "-1213949812";
    "-1213949812" [shape=box,label="DECIMAL\l'12'"];
    "-1213931828" -> "-1213935220";
    "-1213935220" [shape=box,label="__0_+\l'+'"];
    "-1213931828" -> "-1213951316";
    "-1213951316" [label="additive"];
    "-1213951316" -> "-1213948180";
    "-1213948180" [label="multitive"];
    "-1213948180" -> "-1213951380";
    "-1213951380" [label="primary"];
    "-1213951380" -> "-1213951508";
    "-1213951508" [shape=box,label="DECIMAL\l'4'"];
    "-1213948180" -> "-1213948788";
    "-1213948788" [shape=box,label="__1_*\l'*'"];
    "-1213948180" -> "-1213951060";
    "-1213951060" [label="multitive"];
    "-1213951060" -> "-1213948980";
    "-1213948980" [label="primary"];
    "-1213948980" -> "-1213950420";
    "-1213950420" [shape=box,label="DECIMAL\l'5'"];
    }

Parse Trees
-----------

The parsing process builds up a tree consisting of instances of ``Symbol`` and
``Nonterminal``, the former corresponding to tokens, the latter to nonterminal
symbols. Both classes live in the `pypy/rlib/parsing/tree.py`_ module. You can use
the ``view()`` method ``Nonterminal`` instances to get a pygame view of the
parse tree.

``Symbol`` instances have the following attributes: ``symbol``, which is the
name of the token and ``additional_info`` which is the matched source.

``Nonterminal`` instances have the following attributes: ``symbol`` is the name
of the nonterminal and ``children`` which is a list of the children attributes.


Visitors
++++++++

To write tree visitors for the parse trees that are RPython, there is a special
baseclass ``RPythonVisitor`` in `pypy/rlib/parsing/tree.py`_ to use. If your
class uses this, it will grow a ``dispatch(node)`` method, that calls an
appropriate ``visit_<symbol>`` method, depending on the ``node`` argument. Here
the <symbol> is replaced by the ``symbol`` attribute of the visited node.

For the visitor to be RPython, the return values of all the visit methods need
to be of the same type.


Tree Transformations
--------------------

As the tree of arithmetic example above shows, by default the parse tree
contains a lot of nodes that are not really conveying useful information.
To get rid of some of them, there is some support in the grammar format to
automatically create a visitor that transforms the tree to remove the additional
nodes. The simplest such transformation just removes nodes, but there are
more complex ones.

The syntax for these transformations is to enclose symbols in expansions of a
nonterminal by [...], <...> or >...<.

[symbol_1 symbol_2 ... symbol_n]
++++++++++++++++++++++++++++++++

This will produce a transformer that completely removes the enclosed symbols
from the tree.

Example::

    IGNORE: " ";
    n: "A" [","] n | "A";

Parsing the string "A, A, A" gives the tree:

.. graphviz::

    digraph G{
    "-1213678004" [label="n"];
    "-1213678004" -> "-1213681108";
    "-1213681108" [shape=box,label="__0_A\n'A'"];
    "-1213678004" -> "-1213681332";
    "-1213681332" [shape=box,label="__1_,\n','"];
    "-1213678004" -> "-1213837780";
    "-1213837780" [label="n"];
    "-1213837780" -> "-1213837716";
    "-1213837716" [shape=box,label="__0_A\n'A'"];
    "-1213837780" -> "-1213839476";
    "-1213839476" [shape=box,label="__1_,\n','"];
    "-1213837780" -> "-1213839956";
    "-1213839956" [label="n"];
    "-1213839956" -> "-1213840948";
    "-1213840948" [shape=box,label="__0_A\n'A'"];
    }

After transformation the tree has the "," nodes removed:

.. graphviz::

    digraph G{
    "-1219325716" [label="n"];
    "-1219325716" -> "-1219325844";
    "-1219325844" [shape=box,label="__0_A\n'A'"];
    "-1219325716" -> "-1219324372";
    "-1219324372" [label="n"];
    "-1219324372" -> "-1219325524";
    "-1219325524" [shape=box,label="__0_A\n'A'"];
    "-1219324372" -> "-1219324308";
    "-1219324308" [label="n"];
    "-1219324308" -> "-1219325492";
    "-1219325492" [shape=box,label="__0_A\n'A'"];
    }

<symbol>
++++++++

This will replace the parent with symbol. Every expansion can contain at most
one symbol that is enclosed by <...>, because the parent can only be replaced
once, obviously.

Example::

    IGNORE: " ";
    n: "a" "b" "c" m;
    m: "(" <n> ")" | "d";

Parsing the string "a b c (a b c d)" gives the tree:

.. graphviz::

    digraph G{
    "-1214029460" [label="n"];
    "-1214029460" -> "-1214026452";
    "-1214026452" [shape=box,label="__0_a\n'a'"];
    "-1214029460" -> "-1214028276";
    "-1214028276" [shape=box,label="__1_b\n'b'"];
    "-1214029460" -> "-1214027316";
    "-1214027316" [shape=box,label="__2_c\n'c'"];
    "-1214029460" -> "-1214026868";
    "-1214026868" [label="m"];
    "-1214026868" -> "-1214140436";
    "-1214140436" [shape=box,label="__3_(\n'('"];
    "-1214026868" -> "-1214143508";
    "-1214143508" [label="n"];
    "-1214143508" -> "-1214141364";
    "-1214141364" [shape=box,label="__0_a\n'a'"];
    "-1214143508" -> "-1214141748";
    "-1214141748" [shape=box,label="__1_b\n'b'"];
    "-1214143508" -> "-1214140756";
    "-1214140756" [shape=box,label="__2_c\n'c'"];
    "-1214143508" -> "-1214144468";
    "-1214144468" [label="m"];
    "-1214144468" -> "-1214414868";
    "-1214414868" [shape=box,label="__5_d\n'd'"];
    "-1214026868" -> "-1214141492";
    "-1214141492" [shape=box,label="__4_)\n')'"];
    }

After transformation the tree looks like this:

.. graphviz::

    digraph G{
    "-1219949908" [label="n"];
    "-1219949908" -> "-1214026452";
    "-1214026452" [shape=box,label="__0_a\n'a'"];
    "-1219949908" -> "-1214028276";
    "-1214028276" [shape=box,label="__1_b\n'b'"];
    "-1219949908" -> "-1214027316";
    "-1214027316" [shape=box,label="__2_c\n'c'"];
    "-1219949908" -> "-1219949876";
    "-1219949876" [label="n"];
    "-1219949876" -> "-1214141364";
    "-1214141364" [shape=box,label="__0_a\n'a'"];
    "-1219949876" -> "-1214141748";
    "-1214141748" [shape=box,label="__1_b\n'b'"];
    "-1219949876" -> "-1214140756";
    "-1214140756" [shape=box,label="__2_c\n'c'"];
    "-1219949876" -> "-1219949748";
    "-1219949748" [label="m"];
    "-1219949748" -> "-1214414868";
    "-1214414868" [shape=box,label="__5_d\n'd'"];
    }

>nonterminal_1 nonterminal_2 ... nonterminal_n<
+++++++++++++++++++++++++++++++++++++++++++++++

This replaces the nodes nonterminal_1 to nonterminal_n by their children.

Example::

    IGNORE: " ";
    DECIMAL: "0|[1-9][0-9]*";
    list: DECIMAL >list< | DECIMAL;

Parsing the string "1 2" gives the tree:

.. graphviz::

    digraph G{
    "-1213518708" [label="list"];
    "-1213518708" -> "-1213518196";
    "-1213518196" [shape=box,label="DECIMAL\n'1'"];
    "-1213518708" -> "-1213518260";
    "-1213518260" [label="list"];
    "-1213518260" -> "-1213520308";
    "-1213520308" [shape=box,label="DECIMAL\n'2'"];
    }

after the transformation the tree looks like:

.. graphviz::

    digraph G{
    "-1219505652" [label="list"];
    "-1219505652" -> "-1213518196";
    "-1213518196" [shape=box,label="DECIMAL\n'1'"];
    "-1219505652" -> "-1213520308";
    "-1213520308" [shape=box,label="DECIMAL\n'2'"];
    }

Note that the transformation works recursively. That means that the following
also works: if the string "1 2 3 4 5" is parsed the tree at first looks like
this:

.. graphviz::

    digraph G{
    "-1213611892" [label="list"];
    "-1213611892" -> "-1213608980";
    "-1213608980" [shape=box,label="DECIMAL\n'1'"];
    "-1213611892" -> "-1213623476";
    "-1213623476" [label="list"];
    "-1213623476" -> "-1213623380";
    "-1213623380" [shape=box,label="DECIMAL\n'2'"];
    "-1213623476" -> "-1213442868";
    "-1213442868" [label="list"];
    "-1213442868" -> "-1213441652";
    "-1213441652" [shape=box,label="DECIMAL\n'3'"];
    "-1213442868" -> "-1213441332";
    "-1213441332" [label="list"];
    "-1213441332" -> "-1213441620";
    "-1213441620" [shape=box,label="DECIMAL\n'4'"];
    "-1213441332" -> "-1213443060";
    "-1213443060" [label="list"];
    "-1213443060" -> "-1213442100";
    "-1213442100" [shape=box,label="DECIMAL\n'5'"];
    }

But after transformation the whole thing collapses to one node with a lot of
children:

.. graphviz::

    digraph G{
    "-1219430228" [label="list"];
    "-1219430228" -> "-1213608980";
    "-1213608980" [shape=box,label="DECIMAL\n'1'"];
    "-1219430228" -> "-1213623380";
    "-1213623380" [shape=box,label="DECIMAL\n'2'"];
    "-1219430228" -> "-1213441652";
    "-1213441652" [shape=box,label="DECIMAL\n'3'"];
    "-1219430228" -> "-1213441620";
    "-1213441620" [shape=box,label="DECIMAL\n'4'"];
    "-1219430228" -> "-1213442100";
    "-1213442100" [shape=box,label="DECIMAL\n'5'"];
    }

Extensions to the EBNF grammar format
-------------------------------------

There are some extensions to the EBNF grammar format that are really only
syntactic sugar but make writing grammars less tedious. These are:

``symbol?``:
    matches 0 or 1 repetitions of symbol

``symbol*``:
    matches 0 or more repetitions of symbol. After the tree transformation all
    these repetitions are children of the current symbol.

``symbol+``:
    matches 1 or more repetitions of symbol. After the tree transformation all
    these repetitions are children of the current symbol.

These are implemented by adding some more rules to the grammar in the correct
way. Examples: the grammar::

    s: a b? c;

is transformed to look like this::

    s: a >_maybe_symbol_0_< c | a c;
    _maybe_symbol_0_: b;

The grammar::

    s: a b* c;

is transformed to look like this::

    s: a >_star_symbol_0< c | a c;
    _star_symbol_0: b >_symbol_star_0< | b;

The grammar::

    s: a b+ c;

is transformed to look like this::

    s: a >_plus_symbol_0< c;
    _plus_symbol_0: b >_plus_symbol_0< | b;


Full Example
------------

A semi-complete parser for the `json format`_::

    STRING: "\\"[^\\\\"]*\\"";
    NUMBER: "\-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][\+\-]?[0-9]+)?";
    IGNORE: " |\n";
    value: <STRING> | <NUMBER> | <object> | <array> | <"null"> |
           <"true"> | <"false">;
    object: ["{"] (entry [","])* entry ["}"];
    array: ["["] (value [","])* value ["]"];
    entry: STRING [":"] value;


The resulting tree for parsing the string::

    {"a": "5", "b": [1, null, 3, true, {"f": "g", "h": 6}]}

looks like this:

.. graphviz::

    digraph G{
    "-1220061652" [label="object"];
    "-1220061652" -> "-1220127636";
    "-1220127636" [label="entry"];
    "-1220127636" -> "-1213915636";
    "-1213915636" [shape=box,label="STRING\n'a'"];
    "-1220127636" -> "-1214251156";
    "-1214251156" [shape=box,label="STRING\n'5'"];
    "-1220061652" -> "-1220063188";
    "-1220063188" [label="entry"];
    "-1220063188" -> "-1214253076";
    "-1214253076" [shape=box,label="STRING\n'b'"];
    "-1220063188" -> "-1220059444";
    "-1220059444" [label="array"];
    "-1220059444" -> "-1214253364";
    "-1214253364" [shape=box,label="NUMBER\n'1'"];
    "-1220059444" -> "-1214254292";
    "-1214254292" [shape=box,label="__0_null\n'null'"];
    "-1220059444" -> "-1214253268";
    "-1214253268" [shape=box,label="NUMBER\n'3'"];
    "-1220059444" -> "-1214252596";
    "-1214252596" [shape=box,label="__1_true\n'true'"];
    "-1220059444" -> "-1220062260";
    "-1220062260" [label="object"];
    "-1220062260" -> "-1220060116";
    "-1220060116" [label="entry"];
    "-1220060116" -> "-1214211860";
    "-1214211860" [shape=box,label="STRING\n'f'"];
    "-1220060116" -> "-1214210132";
    "-1214210132" [shape=box,label="STRING\n'g'"];
    "-1220062260" -> "-1220062868";
    "-1220062868" [label="entry"];
    "-1220062868" -> "-1214211956";
    "-1214211956" [shape=box,label="STRING\n'h'"];
    "-1220062868" -> "-1214212308";
    "-1214212308" [shape=box,label="NUMBER\n'6'"];
    }


.. _`Prolog interpreter`: https://bitbucket.org/cfbolz/pyrolog/
.. _`json format`: http://www.json.org

.. include:: _ref.txt
Tip: Filter by directory path e.g. /media app.js to search for public/media/app.js.
Tip: Use camelCasing e.g. ProjME to search for ProjectModifiedEvent.java.
Tip: Filter by extension type e.g. /repo .js to search for all .js files in the /repo directory.
Tip: Separate your search with spaces e.g. /ssh pom.xml to search for src/ssh/pom.xml.
Tip: Use ↑ and ↓ arrow keys to navigate and return to view the file.
Tip: You can also navigate files with Ctrl+j (next) and Ctrl+k (previous) and view the file with Ctrl+o.
Tip: You can also navigate files with Alt+j (next) and Alt+k (previous) and view the file with Alt+o.