JSON BNF for Syntax Specifications: JBNF

1. Introduction

Internet technical specifications built using JavaScript Object
Notation (JSON) often need to define a formal syntax and are free
to employ whatever notation their authors deem useful. A modified
version of Backus-Naur Form (BNF), called JSON BNF (JBNF), is
specified herein.  It balances compactness and simplicity,
with reasonable representational power.

The differences between standard BNF and JBNF involve naming rules,
repetition, alternatives, order-independence, and value ranges.
Appendix B supplies rule definitions and encoding for a core lexical
analyzer of the type common to several Internet specifications.  It
is provided as a convenience; it is separate from the meta language
defined in the body of this document and does not share that
language's formal status.

2. Rule Definition

2.1. Rule Naming

The name of a rule is simply the name itself; that is, a sequence of
characters, beginning with an alphabetic character, and followed by a
combination of alphabetics, digits, and hyphens.

NOTE:

    Rule names are case-insensitive

The names <rulename>, <Rulename>, <RULENAME>, and <rUlENamE> all
refer to the same rule.

Unlike original BNF, angle brackets ("<", ">") are not required.
However, angle brackets may be used around a rule name whenever their
presence helps the reader discern the use of a rule name.  This is
typically restricted to rule name references in free-form prose, or
to distinguish partial rules that combine into a string not separated
by white space.
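
As an illustration only, the naming rule and its case-insensitivity
can be sketched in Python.  The regular expression and the
normalize() helper are assumptions made for this example and are not
part of JBNF; lowercasing is just one way to make equivalent names
compare equal.

```python
import re

# Alphabetic first character, then alphabetics, digits, and hyphens.
RULENAME = re.compile(r"[A-Za-z][A-Za-z0-9-]*")

def normalize(name: str) -> str:
    """Return a canonical key for a rule name (hypothetical helper).

    Optional angle brackets are stripped, and the name is lowercased
    because rule names are case-insensitive.
    """
    if name.startswith("<") and name.endswith(">"):
        name = name[1:-1]
    if not RULENAME.fullmatch(name):
        raise ValueError(f"not a valid rule name: {name!r}")
    return name.lower()
```

With this helper, <rulename>, <Rulename>, and RULENAME all map to the
same key, so a parser could store rules in a table keyed by the
normalized name.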

2.2. Rule Form

A rule is defined by the following sequence:

    name =  elements crlf

where <name> is the name of the rule, <elements> is one or more rule
names or terminal specifications, and <crlf> is the end-of-line
indicator (carriage return followed by line feed).  The equal sign
separates the name from the definition of the rule.  The elements
form a sequence of one or more rule names and/or value definitions,
combined according to the various operators defined in this document,
such as alternative and repetition.

For visual ease, rule definitions are left aligned.  When a rule
requires multiple lines, the continuation lines are indented.  The
left alignment and indentation are relative to the first lines of the
JBNF rules and need not match the left margin of the document.

2.3.  Terminal Values

Rules resolve into a string of terminal values, sometimes called
characters.  In JBNF, a character is merely a non-negative integer.
In certain contexts, a specific mapping (encoding) of values into a
character set (such as ASCII) will be specified.

Terminals are specified by one or more numeric characters, with the
base interpretation of those characters indicated explicitly.  The
following bases are currently defined:

     b           =  binary

     d           =  decimal

     x           =  hexadecimal

Hence:

     CR          =  %d13

     CR          =  %x0D

respectively specify the decimal and hexadecimal representation of
[US-ASCII] for carriage return.

A concatenated string of such values is specified compactly, using a
period (".") to indicate a separation of characters within that
value.  Hence:

     CRLF        =  %d13.10
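
The numeric notation above can be read mechanically.  The following
Python sketch is illustrative only: parse_num_val() is a hypothetical
name, and the function relies on Python's int() for digit parsing, so
it is more lenient than a strict JBNF parser would be.

```python
# Base indicators defined in Section 2.3.
BASES = {"b": 2, "d": 10, "x": 16}

def parse_num_val(text: str) -> list[int]:
    """Parse a terminal such as '%d13.10' into a list of character values."""
    if len(text) < 3 or text[0] != "%" or text[1] not in BASES:
        raise ValueError(f"not a numeric terminal: {text!r}")
    base = BASES[text[1]]
    # A period separates the characters of a concatenated string.
    return [int(part, base) for part in text[2:].split(".")]
```

Under these assumptions, "%d13" and "%x0D" both yield the single
value 13, and "%d13.10" yields the two values 13 and 10.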

2.3.1 Literal JSON strings

JBNF permits the specification of literal text strings directly,
enclosed in quotation-marks.  Hence:

     command     =  "command string"

Literal text strings are interpreted as a concatenated set of
printable characters, including the opening and closing DQUOTE.
This makes any such element match a complete JSON string value,
without having to re-specify the DQUOTE characters each time.

NOTE:

    JBNF strings are case-sensitive and the character set for these
    strings is defined by JSON [RFC 4627].

Hence:

    rulename = "abc" number "def"

will match "\"abc\"13\"def\"" (including the quotation marks), but will
NOT match "abc13def" (without the quotation marks). To specify a rule
that does NOT include quotation marks, specify the characters
individually. For example:

     rulename    =  %d97.98.99 number %d100.101.102

will match "abc13def". This makes embedding datatypes in strings,
and specifying string templates, rather difficult. So don't do that.
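
The difference between the two forms can be made concrete with a
short Python sketch.  The helper names are hypothetical, and
json.dumps() is used here only as one convenient way to produce a
JSON string with its surrounding DQUOTEs; the value 13 stands in for
whatever <number> matches.

```python
import json

def literal_text(s: str) -> str:
    """Text matched by the JBNF literal for s: a JSON string, DQUOTEs included."""
    return json.dumps(s)

def bare_text(values: list[int]) -> str:
    """Text matched by a dotted numeric terminal such as %d97.98.99."""
    return "".join(chr(v) for v in values)

# rulename = "abc" number "def"
quoted = literal_text("abc") + "13" + literal_text("def")

# rulename = %d97.98.99 number %d100.101.102
bare = bare_text([97, 98, 99]) + "13" + bare_text([100, 101, 102])
```

Here <quoted> holds the three JSON tokens with their quotation marks,
while <bare> holds the six letters and two digits with no quotation
marks at all.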

2.4.  External Encodings

External representations of terminal value characters will vary
according to constraints in the storage or transmission environment.
Hence, the same JBNF-based grammar may have multiple external
encodings, such as one for a 7-bit US-ASCII environment, another for
a binary octet environment, and still a different one when 16-bit
Unicode is used.  Encoding details are beyond the scope of JBNF,
although Appendix B (Core) provides definitions for a 7-bit US-ASCII
environment as has been common to much of the Internet.

By separating external encoding from the syntax, it is intended that
alternate encoding environments can be used for the same syntax.

3.  OPERATORS

3.1.  String Concatenation:  Rule1 Rule2

A rule can define a simple, ordered string of values (i.e., a
concatenation of contiguous characters) by listing a sequence of rule
names.  For example:

     foo         =  %x61           ; a

     bar         =  %x62           ; b

     mumble      =  foo bar foo

So that the rule <mumble> matches the lowercase string "aba".

LINEAR WHITE SPACE: Concatenation is at the core of the JBNF parsing
model.  A string of contiguous characters (values) is parsed
according to the rules defined in JBNF.  For Internet specifications,
there is some history of permitting linear white space (space and
horizontal tab) to be freely and implicitly interspersed around major
constructs, such as delimiting special characters or atomic strings.

NOTE:

  This specification for JBNF does not provide for implicit
  specification of linear white space.

Any grammar that wishes to permit linear white space around
delimiters or string segments must specify it explicitly.  It is
often useful to provide for such white space in "core" rules that are
then used variously among higher-level rules.  The "core" rules might
be formed into a lexical analyzer or simply be part of the main
ruleset.

3.2.  Alternatives:  Rule1 / Rule2

Elements separated by a forward slash ("/") are alternatives.
Therefore,

     foo / bar

will accept <foo> or <bar>.

3.3.  Incremental Alternatives: Rule1 =/ Rule2

It is sometimes convenient to specify a list of alternatives in
fragments.  That is, an initial rule may match one or more
alternatives, with later rule definitions adding to the set of
alternatives.  This is particularly useful for otherwise independent
specifications that derive from the same parent rule set, such as
often occurs with parameter lists.  JBNF permits this incremental
definition through the construct:

     oldrule     =/ additional-alternatives

So that the rule set

     ruleset     =  alt1 / alt2

     ruleset     =/ alt3

     ruleset     =/ alt4 / alt5

is the same as specifying

     ruleset     =  alt1 / alt2 / alt3 / alt4 / alt5
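
A rule table that honors "=/" might be maintained as in the following
Python sketch.  It is a simplification under stated assumptions:
define() is a hypothetical name, alternatives are split naively on
"/" (so groups and literals containing a slash are not handled), and
"=/" on a rule that has not been defined yet simply creates it, a
case this document does not address.

```python
def define(rules: dict[str, list[str]], line: str) -> None:
    """Apply one 'name = ...' or 'name =/ ...' line to the rule table."""
    name, _, rest = line.partition("=")
    name = name.strip().lower()          # rule names are case-insensitive
    incremental = rest.startswith("/")   # "=/" adds alternatives
    if incremental:
        rest = rest[1:]
    alternatives = [alt.strip() for alt in rest.split("/")]
    if incremental:
        rules.setdefault(name, []).extend(alternatives)
    else:
        rules[name] = alternatives

rules: dict[str, list[str]] = {}
for line in ("ruleset     =  alt1 / alt2",
             "ruleset     =/ alt3",
             "ruleset     =/ alt4 / alt5"):
    define(rules, line)
```

After the loop, the table holds the five alternatives alt1 through
alt5 under the single name "ruleset", in the order they were added.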

3.4.  Value Range Alternatives:  %c##-##

A range of alternative numeric values can be specified compactly,
using hyphen ("-") to indicate the range of alternative values.  Hence:

     DIGIT       =  %x30-39

is equivalent to:

     DIGIT       =  %x30 / %x31 / %x32 / %x33 / %x34 / %x35 / %x36 /
                    %x37 / %x38 / %x39

Concatenated numeric values and numeric value ranges cannot be
specified in the same string.  A numeric value may use the dotted
notation for concatenation or it may use the hyphen notation to specify
one value range.  Hence, to specify one printable character between
end of line sequences, the specification could be:

     char-line = %x0D.0A %x20-7E %x0D.0A
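
The range notation and the <char-line> example can be sketched in
Python as follows.  The function names are hypothetical, and the
matcher is hard-wired to this one rule rather than being a general
JBNF engine.

```python
BASES = {"b": 2, "d": 10, "x": 16}

def parse_range(text: str) -> range:
    """Parse a range terminal such as '%x30-39' into the values it allows."""
    if len(text) < 3 or text[0] != "%" or text[1] not in BASES:
        raise ValueError(f"not a numeric terminal: {text!r}")
    if "." in text:
        raise ValueError("dotted concatenation and a range cannot be mixed")
    low, high = (int(part, BASES[text[1]]) for part in text[2:].split("-"))
    return range(low, high + 1)          # both ends are inclusive

def matches_char_line(data: str) -> bool:
    """Check data against: char-line = %x0D.0A %x20-7E %x0D.0A"""
    return (len(data) == 5
            and data[:2] == "\r\n"
            and ord(data[2]) in parse_range("%x20-7E")
            and data[3:] == "\r\n")
```

The first function rejects a terminal that mixes the two notations;
the second accepts exactly one printable character between two
end-of-line sequences.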

3.5.  Sequence Group:  (Rule1 Rule2)

Elements enclosed in parentheses are treated as a single element,
whose contents are STRICTLY ORDERED.  Thus,

     elem (foo / bar) blat

matches (elem foo blat) or (elem bar blat), and

     (elem foo) / (bar blat)

matches (elem foo) or (bar blat).

NOTE:

  It is strongly advised that grouping notation be used, rather than
  relying on the proper reading of "bare" alternations, when
  alternatives consist of multiple rule names or literals.

Hence, it is recommended that the following form be used:

    (elem foo) / (bar blat)

It will avoid misinterpretation by casual readers.

The sequence group notation is also used within free text to set off
an element sequence from the prose.

3.6.  Variable Repetition:  Rule* and Rule+

The operator "*" following an element indicates repetition;
the preceding rule matches zero or more times.

The operator "+" following an element indicates repetition;
the preceding rule matches one or more times.

3.7. JSON value operators

3.7.1 Array:  [Rule]

The operators "[" and "]" surrounding an element indicate a
JSON array containing that element. If the contained element
is a sequence or repetition, the value-separators (commas) that
JSON requires between array elements are matched implicitly.
Thus,

    [string*]

matches ["foo", "bar"] or ["baz"] or [].

3.7.2. Object member:  key: value

Elements connected by a colon operator ":" are treated as a single
JSON object member. The key MUST be a JSON string and the value
may be any element. Remember that string literals match surrounding
DQUOTEs automatically. Thus,

    {elem "foo": string blat}

matches {elem, "foo": "bar", blat}.

3.7.3. Object

The operators "{" and "}" surrounding an element indicate a
JSON object containing that element. If the contained element
is a sequence, object member, or repetition, the value-separators
(commas) that JSON requires between object members are matched
implicitly. Thus,

    {("foo": number)*}

matches {"foo": 51, "foo": 13.8} or {"foo": 72} or {}.

Also, just as JSON objects are unordered collections, any element(s)
contained within object operators are inherently unordered.
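
The object examples, and the fact that member order does not matter,
can be illustrated with the Python sketch below.  All function names
are hypothetical.  The sketch assumes that <string> and <number>
denote JSON strings and JSON numbers, and it uses the
object_pairs_hook parameter of json.loads() so that repeated keys
such as "foo" are preserved rather than collapsed.

```python
import json

def kind(value) -> str:
    """Name the JSON type of a decoded value (bool is tested before number)."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, str):
        return "string"
    if isinstance(value, (int, float)):
        return "number"
    return "other"

def object_pairs(text: str) -> list:
    """Decode a JSON object into (key, value) pairs, keeping duplicate keys."""
    if not text.lstrip().startswith("{"):
        raise ValueError("not a JSON object")
    return json.loads(text, object_pairs_hook=list)

def matches_repeated(text: str, key: str, value_kind: str) -> bool:
    """Check text against a pattern of the form {(key: value_kind)*}."""
    return all(k == key and kind(v) == value_kind
               for k, v in object_pairs(text))

def matches_members(text: str, expected: dict) -> bool:
    """Check text against fixed members that may appear in any order."""
    found = sorted((k, kind(v)) for k, v in object_pairs(text))
    return found == sorted(expected.items())
```

For instance, matches_repeated(text, "foo", "number") accepts each of
the three objects shown above, and matches_members() accepts the same
members regardless of the order in which they are written.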

3.8. Optional element: elem?

The operator "?" following an element indicates that the element
is optional; that is, it matches zero or one such element. Thus,

    [string string? number]

matches ["foo", "bar", 3] or ["baz", 12], but not ["a", "b"].
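
The optional operator, together with the repetition operators of
Section 3.6 and the array operator of Section 3.7.1, can be
illustrated with a small backtracking matcher in Python.  This is a
sketch under several assumptions: match_array() and kind() are
hypothetical names, only flat patterns built from the type names
"string" and "number" with an optional "*", "+", or "?" suffix are
supported, and groups and alternatives are not handled.

```python
import json

def kind(value) -> str:
    """Name the JSON type of a decoded value (bool is tested before number)."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, str):
        return "string"
    if isinstance(value, (int, float)):
        return "number"
    return "other"

def match_array(pattern: str, text: str) -> bool:
    """Match JSON text against a flat pattern such as '[string string? number]'."""
    items = json.loads(text)
    if not isinstance(items, list):
        return False
    tokens = pattern.strip()[1:-1].split()

    def step(t: int, i: int) -> bool:
        # t indexes the pattern tokens, i indexes the array items.
        if t == len(tokens):
            return i == len(items)
        token = tokens[t]
        op = token[-1] if token[-1] in "*+?" else ""
        name = token[:-1] if op else token
        here = i < len(items) and kind(items[i]) == name
        if op == "?":    # zero or one
            return step(t + 1, i) or (here and step(t + 1, i + 1))
        if op == "*":    # zero or more
            return step(t + 1, i) or (here and step(t, i + 1))
        if op == "+":    # one or more
            return here and (step(t + 1, i + 1) or step(t, i + 1))
        return here and step(t + 1, i + 1)

    return step(0, 0)
```

The commas between array items never appear in the pattern; the JSON
decoder consumes them, which mirrors the implicit handling of
value-separators inside the array operator.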

3.9.  Comment:  ; Comment

A semicolon starts a comment that continues to the end of line.
This is a simple way of including useful notes in parallel with the
specifications.

3.10.  Operator Precedence

The various mechanisms described above have the following precedence,
from highest (binding tightest) at the top, to lowest (loosest) at
the bottom:

    Strings, Names formation

    Comment

    Value range

    Repetition

    Object member

    Array, Object

    Grouping, Optional

    Concatenation

    Alternative

Use of the alternative operator, freely mixed with sequence grouping,
can be confusing.

  Again, it is recommended that the grouping operator be used to
  make explicit, ordered, sequence groups.

4.  JBNF DEFINITION OF JBNF

NOTES:

  1. This syntax requires a formatting of rules that is relatively
     strict.  Hence, the version of a ruleset included in a
     specification might need preprocessing to ensure that it can be
     interpreted by a JBNF parser.

  2. This syntax uses the rules provided in Appendix B (Core).

    rulelist       =  ( rule / (c-wsp* c-nl) )+

    rule           =  rulename defined-as elements c-nl
                        ; continues if next line starts
                        ;  with white space

    rulename       =  ALPHA (ALPHA / DIGIT / hyphen)*

    defined-as     =  c-wsp* (equal forward-slash?) c-wsp*
                        ; basic rules definition and
                        ;  incremental alternatives

    elements       =  alternation c-wsp*

    c-wsp          =  WSP / (c-nl WSP)

    c-nl           =  comment / CRLF
                        ; comment or newline

    comment        =  semicolon (WSP / VCHAR)* CRLF

    array          =  left-square-bracket element right-square-bracket

    object         =  left-curly-bracket element right-curly-bracket

    object-member  =  element c-wsp* colon c-wsp* element

    alternation    =  concatenation
                      (c-wsp* forward-slash c-wsp* concatenation)*

    concatenation  =  repetition (c-wsp+ repetition)*

    repetition     =  element (asterisk / %x2B)?
                        ; optional "*" or "+" suffix

    element        =  rulename / group / option /
                      string / num-val / prose-val

    group          =  left-paren c-wsp* alternation c-wsp* right-paren

    option         =  c-wsp* alternation c-wsp* question-mark

    num-val        =  percent-sign (bin-val / dec-val / hex-val)

    bin-val        =  %x62 BIT+
                       ((period BIT+)+ / (hyphen BIT+))?
                        ; series of concatenated bit values
                        ;  or single ONEOF range

    dec-val        =  %x64 DIGIT+
                       ((period DIGIT+)+ / (hyphen DIGIT+))?

    hex-val        =  %x78 HEXDIG+
                       ((period HEXDIG+)+ / (hyphen HEXDIG+))?

    prose-val      =  left-angle-bracket (%x20-3D / %x3F-7E)* right-angle-bracket
                        ; bracketed string of SP and VCHAR
                        ;  without angles
                        ; prose description, to be used as
                        ;  last resort

5.  SECURITY CONSIDERATIONS

Security is truly believed to be irrelevant to this document.

6.  References

6.1.  Normative References

[US-ASCII] American National Standards Institute, "Coded Character
          Set -- 7-bit American Standard Code for Information
          Interchange", ANSI X3.4, 1986.

[RFC 4627] Crockford, D., "The application/json Media Type for
          JavaScript Object Notation (JSON)", RFC 4627, July 2006.


Appendix A.  ACKNOWLEDGEMENTS

Appendix B.  APPENDIX - CORE JBNF OF JBNF

This Appendix is provided as a convenient core for specific grammars.
The definitions may be used as a core set of rules.

B.1.  Core Rules

Certain basic rules are in uppercase, such as SP, HTAB, CRLF, DIGIT,
ALPHA, etc.

    ALPHA          =  %x41-5A / %x61-7A   ; A-Z / a-z

    BIT            =  %x30 / %x31         ; 0 / 1

    CHAR           =  %x01-7F
                        ; any 7-bit US-ASCII character,
                        ;  excluding NUL

    CR             =  %x0D
                        ; carriage return

    CRLF           =  CR LF
                        ; Internet standard newline

    CTL            =  %x00-1F / %x7F
                        ; controls

    DIGIT          =  %x30-39
                        ; 0-9

    DQUOTE         =  %x22
                        ; " (Double Quote)

    HEXDIG         =  DIGIT / %x41 / %x42 / %x43 / %x44 / %x45 / %x46
                        ; 0-9, "ABCDEF"

    HTAB           =  %x09
                        ; horizontal tab

    LF             =  %x0A
                        ; linefeed

    LWSP           =  (WSP / CRLF WSP)*
                        ; linear white space (past newline)

    OCTET          =  %x00-FF
                        ; 8 bits of data

    SP             =  %x20

    VCHAR          =  %x21-7E
                        ; visible (printing) characters

    WSP            =  SP / HTAB
                        ; white space

    percent-sign   =  %x25
    left-paren     =  %x28
    right-paren    =  %x29
    asterisk       =  %x2A
    hyphen         =  %x2D
    period         =  %x2E
    forward-slash  =  %x2F
    semicolon      =  %x3B
    equal          =  %x3D
    question-mark  =  %x3F
    left-angle-bracket   =  %x3C
    right-angle-bracket  =  %x3E
    left-square-bracket  =  %x5B
    right-square-bracket =  %x5D
    left-curly-bracket   =  %x7B
    right-curly-bracket  =  %x7D

    boolean        =  %x66.61.6C.73.65 / %x74.72.75.65
                        ; false / true (JSON literals)


B.2.  Common Encoding

Externally, data are represented as "network virtual ASCII" (namely,
7-bit US-ASCII in an 8-bit field), with the high (8th) bit set to
zero.  A string of values is in "network byte order", in which the
higher-valued bytes are represented on the left-hand side and are
sent over the network first.
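
As a rough illustration, the common encoding can be sketched in
Python.  The encode() name is hypothetical, and the sketch assumes
one terminal value per octet.

```python
def encode(values: list[int]) -> bytes:
    """Encode character values as 7-bit US-ASCII, one value per octet."""
    for value in values:
        if not 0 <= value <= 0x7F:
            raise ValueError(f"value {value} does not fit in 7 bits")
    # bytes() keeps list order, so the leftmost value is sent first,
    # and every octet has its high (8th) bit set to zero.
    return bytes(values)
```

For example, encoding the values of <CRLF> (13 and 10) produces the
two octets 0x0D and 0x0A, in that order.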

Authors' Addresses

Robert Brewer (editor)
YouGov America Inc
285 Hamilton Ave, Suite 200
Palo Alto, CA 94301
USA

Phone: +1.650.462.8000
EMail: robert.brewer@yougov.com

Full Copyright Statement

Copyright (C) YouGov (2011).

This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, YOUGOV AND ITS SUBSIDIARIES,
DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT
LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN
WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.