Commits

Robert Brewer  committed e64f0ca

Initial draft.

  • Tags draft-00

Files changed (1)

+JSON BNF for Syntax Specifications: JBNF
+
+1. Introduction
+
+Internet technical specifications built using JavaScript Object
+Notation (JSON) often need to define a formal syntax and are free
+to employ whatever notation their authors deem useful. A modified
+version of Backus-Naur Form (BNF), called JSON BNF (JBNF), is
+specified herein.  It balances compactness and simplicity,
+with reasonable representational power.
+
+The differences between standard BNF and JBNF involve naming rules,
+repetition, alternatives, order-independence, and value ranges.
+Appendix B supplies rule definitions and encoding for a core lexical
+analyzer of the type common to several Internet specifications.  It
+is provided as a convenience and is otherwise separate from the meta
+language defined in the body of this document, and separate from its
+formal status.
+
+2. Rule Definition
+
+2.1. Rule Naming
+
+The name of a rule is simply the name itself; that is, a sequence of
+characters, beginning with an alphabetic character, and followed by a
+combination of alphabetics, digits, and hyphens.
+
+NOTE:
+
+    Rule names are case-insensitive.
+
+The names <rulename>, <Rulename>, <RULENAME>, and <rUlENamE> all
+refer to the same rule.
+
+Unlike original BNF, angle brackets ("<", ">") are not required.
+However, angle brackets may be used around a rule name whenever their
+presence makes it easier to recognize the use of a rule name.  This is
+typically restricted to rule name references in free-form prose, or
+to distinguish partial rules that combine into a string not separated
+by white space.
+
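The following is a minimal Python sketch of these naming rules. It assumes rule names are plain US-ASCII. The helper `canonical_rule_name` is hypothetical and not part of JBNF. It strips optional angle brackets, checks the `ALPHA (ALPHA / DIGIT / hyphen)*` shape, and lowercases the result so that differently cased spellings compare equal.

```python
import re

# rulename = ALPHA (ALPHA / DIGIT / hyphen)*   (US-ASCII letters only)
RULENAME = re.compile(r"[A-Za-z][A-Za-z0-9-]*\Z")

def canonical_rule_name(text):
    """Return a case-folded key for a rule name, accepting optional <...>."""
    if text.startswith("<") and text.endswith(">"):
        text = text[1:-1]  # angle brackets are optional in JBNF
    if not RULENAME.match(text):
        raise ValueError(f"not a valid JBNF rule name: {text!r}")
    # Rule names are case-insensitive, so compare them in lowercase.
    return text.lower()

# All four spellings from the text refer to the same rule.
names = ["<rulename>", "<Rulename>", "<RULENAME>", "<rUlENamE>"]
assert {canonical_rule_name(n) for n in names} == {"rulename"}
```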
+2.2. Rule Form
+
+A rule is defined by the following sequence:
+
+    name =  elements crlf
+
+where <name> is the name of the rule, <elements> is one or more rule
+names or terminal specifications, and <crlf> is the end-of-line
+indicator (carriage return followed by line feed).  The equal sign
+separates the name from the definition of the rule.  The elements
+form a sequence of one or more rule names and/or value definitions,
+combined according to the various operators defined in this document,
+such as alternative and repetition.
+
+For visual ease, rule definitions are left aligned.  When a rule
+requires multiple lines, the continuation lines are indented.  The
+left alignment and indentation are relative to the first lines of the
+JBNF rules and need not match the left margin of the document.
+
+2.3.  Terminal Values
+
+Rules resolve into a string of terminal values, sometimes called
+characters.  In JBNF, a character is merely a non-negative integer.
+In certain contexts, a specific mapping (encoding) of values into a
+character set (such as ASCII) will be specified.
+
+Terminals are specified by a percent sign ("%"), followed by a base
+indicator, followed by one or more digits in that base.  The
+following bases are currently defined:
+
+     b           =  binary
+
+     d           =  decimal
+
+     x           =  hexadecimal
+
+Hence:
+
+     CR          =  %d13
+
+     CR          =  %x0D
+
+respectively specify the decimal and hexadecimal representation of
+[US-ASCII] for carriage return.
+
+A concatenated string of such values is specified compactly, using a
+period (".") to indicate a separation of characters within that
+value.  Hence:
+
+     CRLF        =  %d13.10
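Below is a minimal Python sketch of how a tool might read these numeric terminals. It assumes lowercase base indicators (as in the grammar of Section 4) and uppercase hexadecimal digits (as in HEXDIG in Appendix B). The name `parse_num_val` is hypothetical. Only base indicators and period-separated concatenation are handled; value ranges are left out.

```python
# Base indicator -> (radix, digits allowed in that radix).
# Hex digits are uppercase only, following HEXDIG in Appendix B.
BASES = {"b": (2, "01"), "d": (10, "0123456789"), "x": (16, "0123456789ABCDEF")}

def parse_num_val(text):
    """Parse a terminal such as '%d13.10' into its list of character values."""
    if len(text) < 3 or text[0] != "%" or text[1] not in BASES:
        raise ValueError(f"not a numeric value: {text!r}")
    radix, allowed = BASES[text[1]]
    values = []
    # A period separates consecutive characters of a concatenated value.
    for part in text[2:].split("."):
        if not part or any(ch not in allowed for ch in part):
            raise ValueError(f"bad digits {part!r} in {text!r}")
        values.append(int(part, radix))
    return values

# Both spellings of CR yield the same character value.
assert parse_num_val("%d13") == parse_num_val("%x0D") == [13]
assert parse_num_val("%d13.10") == [13, 10]
```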
+
+2.3.1 Literal JSON strings
+
+JBNF permits the specification of literal text strings directly,
+enclosed in quotation marks.  Hence:
+
+     command     =  "command string"
+
+Literal text strings are interpreted as a concatenated set of
+printable characters, including the opening and closing DQUOTE.
+This makes any such element match a complete JSON string value,
+without having to re-specify the DQUOTE characters each time.
+
+NOTE:
+
+    JBNF strings are case-sensitive and the character set for these
+    strings is defined by JSON [RFC 4627].
+
+Hence:
+
+    rulename = "abc" number "def"
+
+will match the characters "abc"13"def" (including the quotation
+marks), but will NOT match abc13def (without the quotation marks).
+To specify a rule that does NOT include quotation marks, specify the
+characters individually.  For example:
+
+     rulename    =  %d97.98.99 number %d100.101.102
+
+will match abc13def.  This makes embedding datatypes in strings, and
+specifying string templates, rather difficult; grammar authors are
+advised to avoid such constructs.
+
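The Python sketch below restates the two rules above as regular expressions. Purely for illustration, it assumes that `number` matches one or more decimal digits; a real JSON number is richer. The hypothetical helper `literal_chars` shows that a JBNF literal includes its quotation marks.

```python
import re

def literal_chars(content):
    """Characters matched by the JBNF literal "content": quotes included."""
    return '"' + content + '"'

# rulename = "abc" number "def"
# (number is simplified to one or more digits for this sketch)
quoted_rule = re.compile(
    re.escape(literal_chars("abc")) + r"[0-9]+" + re.escape(literal_chars("def"))
)

# rulename = %d97.98.99 number %d100.101.102  (no quotation marks)
unquoted_rule = re.compile(
    re.escape("".join(map(chr, [97, 98, 99])))
    + r"[0-9]+"
    + re.escape("".join(map(chr, [100, 101, 102])))
)

assert quoted_rule.fullmatch('"abc"13"def"')
assert not quoted_rule.fullmatch("abc13def")
assert unquoted_rule.fullmatch("abc13def")
```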
+2.4.  External Encodings
+
+External representations of terminal value characters will vary
+according to constraints in the storage or transmission environment.
+Hence, the same JBNF-based grammar may have multiple external
+encodings, such as one for a 7-bit US-ASCII environment, another for
+a binary octet environment, and still a different one when 16-bit
+Unicode is used.  Encoding details are beyond the scope of JBNF,
+although Appendix B (Core) provides definitions for a 7-bit US-ASCII
+environment as has been common to much of the Internet.
+
+By separating external encoding from the syntax, it is intended that
+alternate encoding environments can be used for the same syntax.
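To make the separation concrete, the Python sketch below takes the abstract terminal values of CRLF (%d13.10) and serializes them in two environments. Python's `ascii` and `utf-16-be` codecs stand in for a 7-bit US-ASCII encoding and a 16-bit Unicode encoding. The choice of big-endian UTF-16 is an assumption made for the example.

```python
# The terminal values of CRLF = %d13.10, as abstract non-negative integers.
values = [13, 10]
text = "".join(chr(v) for v in values)

ascii_octets = text.encode("ascii")     # 7-bit US-ASCII, one octet per value
utf16_units = text.encode("utf-16-be")  # 16-bit code units, high octet first

assert ascii_octets == b"\r\n"
assert utf16_units == b"\x00\r\x00\n"
```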
+
+3.  OPERATORS
+
+3.1.  Concatenation:  Rule1 Rule2
+
+A rule can define a simple, ordered string of values (i.e., a
+concatenation of contiguous characters) by listing a sequence of rule
+names.  For example:
+
+     foo         =  %x61           ; a
+
+     bar         =  %x62           ; b
+
+     mumble      =  foo bar foo
+
+So that the rule <mumble> matches the lowercase string "aba".
+
+LINEAR WHITE SPACE: Concatenation is at the core of the JBNF parsing
+model.  A string of contiguous characters (values) is parsed
+according to the rules defined in JBNF.  For Internet specifications,
+there is some history of permitting linear white space (space and
+horizontal tab) to be freely and implicitly interspersed around major
+constructs, such as delimiting special characters or atomic strings.
+
+NOTE:
+
+  This specification for JBNF does not provide for implicit
+  specification of linear white space.
+
+Any grammar that wishes to permit linear white space around
+delimiters or string segments must specify it explicitly.  It is
+often useful to provide for such white space in "core" rules that are
+then used variously among higher-level rules.  The "core" rules might
+be formed into a lexical analyzer or simply be part of the main
+ruleset.
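This small Python sketch uses regular expressions as a stand-in for a JBNF matcher. The rule `mumble` from above accepts "aba" but not "a b a", and white space is accepted only where a rule says so explicitly. The rule `spaced-mumble` is a hypothetical example that is not defined elsewhere in this document.

```python
import re

# foo = %x61, bar = %x62, mumble = foo bar foo
FOO, BAR = chr(0x61), chr(0x62)
mumble = re.compile(re.escape(FOO + BAR + FOO))

# Hypothetical: spaced-mumble = foo WSP* bar WSP* foo, with WSP = SP / HTAB
WSP = "[ \t]"
spaced_mumble = re.compile(f"{FOO}{WSP}*{BAR}{WSP}*{FOO}")

assert mumble.fullmatch("aba")
assert not mumble.fullmatch("a b a")  # no implicit white space
assert spaced_mumble.fullmatch("a b\ta")
```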
+
+3.2.  Alternatives:  Rule1 / Rule2
+
+Elements separated by a forward slash ("/") are alternatives.
+Therefore,
+
+     foo / bar
+
+will accept <foo> or <bar>.
+
+3.3.  Incremental Alternatives: Rule1 =/ Rule2
+
+It is sometimes convenient to specify a list of alternatives in
+fragments.  That is, an initial rule may match one or more
+alternatives, with later rule definitions adding to the set of
+alternatives.  This is particularly useful for otherwise independent
+specifications that derive from the same parent rule set, such as
+often occurs with parameter lists.  JBNF permits this incremental
+definition through the construct:
+
+     oldrule     =/ additional-alternatives
+
+So that the rule set
+
+     ruleset     =  alt1 / alt2
+
+     ruleset     =/ alt3
+
+     ruleset     =/ alt4 / alt5
+
+is the same as specifying
+
+     ruleset     =  alt1 / alt2 / alt3 / alt4 / alt5
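A tool might fold incremental definitions as in the Python sketch below. The input format (pre-parsed `(name, operator, alternatives)` tuples) and the helper `merge_rules` are hypothetical. The sketch also assumes that the `=` definition of a rule appears before any `=/` additions.

```python
def merge_rules(definitions):
    """Fold '=' and '=/' definitions into one list of alternatives per rule.

    Assumption: each rule's '=' definition precedes its '=/' additions.
    Rule names are compared case-insensitively.
    """
    rules = {}
    for name, op, alternatives in definitions:
        key = name.lower()
        if op == "=":
            if key in rules:
                raise ValueError(f"rule {name!r} defined twice with '='")
            rules[key] = list(alternatives)
        elif op == "=/":
            if key not in rules:
                raise ValueError(f"'=/' used before '=' for rule {name!r}")
            rules[key].extend(alternatives)  # append the new alternatives
        else:
            raise ValueError(f"unknown operator {op!r}")
    return rules

merged = merge_rules([
    ("ruleset", "=", ["alt1", "alt2"]),
    ("ruleset", "=/", ["alt3"]),
    ("ruleset", "=/", ["alt4", "alt5"]),
])
assert merged == {"ruleset": ["alt1", "alt2", "alt3", "alt4", "alt5"]}
```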
+
+3.4.  Value Range Alternatives:  %c##-##
+
+A range of alternative numeric values can be specified compactly,
+using hyphen ("-") to indicate the range of alternative values.  Hence:
+
+     DIGIT       =  %x30-39
+
+is equivalent to:
+
+     DIGIT       =  %x30 / %x31 / %x32 / %x33 / %x34 / %x35 / %x36 /
+                    %x37 / %x38 / %x39
+
+Concatenated numeric values and numeric value ranges cannot be
+specified in the same string.  A numeric value may use the dotted
+notation for concatenation or it may use the hyphen notation to specify
+one value range.  Hence, to specify one printable character between
+end of line sequences, the specification could be:
+
+     char-line = %x0D.0A %x20-7E %x0D.0A
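The Python sketch below uses the hypothetical helper `expand_range` to expand a single value range into its alternatives. It rejects a value that mixes a range with dotted concatenation. Separately, `char_line` writes the `char-line` rule as an equivalent regular expression.

```python
import re

RADIX = {"b": 2, "d": 10, "x": 16}

def expand_range(text):
    """Expand one value range such as '%x30-39' into its alternative values."""
    if text[:1] != "%" or text[1:2] not in RADIX:
        raise ValueError(f"not a numeric value: {text!r}")
    if "." in text:
        # A range and dotted concatenation cannot share one numeric value.
        raise ValueError(f"range mixed with concatenation: {text!r}")
    low, sep, high = text[2:].partition("-")
    if not sep:
        raise ValueError(f"no range in {text!r}")
    radix = RADIX[text[1]]
    return list(range(int(low, radix), int(high, radix) + 1))

# DIGIT = %x30-39 is the same as %x30 / %x31 / ... / %x39
assert expand_range("%x30-39") == [0x30 + i for i in range(10)]

# char-line = %x0D.0A %x20-7E %x0D.0A as a regular expression
char_line = re.compile(r"\r\n[\x20-\x7E]\r\n")
```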
+
+3.5.  Sequence Group:  (Rule1 Rule2)
+
+Elements enclosed in parentheses are treated as a single element,
+whose contents are STRICTLY ORDERED.  Thus,
+
+     elem (foo / bar) blat
+
+matches (elem foo blat) or (elem bar blat), and
+
+     elem foo / bar blat
+
+matches (elem foo) or (bar blat).
+
+NOTE:
+
+  It is strongly advised that grouping notation be used, rather than
+  relying on the proper reading of "bare" alternations, when
+  alternatives consist of multiple rule names or literals.
+
+Hence, it is recommended that the following form be used:
+
+    (elem foo) / (bar blat)
+
+It will avoid misinterpretation by casual readers.
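Because concatenation binds more tightly than alternation, a rule body can be read by first splitting on top-level slashes and then on white space. The Python sketch below uses the hypothetical helper `split_alternatives` to do this for bodies made only of rule names, parentheses, and slashes. It is not a complete JBNF parser: strings, comments, and numeric values are not handled.

```python
def split_alternatives(body):
    """Split a rule body into alternatives, each a list of concatenated items.

    Parenthesized groups are kept intact as single items.
    """
    alternatives, current, item, depth = [], [], "", 0
    for ch in body + " ":  # trailing space flushes the last item
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if depth == 0 and ch in " /":
            if item:
                current.append(item)
                item = ""
            if ch == "/":  # top-level slash ends one alternative
                alternatives.append(current)
                current = []
        else:
            item += ch
    alternatives.append(current)
    return alternatives

assert split_alternatives("elem foo / bar blat") == [["elem", "foo"], ["bar", "blat"]]
assert split_alternatives("elem (foo / bar) blat") == [["elem", "(foo / bar)", "blat"]]
```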
+
+The sequence group notation is also used within free text to set off
+an element sequence from the prose.
+
+3.6.  Variable Repetition:  Rule* and Rule+
+
+The operator "*" following an element indicates repetition;
+the preceding element matches zero or more times.
+
+The operator "+" following an element indicates repetition;
+the preceding element matches one or more times.
+
+3.7. JSON value operators
+
+3.7.1 Array:  [Rule]
+
+The operators "[" and "]" surrounding an element indicate a
+JSON array containing that element.  If the contained element
+is a concatenation or repetition, the value-separators (commas)
+that JSON requires between array values are matched implicitly
+and are not written in the rule.  Thus,
+
+    [string*]
+
+matches ["foo", "bar"] or ["baz"] or [].
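One way to check a value against `[string*]` is sketched below in Python. Instead of matching characters as a JBNF parser would, it decodes the text with the standard `json` module, which consumes the brackets and value-separators itself, and then checks the decoded structure. The name `matches_string_array` is hypothetical.

```python
import json

def matches_string_array(text):
    """Sketch of [string*]: a JSON array whose members are all JSON strings."""
    try:
        value = json.loads(text)  # handles brackets and separator commas
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return isinstance(value, list) and all(isinstance(v, str) for v in value)

assert matches_string_array('["foo", "bar"]')
assert matches_string_array("[]")
```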
+
+3.7.2. Object member:  key: value
+
+Elements connected by a colon operator ":" are treated as a single
+JSON object member.  The key MUST be a JSON string and the value
+may be any element.  Recall that string literals match their
+surrounding DQUOTEs automatically.  Thus,
+
+    {elem "foo": string blat}
+
+matches {elem, "foo": "bar", blat}, where elem and blat stand for
+whatever object members those rules match.
+
+3.7.3. Object
+
+The operators "{" and "}" surrounding an element indicate a
+JSON object containing that element.  If the contained element
+is a concatenation or repetition, the value-separators (commas)
+that JSON requires between object members are matched implicitly
+and are not written in the rule.  Thus,
+
+    {("foo": number)*}
+
+matches {"foo": 51, "foo": 13.8} or {"foo": 72} or {}.
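The rule `{("foo": number)*}` permits the same member name more than once, which a plain Python `dict` would silently collapse. This Python sketch (all names hypothetical) decodes with the standard `json` module's `object_pairs_hook`, so that every member survives as an ordered (key, value) pair. It also rejects NaN and Infinity: Python's decoder accepts them by default, but they are not JSON numbers.

```python
import json

class Members(list):
    """A JSON object decoded as an ordered list of (key, value) pairs."""

def _reject_constant(name):
    # Python's json module accepts NaN and Infinity; JSON does not.
    raise ValueError(f"not a JSON number: {name}")

def is_json_number(value):
    # bool is a subclass of int in Python, so exclude true/false explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def matches_foo_numbers(text):
    """Sketch of {("foo": number)*}: zero or more "foo" members with numbers."""
    try:
        decoded = json.loads(text, object_pairs_hook=Members,
                             parse_constant=_reject_constant)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    if not isinstance(decoded, Members):  # the top level must be an object
        return False
    return all(key == "foo" and is_json_number(value) for key, value in decoded)
```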
+
+3.8. Optional element: elem?
+
+The operator "?" following an element indicates that the element
+is optional; that is, it matches zero or one such element. Thus,
+
+    [string string? number]
+
+matches ["foo", "bar", 3] or ["baz", 12], but not ["a", "b"].
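For this particular rule, the optional element can be checked by hand, as in the Python sketch below. It decodes the text with the standard `json` module as a substitute for character-level matching, and treats Python `int` and `float` values (other than `bool`) as JSON numbers. The name `matches_optional_example` is hypothetical.

```python
import json

def is_json_number(value):
    # bool is a subclass of int in Python, so exclude true/false explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def matches_optional_example(text):
    """Sketch of [string string? number] checked against a decoded array."""
    try:
        items = json.loads(text)
    except ValueError:
        return False
    # string? contributes zero or one string, so the array has 2 or 3 items.
    if not isinstance(items, list) or len(items) not in (2, 3):
        return False
    *strings, last = items
    return all(isinstance(s, str) for s in strings) and is_json_number(last)
```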
+
+3.9.  Comment:  ; Comment
+
+A semicolon starts a comment that continues to the end of the line.
+This is a simple way of including useful notes in parallel with the
+specifications.
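A tool that reads JBNF line by line might drop comments as in the Python sketch below. The hypothetical helper `strip_comment` ignores semicolons inside quoted literals. It assumes that a literal never contains a DQUOTE and that prose values (<...>) contain no semicolons.

```python
def strip_comment(line):
    """Remove a ';' comment from one JBNF line, keeping ';' inside literals.

    Sketch assumptions: quoted literals contain no DQUOTE, and prose
    values (<...>) contain no ';'.
    """
    in_literal = False
    for index, ch in enumerate(line):
        if ch == '"':
            in_literal = not in_literal  # entering or leaving a literal
        elif ch == ";" and not in_literal:
            return line[:index].rstrip()
    return line.rstrip()
```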
+
+3.10.  Operator Precedence
+
+The various mechanisms described above have the following precedence,
+from highest (binding tightest) at the top, to lowest (loosest) at
+the bottom:
+
+    Strings, Names formation
+
+    Comment
+
+    Value range
+
+    Repetition
+
+    Object member
+
+    Array, Object
+
+    Grouping, Optional
+
+    Concatenation
+
+    Alternative
+
+Use of the alternative operator, freely mixed with concatenations,
+can be confusing.
+
+  Again, it is recommended that the grouping operator be used to
+  make explicit concatenation groups.
+
+4.  JBNF DEFINITION OF JBNF
+
+NOTES:
+
+  1. This syntax requires a formatting of rules that is relatively
+     strict.  Hence, the version of a ruleset included in a
+     specification might need preprocessing to ensure that it can be
+     interpreted by a JBNF parser.
+
+  2. This syntax uses the rules provided in Appendix B (Core).
+
+    rulelist       =  ( rule / (c-wsp* c-nl) )+
+
+    rule           =  rulename defined-as elements c-nl
+                        ; continues if next line starts
+                        ;  with white space
+
+    rulename       =  ALPHA (ALPHA / DIGIT / hyphen)*
+
+    defined-as     =  c-wsp* (equal forward-slash?) c-wsp*
+                        ; basic rules definition and
+                        ;  incremental alternatives
+
+    elements       =  alternation c-wsp*
+
+    c-wsp          =  WSP / (c-nl WSP)
+
+    c-nl           =  comment / CRLF
+                        ; comment or newline
+
+    comment        =  semicolon (WSP / VCHAR)* CRLF
+
+    array          =  left-square-bracket c-wsp* alternation c-wsp*
+                      right-square-bracket
+
+    object         =  left-curly-bracket c-wsp* alternation c-wsp*
+                      right-curly-bracket
+
+    object-member  =  element c-wsp* colon c-wsp* element
+
+    alternation    =  concatenation
+                      (c-wsp* forward-slash c-wsp* concatenation)*
+
+    concatenation  =  (object-member / repetition)
+                      (c-wsp+ (object-member / repetition))*
+
+    repetition     =  element (asterisk / %x2B / question-mark)?
+                        ; "*" zero or more, "+" (%x2B) one or more,
+                        ;  "?" optional (zero or one)
+
+    element        =  rulename / group / array / object /
+                      string / num-val / prose-val
+
+    group          =  left-paren c-wsp* alternation c-wsp* right-paren
+
+    string         =  DQUOTE (%x20-21 / %x23-7E)* DQUOTE
+                        ; quoted string of SP and VCHAR
+                        ;  without DQUOTE
+
+    num-val        =  percent-sign (bin-val / dec-val / hex-val)
+
+    bin-val        =  %x62 BIT+
+                       ((period BIT+)+ / (hyphen BIT+))?
+                        ; series of concatenated bit values
+                        ;  or single ONEOF range
+
+    dec-val        =  %x64 DIGIT+
+                       ((period DIGIT+)+ / (hyphen DIGIT+))?
+
+    hex-val        =  %x78 HEXDIG+
+                       ((period HEXDIG+)+ / (hyphen HEXDIG+))?
+
+    prose-val      =  left-angle-bracket (%x20-3D / %x3F-7E)* right-angle-bracket
+                        ; bracketed string of SP and VCHAR
+                        ;  without angles
+                        ; prose description, to be used as
+                        ;  last resort
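The note on <rule> above says that a rule continues when the next line starts with white space. The Python sketch below shows that step using the hypothetical helper `join_rules`, which groups physical lines into one logical rule each. It assumes comments have already been removed and skips blank lines.

```python
def join_rules(text):
    """Return one logical line per rule from a comment-free JBNF rulelist."""
    rules = []
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines are skipped
        if line[0] in " \t" and rules:
            # Leading white space: this line continues the previous rule.
            rules[-1] += " " + line.strip()
        else:
            rules.append(line.strip())
    return rules

example = (
    "alternation    =  concatenation\n"
    "                  (c-wsp* forward-slash c-wsp* concatenation)*\n"
    "\n"
    "group          =  left-paren c-wsp* alternation c-wsp* right-paren\n"
)
```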
+
+5.  SECURITY CONSIDERATIONS
+
+Security is truly believed to be irrelevant to this document.
+
+6.  References
+
+6.1.  Normative References
+
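+[RFC 4627] Crockford, D., "The application/json Media Type for
+          JavaScript Object Notation (JSON)", RFC 4627, July 2006.
+
+[US-ASCII] American National Standards Institute, "Coded Character
+          Set -- 7-bit American Standard Code for Information
+          Interchange", ANSI X3.4, 1986.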
+
+
+Appendix A.  ACKNOWLEDGEMENTS
+
+Appendix B.  APPENDIX - CORE JBNF OF JBNF
+
+This Appendix is provided as a convenient core for specific grammars.
+The definitions may be used as a core set of rules.
+
+B.1.  Core Rules
+
+Certain basic rules are in uppercase, such as SP, HTAB, CRLF, DIGIT,
+ALPHA, etc.
+
+    ALPHA          =  %x41-5A / %x61-7A   ; A-Z / a-z
+
+    BIT            =  %x30 / %x31   ; 0 / 1
+
+    CHAR           =  %x01-7F
+                        ; any 7-bit US-ASCII character,
+                        ;  excluding NUL
+
+    CR             =  %x0D
+                        ; carriage return
+
+    CRLF           =  CR LF
+                        ; Internet standard newline
+
+    CTL            =  %x00-1F / %x7F
+                        ; controls
+
+    DIGIT          =  %x30-39
+                        ; 0-9
+
+    DQUOTE         =  %x22
+                        ; " (Double Quote)
+
+    HEXDIG         =  DIGIT / %x41 / %x42 / %x43 / %x44 / %x45 / %x46
+                        ; 0-9, "ABCDEF"
+
+    HTAB           =  %x09
+                        ; horizontal tab
+
+    LF             =  %x0A
+                        ; linefeed
+
+    LWSP           =  (WSP / CRLF WSP)*
+                        ; linear white space (past newline)
+
+    OCTET          =  %x00-FF
+                        ; 8 bits of data
+
+    SP             =  %x20
+
+    VCHAR          =  %x21-7E
+                        ; visible (printing) characters
+
+    WSP            =  SP / HTAB
+                        ; white space
+
+    percent-sign   =  %x25
+    left-paren     =  %x28
+    right-paren    =  %x29
+    asterisk       =  %x2A
+    hyphen         =  %x2D
+    period         =  %x2E
+    forward-slash  =  %x2F
+    colon          =  %x3A
+    semicolon      =  %x3B
+    equal          =  %x3D
+    question-mark  =  %x3F
+    left-angle-bracket   =  %x3C
+    right-angle-bracket  =  %x3E
+    left-square-bracket  =  %x5B
+    right-square-bracket =  %x5D
+    left-curly-bracket   =  %x7B
+    right-curly-bracket  =  %x7D
+
+    boolean        =  %x66.61.6C.73.65 / %x74.72.75.65
+                        ; false / true (unquoted JSON literals)
+
+
+B.2.  Common Encoding
+
+Externally, data are represented as "network virtual ASCII" (namely,
+7-bit US-ASCII in an 8-bit field), with the high (8th) bit set to
+zero.  A string of values is in "network byte order", in which the
+higher-valued bytes are represented on the left-hand side and are
+sent over the network first.
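Here is a short Python illustration of this encoding, using CRLF (%d13.10). Each value occupies one octet with the high bit clear. Reading the same two octets as a single 16-bit quantity in network (big-endian) order puts the first-transmitted octet in the high-order position.

```python
# CRLF = %d13.10: one octet per 7-bit US-ASCII value.
crlf = bytes([13, 10])
assert all(octet < 0x80 for octet in crlf)  # high (8th) bit is zero

# Network byte order is big-endian: the high-order octet comes first.
assert (0x0D0A).to_bytes(2, "big") == crlf
assert int.from_bytes(crlf, "big") == 0x0D0A
```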
+
+Authors' Addresses
+
+Robert Brewer (editor)
+YouGov America Inc
+285 Hamilton Ave, Suite 200
+Palo Alto, CA 94301
+USA
+
+Phone: +1.650.462.8000
+EMail: robert.brewer@yougov.com
+
+Full Copyright Statement
+
+Copyright (C) YouGov (2011).
+
+This document and the information contained herein are provided on an
+"AS IS" basis and THE CONTRIBUTOR, YOUGOV AND ITS SUBSIDIARIES,
+DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT
+LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN
+WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+