Commits

Joe Townsend committed a765a40

adding updated dictionary convention - tightening up the use of when units should be present and when they must not

Files changed (2)

+relre:^\.idea/*
+relre:.*\.iml$

convention/dictionary-20110408.html

+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml">
+<head>
+    <title>Chemical Markup Language - Dictionary Convention</title>
+    <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8"/>
+    <link rel="stylesheet" href="http://www.xml-cml.org/style/cml-spec.css" type="text/css"/>
+</head>
+
+<body>
+<div class="head">
+    <h1>Chemical Markup Language - Dictionary Convention</h1>
+
+    <h2>8 April 2011</h2>
+
+    <dl>
+        <dt><a id="thisVersion" name="thisVersion">This version</a>:</dt>
+        <dd><a href="http://www.xml-cml.org/convention/dictionary-20110408">http://www.xml-cml.org/convention/dictionary-20110408</a>
+        </dd>
+
+        <dt>Latest version:</dt>
+        <dd><a href="http://www.xml-cml.org/convention/dictionary">http://www.xml-cml.org/convention/dictionary</a></dd>
+
+        <dt>Previous version:</dt>
+        <dd><a href="http://www.xml-cml.org/convention/dictionary-20110209">http://www.xml-cml.org/convention/dictionary-20110209</a></dd>
+
+        <dt>Authors:</dt>
+        <dd>See <a href="#acks">acknowledgments</a>.</dd>
+
+        <dt>Editors:</dt>
+        <dd>Sam Adams, University of Cambridge</dd>
+        <dd>Joe Townsend, University of Cambridge</dd>
+    </dl>
+
+    <h2><a name="abstract" id="abstract"></a>Abstract</h2>
+
+    <p>This specification defines the requirements of the Chemical Markup Language dictionary convention.</p>
+
+    <hr/>
+</div>
+
+<h1><a name="contents" id="contents"></a>Table of Contents</h1>
+
+<p class="toc">1. <a href="#introduction">Introduction</a><br/>
+    &nbsp;&nbsp;&nbsp;&nbsp;1.1 <a href="#notational_conventions">Notational Conventions</a><br/>
+    &nbsp;&nbsp;&nbsp;&nbsp;1.2 <a href="#namespaces">Namespaces</a><br/>
+    2. <a href="#applying">Applying the dictionary convention</a><br/>
+    3. <a href="#dictionary_element">Dictionary Element</a><br/>
+    &nbsp;&nbsp;&nbsp;&nbsp;3.1 <a href="#dictionary_namespace">Namespace</a><br/>
+    &nbsp;&nbsp;&nbsp;&nbsp;3.2 <a href="#dictionary-prefix">Prefix</a><br/>
+    &nbsp;&nbsp;&nbsp;&nbsp;3.3 <a href="#dictionary-title">Title</a><br/>
+    &nbsp;&nbsp;&nbsp;&nbsp;3.4 <a href="#dictionary-description">Description</a><br/>
+    &nbsp;&nbsp;&nbsp;&nbsp;3.5 <a href="#dictionary-entries">Entries</a><br/>
+    4. <a href="#entry-elements">Entry Elements</a><br/>
+    &nbsp;&nbsp;&nbsp;&nbsp;4.1 <a href="#entry-id">ID</a><br/>
+    &nbsp;&nbsp;&nbsp;&nbsp;4.2 <a href="#entry-term">Term</a><br/>
+    &nbsp;&nbsp;&nbsp;&nbsp;4.3 <a href="#entry-definition">Definition</a><br/>
+    &nbsp;&nbsp;&nbsp;&nbsp;4.4 <a href="#entry-description">Description</a><br/>
+    &nbsp;&nbsp;&nbsp;&nbsp;4.5 <a href="#entry-data-type">Data type</a><br/>
+    &nbsp;&nbsp;&nbsp;&nbsp;4.6 <a href="#entry-unit-type">Unit type</a><br/>
+    &nbsp;&nbsp;&nbsp;&nbsp;4.7 <a href="#entry-default-unit">Default units</a><br/>
+    5. <a href="#example">Example Dictionary</a></p>
+
+<h2>Appendices</h2>
+
+<p>A. <a href="#refs">References</a><br/>
+    B. <a href="#acks">Acknowledgements</a></p>
+
+<hr/>
+
+
+<h1><a name="introduction">1.</a> Introduction</h1>
+
+<p>Dictionaries allow CML to be understood by machines. Much of physical science is managed through the dictionary
+    mechanism. We find terms and units relating to an aspect of science (such as heat of formation, melting point, point
+    group) and create entries for these items in a dictionary.</p>
+
+<p>The entries can consist of just a unique id (within the dictionary's namespace) and some human-understandable
+    definition, but we highly encourage more information to be given: for instance, what the units are, whether there
+    are upper and lower bounds, and what the type of the data is (string, integer, float, etc.).</p>
+
+<p>Different programs sometimes produce data with the same label but a different interpretation; does density mean
+    electron density or physical density? Therefore each computational chemistry code will have its own dictionary, and
+    the community can then decide to group particular concepts together.</p>
+
+
+<p>
+    Where concepts are defined by the CML schema they SHOULD NOT be redefined using the dictionary mechanism.
+</p>
+
+<p>
+    Where concepts are defined by standard dictionaries these entries SHOULD be referenced, rather than
+    redefining the concept in another dictionary.
+</p>
+
+
+<h2><a name="notational_conventions">1.1</a> Notational Conventions</h2>
+
+<p>
+    The keywords &quot;MUST&quot;, &quot;MUST NOT&quot;, &quot;REQUIRED&quot;, &quot;SHALL&quot;, &quot;SHALL NOT&quot;,
+    &quot;SHOULD&quot;, &quot;SHOULD NOT&quot;, &quot;RECOMMENDED&quot;, &quot;MAY&quot;, and &quot;OPTIONAL&quot;
+    in this document are to be interpreted as described in RFC 2119 [<cite><a href="#ref-RFC2119">IETF RFC
+    2119</a></cite>].
+</p>
+
+<p>
+    The terms &quot;element&quot;, &quot;attribute&quot;, &quot;child&quot; and &quot;parent&quot;
+    in this document are to be interpreted as described in the W3C Recommendation for Extensible Markup
+    Language (XML) [<cite><a href="#ref-XML">W3C XML</a></cite>].
+</p>
+
+<p>The use of fonts is as follows:</p>
+<ul>
+    <li>Schema terms, including elements and attributes, are written in <code>this font</code>.</li>
+</ul>
+
+<h2><a name="namespaces">1.2</a> Namespaces</h2>
+
+<p>This specification uses the following namespaces and prefixes to indicate those namespaces:</p>
+
+<table class="namespace-table" summary="Namespaces table">
+    <tr>
+        <th>Prefix</th>
+        <th>Namespace URI</th>
+        <th>Description</th>
+    </tr>
+    <tr>
+        <td><code>cml</code></td>
+        <td><code>http://www.xml-cml.org/schema</code></td>
+        <td>Chemical Markup Language elements</td>
+    </tr>
+    <tr>
+        <td><code>convention</code></td>
+        <td><code>http://www.xml-cml.org/convention/</code></td>
+        <td>Standard Chemical Markup Language convention namespace</td>
+    </tr>
+    <tr>
+        <td><code>xhtml</code></td>
+        <td><code>http://www.w3.org/1999/xhtml</code></td>
+        <td>XHTML</td>
+    </tr>
+</table>
+
+
+<h1><a name="applying">2.</a> Applying the dictionary convention</h1>
+
+<p>
+    The dictionary convention MUST be specified using the <code>convention</code>
+    attribute on either a <code>cml</code> or a <code>dictionary</code> element.
+    If the convention is specified on a <code>cml</code> element then that element
+    MUST contain a single child element in the <code>http://www.xml-cml.org/schema</code>
+    namespace, which MUST be a <code>dictionary</code> element.
+</p>
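+
+<p>
+    As an informal illustration (not a normative part of this convention), the second form, in which the
+    convention is declared on a <code>cml</code> wrapper element, might look as follows. The prefix bindings,
+    the <code>convention:dictionary</code> value and the attribute values are borrowed from the example
+    dictionary in section 5; the single entry is abbreviated.
+</p>
+
+<div class="good">
+<pre>
+&lt;cml xmlns="http://www.xml-cml.org/schema"
+     xmlns:convention="http://www.xml-cml.org/convention/"
+     xmlns:xhtml="http://www.w3.org/1999/xhtml"
+     convention="convention:dictionary"&gt;
+
+    &lt;!-- exactly one child element in the CML namespace, and it is a dictionary --&gt;
+    &lt;dictionary title="fundamental chemistry concepts"
+                namespace="http://www.xml-cml.org/dictionary/dummy/"
+                dictionaryPrefix="dummy"&gt;
+
+        &lt;entry id="molecmass" term="Molecular Mass"&gt;
+            &lt;definition&gt;
+                &lt;xhtml:p&gt;The mass of one molecule of a substance.&lt;/xhtml:p&gt;
+            &lt;/definition&gt;
+        &lt;/entry&gt;
+
+    &lt;/dictionary&gt;
+
+&lt;/cml&gt;
+</pre>
+</div>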
+
+<h1><a name="dictionary_element">3.</a> Dictionary Element</h1>
+
+<h2><a name="dictionary_namespace">3.1</a> Namespace</h2>
+
+<p>
+    The <code>dictionary</code> element MUST have a <code>namespace</code> attribute,
+    the value of which MUST be a valid URI defining the scope within which the
+    entry terms are unique.
+    The dictionary's namespace URI SHOULD resolve to a representation of the dictionary.
+    The dictionary's namespace URI SHOULD end with either a '/' character or a '#' character
+    so that terms may be referenced by appending them to the URI.
+</p>
+
+
+<h2><a name="dictionary-prefix">3.2</a> Prefix</h2>
+
+<p>
+    The <code>dictionary</code> element SHOULD have a <code>dictionaryPrefix</code>
+    attribute specifying the default prefix to use when referencing dictionary entries.
+    The <code>dictionaryPrefix</code> MUST be a valid XML QName prefix, and SHOULD be
+    unique within the CML domain.
+</p>
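+
+<p>
+    As a sketch of how the namespace and the prefix are intended to work together, a document that uses the
+    example dictionary from section 5 could bind the <code>dummy</code> prefix to the dictionary's namespace URI
+    and then refer to an entry by QName. The <code>scalar</code> element, its <code>dictRef</code> attribute and
+    the value shown are assumptions made for illustration; they are not defined by this convention.
+</p>
+
+<div class="good">
+<pre>
+&lt;cml xmlns="http://www.xml-cml.org/schema"
+     xmlns:dummy="http://www.xml-cml.org/dictionary/dummy/"
+     xmlns:unit="http://www.xml-cml.org/dictionary/unit/"
+     xmlns:xsd="http://www.w3.org/2001/XMLSchema"&gt;
+
+    &lt;!-- appending the entry id to the namespace URI gives
+         http://www.xml-cml.org/dictionary/dummy/molecmass --&gt;
+    &lt;scalar dictRef="dummy:molecmass" dataType="xsd:double" units="unit:amu"&gt;18.015&lt;/scalar&gt;
+
+&lt;/cml&gt;
+</pre>
+</div>
+
+<p>
+    Because the namespace URI ends in '/', the entry can be addressed by plain concatenation of the URI and
+    the entry id, with no further separator.
+</p>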
+
+
+<h2><a name="dictionary-title">3.3</a> Title</h2>
+
+<p>
+    The <code>dictionary</code> element SHOULD have a <code>title</code>
+    attribute intended for human-readability.
+</p>
+
+
+<h2><a name="dictionary-description">3.4</a> Description</h2>
+
+<p>
+    The <code>dictionary</code> element SHOULD have a single <code>description</code>
+    child element, the contents of which provide a human-readable description of the
+    domain of the dictionary.
+    The <code>description</code> element MUST contain one or more child elements in the
+    <code>http://www.w3.org/1999/xhtml</code> namespace.
+    The <code>description</code> element MUST NOT contain any child elements not in
+    the <code>http://www.w3.org/1999/xhtml</code> namespace.
+</p>
+
+
+<h2><a name="dictionary-entries">3.5</a> Entries</h2>
+
+<p>
+    The <code>dictionary</code> element MUST contain one or more child <code>entry</code>
+    elements, and MUST NOT contain any other child elements from the
+    <code>http://www.xml-cml.org/schema</code> namespace.
+</p>
+
+
+<h1><a name="entry-elements">4.</a> Entry Elements</h1>
+
+<h2><a name="entry-id">4.1</a> ID</h2>
+
+<p>
+    An <code>entry</code> element MUST have an <code>id</code> attribute, the
+    value of which MUST be unique within the scope of the dictionary.
+</p>
+
+<p>
+    The value of the <code>id</code> attribute MUST start with a letter, and
+    MUST only contain letters, numbers, dot, hyphen or underscore.
+</p>
+
+<table>
+    <tr>
+        <td><code>IdStartChar</code></td>
+        <td>::=</td>
+        <td><code>[A-Z] | [a-z]</code></td>
+    </tr>
+    <tr>
+        <td><code>IdChar</code></td>
+        <td>::=</td>
+        <td><code>IdStartChar | [0-9] | "." | "-" | "_"</code></td>
+    </tr>
+    <tr>
+        <td><code>Id</code></td>
+        <td>::=</td>
+        <td><code>IdStartChar (IdChar)*</code></td>
+    </tr>
+</table>
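+
+<p>
+    Read as a single pattern, the productions above are equivalent to the regular expression
+    <code>[A-Za-z][A-Za-z0-9._-]*</code>. The ids listed below are hypothetical and serve only to
+    illustrate the rule.
+</p>
+
+<pre>
+molecmass            conforms
+heat_of_formation    conforms
+point-group.v2       conforms
+2theta               does not conform: starts with a digit
+_energy              does not conform: starts with an underscore
+melting point        does not conform: contains a space
+</pre>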
+
+<h2><a name="entry-term">4.2</a> Term</h2>
+
+<p>
+    An <code>entry</code> element MUST have a <code>term</code> attribute, the
+    value of which provides a unique nounal phrase linguistically identifying
+    the subject of the entry.
+</p>
+
+<p>
+    The value of the <code>term</code> attribute MAY contain any valid Unicode
+    character; however, it is RECOMMENDED that any character from outside
+    the ASCII subset (codepoints 32-127) be represented using an entity reference.
+</p>
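+
+<p>
+    For example, a term containing the Greek capital letter delta (U+0394) could be written with a numeric
+    character reference, as sketched below. The entry is hypothetical, and treating a numeric character
+    reference as satisfying the recommendation above is an assumption. The <code>xhtml</code> prefix is
+    assumed to be bound on the enclosing <code>dictionary</code> element.
+</p>
+
+<div class="good">
+<pre>
+&lt;entry id="gibbs_energy_change" term="Gibbs energy change (&amp;#x394;G)"&gt;
+    &lt;definition&gt;
+        &lt;xhtml:p&gt;The change in Gibbs free energy accompanying a process.&lt;/xhtml:p&gt;
+    &lt;/definition&gt;
+&lt;/entry&gt;
+</pre>
+</div>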
+
+
+<h2><a name="entry-definition">4.3</a> Definition</h2>
+
+<p>
+    An <code>entry</code> element MUST contain a single <code>definition</code>
+    child element, the content of which provides a nounal phrase defining the
+    subject of the entry more verbosely than the term.
+</p>
+
+<p>
+    The <code>definition</code> element MUST contain one or more child elements in the
+    <code>http://www.w3.org/1999/xhtml</code> namespace.
+    The <code>definition</code> element MUST NOT contain any child elements not in
+    the <code>http://www.w3.org/1999/xhtml</code> namespace.
+</p>
+
+
+<h2><a name="entry-description">4.4</a> Description</h2>
+
+<p>
+    An <code>entry</code> element MAY have a single <code>description</code>
+    child element, the content of which provides further information regarding
+    the term, including, but not limited to: examples, human-readable semantics
+    and hyperlinks to other useful resources.
+</p>
+
+<p>
+    The <code>description</code> element MUST contain one or more child elements in the
+    <code>http://www.w3.org/1999/xhtml</code> namespace.
+    The <code>description</code> element MUST NOT contain any child elements not in
+    the <code>http://www.w3.org/1999/xhtml</code> namespace.
+</p>
+
+
+<h2><a name="entry-data-type">4.5</a> Data type</h2>
+
+<p>
+    When applicable to the concept defined, an <code>entry</code> SHOULD have
+    a <code>dataType</code> attribute, the value of which is a QName
+    referencing the data type of any value defined using the <code>entry</code>.
+</p>
+
+<h3>Common data types:</h3>
+<ul>
+    <li><code>xsd:string</code></li>
+    <li><code>xsd:double</code></li>
+    <li><code>xsd:integer</code></li>
+    <li><code>xsd:boolean</code></li>
+</ul>
+
+
+<h2><a name="entry-unit-type">4.6</a> Unit type</h2>
+
+<p>
+    When applicable to the concept defined, an <code>entry</code> SHOULD have
+    a <code>unitType</code> attribute, the value of which is a QName
+    referencing the unit type (e.g. temperature) of any value defined using
+    the <code>entry</code>.
+</p>
+
+
+<h2><a name="entry-default-unit">4.7</a> Default units</h2>
+
+<p>
+    When applicable to the concept defined, an <code>entry</code> SHOULD have
+    a <code>units</code> attribute, the value of which is a QName
+    referencing the default units (e.g. Kelvin) of any value defined using the
+    <code>entry</code>.
+</p>
+
+<p>
+    If the <code>unitType</code> is expressly given as <a href="http://xml-cml.org/unit/unitType#unknown">unknown</a>
+    then the <code>units</code> attribute MUST NOT be present.
+</p>
+
+<p>
+    If the <code>unitType</code> is expressly given as <a href="http://xml-cml.org/unit/unitType#none">none</a>
+    then the <code>units</code> attribute MUST be present and its value MUST point to
+    <a href="http://www.xml-cml.org/unit/si#none">http://www.xml-cml.org/unit/si#none</a>.
+</p>
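+
+<p>
+    The two rules above can be sketched with a pair of hypothetical entries. The prefix bindings are
+    assumptions chosen so that the QNames correspond to the URIs linked above; the entry ids, terms and
+    definitions are invented for illustration. The enclosing <code>dictionary</code> element, which would
+    carry the namespace declarations, is omitted.
+</p>
+
+<div class="good">
+<pre>
+&lt;!-- assumed bindings on the enclosing dictionary element:
+     xmlns:unitType="http://xml-cml.org/unit/unitType#"
+     xmlns:si="http://www.xml-cml.org/unit/si#"
+     xmlns:xsd="http://www.w3.org/2001/XMLSchema"
+     xmlns:xhtml="http://www.w3.org/1999/xhtml" --&gt;
+
+&lt;!-- unitType is none: units is present and points to si#none --&gt;
+&lt;entry id="atom_count" term="Atom count"
+       dataType="xsd:integer" unitType="unitType:none" units="si:none"&gt;
+    &lt;definition&gt;
+        &lt;xhtml:p&gt;The number of atoms in a molecule.&lt;/xhtml:p&gt;
+    &lt;/definition&gt;
+&lt;/entry&gt;
+
+&lt;!-- unitType is unknown: no units attribute is given --&gt;
+&lt;entry id="scale_factor" term="Scale factor"
+       dataType="xsd:double" unitType="unitType:unknown"&gt;
+    &lt;definition&gt;
+        &lt;xhtml:p&gt;A program-specific scaling factor whose units are not documented.&lt;/xhtml:p&gt;
+    &lt;/definition&gt;
+&lt;/entry&gt;
+</pre>
+</div>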
+
+
+<h1><a name="example">5.</a> Example Dictionary</h1>
+
+<div class="good">
+<pre>
+&lt;?xml version="1.0" encoding="UTF-8" ?&gt;
+&lt;dictionary xmlns="http://www.xml-cml.org/schema"
+            xmlns:convention="http://www.xml-cml.org/convention/"
+            xmlns:unit="http://www.xml-cml.org/dictionary/unit/"
+            xmlns:unitType="http://www.xml-cml.org/dictionary/unitType/"
+            xmlns:xhtml="http://www.w3.org/1999/xhtml"
+            xmlns:xsd="http://www.w3.org/2001/XMLSchema"
+            convention="convention:dictionary"
+            title="fundamental chemistry concepts"
+            namespace="http://www.xml-cml.org/dictionary/dummy/"
+            dictionaryPrefix="dummy"&gt;
+
+    &lt;entry id="molecmass" term="Molecular Mass"
+           dataType="xsd:double" unitType="unitType:amount" units="unit:amu"&gt;
+        &lt;definition&gt;
+            &lt;xhtml:p&gt;
+                The mass of one molecule of a substance in unified atomic mass units.
+            &lt;/xhtml:p&gt;
+        &lt;/definition&gt;
+        &lt;description&gt;
+            &lt;xhtml:p&gt;
+                The molecular mass (m) of a substance is the mass of one molecule of that substance,
+                in unified atomic mass unit(s) u (equal to 1/12 the mass of one atom of the isotope
+                carbon-12). This is numerically equivalent to the relative molecular mass (Mr) of a
+                molecule, frequently referred to by the term molecular weight, which is the ratio of
+                the mass of that molecule to 1/12 of the mass of carbon-12 and is a dimensionless
+                number. Thus, it is incorrect to express relative molecular mass (molecular weight)
+                in daltons (Da). Unfortunately, the terms molecular weight and molecular mass have
+                been confused on numerous websites, which often state that molecular weight was used
+                in the past as another term for molecular mass.
+            &lt;/xhtml:p&gt;
+            &lt;xhtml:p&gt;
+                Molecular mass differs from more common measurements of the mass of chemicals, such
+                as molar mass, by taking into account the isotopic composition of a molecule rather
+                than the average isotopic distribution of many molecules. As a result, molecular mass
+                is a more precise number than molar mass; however it is more accurate to use molar
+                mass on bulk samples. This means that molar mass is appropriate most of the time
+                except when dealing with single molecules.
+            &lt;/xhtml:p&gt;
+        &lt;/description&gt;
+    &lt;/entry&gt;
+
+    &lt;entry id="molarmass" term="Molar Mass"
+           dataType="xsd:double" unitType="unitType:amount" units="unit:amu"&gt;
+        &lt;definition&gt;
+            &lt;xhtml:p&gt;
+                The mass per amount of substance.
+            &lt;/xhtml:p&gt;
+        &lt;/definition&gt;
+        &lt;description&gt;
+            &lt;xhtml:p&gt;
+                Molar mass, symbol M, is a physical property characteristic of a given substance
+                (chemical element or chemical compound), namely its mass per amount of substance.
+                The base SI unit for mass is the kilogram and that for amount of substance is
+                the mole. Thus, the derived unit for molar mass is kg/mol. However, for both
+                practical and historical reasons, molar masses are almost always quoted in grams
+                per mole (g/mol or g mol−1), especially in chemistry.
+            &lt;/xhtml:p&gt;
+            &lt;xhtml:p&gt;
+                Molar mass is closely related to the relative molar mass (Mr) of a compound, the
+                older term formula weight and to the standard atomic masses of its constituent
+                elements. However, it should be distinguished from the molecular mass (also
+                known as molecular weight), which is the mass of one molecule (of any single
+                isotopic composition) and is not directly related to the atomic mass, the mass
+                of one atom (of any single isotope). The dalton, symbol Da, is also sometimes
+                used as a unit of molar mass, especially in biochemistry, with the definition
+                1 Da = 1 g/mol, despite the fact that it is strictly a unit of molecular mass
+                (1 Da = 1.660 538 782(83)×10−27 kg).
+            &lt;/xhtml:p&gt;
+        &lt;/description&gt;
+    &lt;/entry&gt;
+
+&lt;/dictionary&gt;
+</pre>
+</div>
+
+
+<h1><a name="refs">A.</a> References</h1>
+
+<dl>
+
+    <dt>
+        <a name="ref-RFC2119">[RFC2119]</a>
+    </dt>
+    <dd>
+        IETF <cite><a href="http://www.ietf.org/rfc/rfc2119.txt">RFC 2119: Key words for use in RFCs to Indicate
+        Requirement Levels</a></cite>,
+        S. Bradner, March 1997. Available at http://www.ietf.org/rfc/rfc2119.txt.
+    </dd>
+
+    <dt>
+        <a name="ref-XML">[XML]</a>
+    </dt>
+    <dd>
+        <cite><a href="http://www.w3.org/TR/2008/REC-xml-20081126">Extensible Markup Language (XML) 1.0 (Fifth
+            Edition)</a></cite>,
+        T. Bray, J. Paoli, C.M. Sperberg-McQueen, E. Maler and F. Yergeau, Editors. World Wide Web Consortium.
+        26 November 2008. This version is http://www.w3.org/TR/2008/REC-xml-20081126. The <a
+            href="http://www.w3.org/TR/REC-xml">latest version of XML</a> is available at http://www.w3.org/TR/REC-xml.
+    </dd>
+
+</dl>
+
+
+<h1><a name="acks">B.</a> Acknowledgements</h1>
+
+<ul>
+    <li>Peter Murray-Rust</li>
+    <li>Joe Townsend</li>
+    <li>Nick England</li>
+    <li>Weerapong Phadungsukanan</li>
+    <li>Daniel Lowe</li>
+    <li>Sam Adams</li>
+    <li>Hannah Barjat</li>
+</ul>
+
+<hr/>
+<div>
+    <a rel="license" href="http://creativecommons.org/licenses/by/3.0/">
+        <img alt="Creative Commons Licence" style="border-width:0"
+             src="http://i.creativecommons.org/l/by/3.0/88x31.png"/>
+    </a>
+    <br/>This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/3.0/">Creative
+    Commons Attribution 3.0 Unported License</a>.
+
+
+</div>
+<div>
+    <p>
+        <a href="http://validator.w3.org/check?uri=referer"><img
+                src="http://www.w3.org/Icons/valid-xhtml10"
+                alt="Valid XHTML 1.0 Strict" height="31" width="88"/></a>
+    </p>
+</div>
+</body>
+</html>