Commits

Anonymous committed 33140e6

Add SmartQuotes transform for typographic quotes and dashes.

Files changed (9)

docutils/COPYING.txt

 that have not yet been invented or conceived.
 
 (This dedication is derived from the text of the `Creative Commons
-Public Domain Dedication
-<http://creativecommons.org/licenses/publicdomain>`_. [#]_)
+Public Domain Dedication`. [#]_)
 
 .. [#] Creative Commons has `retired this legal tool`__ and does not
    recommend that it be applied to works: This tool is based on United
-   States law and may not be applicable outside the US. For dedicating
-   new works to the public domain, Creative Commons recommend CC0_. So
-   does the Free Software Foundation in its license-list_.
+   States law and may not be applicable outside the US. For dedicating new
+   works to the public domain, Creative Commons recommend the replacement
+   Public Domain Dedication CC0_ (CC zero, "No Rights Reserved"). So does
+   the Free Software Foundation in its license-list_.
 
    __  http://creativecommons.org/retiredlicenses
-   .. _CC0: http://creativecommons.org/publicdomain/zero/1.0/legalcode
+   .. _CC0: http://creativecommons.org/about/cc0
 
 Exceptions
 ==========
       <http://www.twinhelix.com>.  Free usage permitted as long as
       this notice remains intact.
 
-* docutils/math/__init__.py,
-  docutils/math/latex2mathml.py,
+* docutils/utils/math/__init__.py,
+  docutils/utils/math/latex2mathml.py,
   docutils/writers/xetex/__init__.py,
   docutils/writers/latex2e/docutils-05-compat.sty,
   docs/user/docutils-05-compat.sty.txt,
-  docutils/error_reporting.py:
+  docutils/utils/error_reporting.py,
+  docutils/test/transforms/test_smartquotes.py:
 
   Copyright © Günter Milde.
   Released under the terms of the `2-Clause BSD license`_
   (`local copy <licenses/BSD-2-Clause.txt>`__).
 
-* docutils/math/math2html.py,
+* docutils/utils/smartquotes.py
+
+  Copyright © 2011 Günter Milde,
+  based on `SmartyPants`_ © 2003 John Gruber
+  (released under a 3-Clause BSD license included in the file)
+  and smartypants.py © 2004, 2007 Chad Miller.
+  Released under the terms of the `2-Clause BSD license`_
+  (`local copy <licenses/BSD-2-Clause.txt>`__).
+
+  .. _SmartyPants: http://daringfireball.net/projects/smartypants/
+
+* docutils/utils/math/math2html.py,
   docutils/writers/html4css1/math.css
 
   Copyright © Alex Fernández
    sentence-end-double-space: t
    fill-column: 70
    End:
+
+.. Here's some CSS to make a table colourful::
+
+   /* Table: */
+   
+   th {
+       background-color: #ede;
+   }
+   
+   /* alternating colors in table rows */
+   table.docutils tr:nth-child(even) {
+       background-color: #F3F3FF;
+   }
+   table.docutils tr:nth-child(odd) {
+       background-color: #FFFFEE;
+   }
+   
+   table.docutils tr {
+       border-style: solid none solid none;
+       border-width: 1px 0 1px 0;
+       border-color: #AAAAAA;
+   }  

docutils/HISTORY.txt

 
   - Fix [ 3546533 ] Unicode error with `date` directive.
 
-* docutils/setup.py
-
-  - Add ``math.css`` stylesheet to data files (thanks to Dmitry Shachnev).
+* docutils/transforms/universal.py
+
+  - Add SmartQuotes transform for typographic quotes and dashes.
 
 * docutils/writers/html4css1/__init__.py
 
 
   - Fix [ 3552403 ] Prevent broken PyXML replacing stdlibs xml module.
 
-* docutils/tools/test/test_buildhtml.py
+* setup.py
+
+  - Tag ``math.css`` stylesheet as data file (patch by Dmitry Shachnev).
+
+* tools/test/test_buildhtml.py
 
   - Fix [ 3521167 ] allow running in any directory.
   - Fix [ 3521168 ] allow running with Python 3.
 Release 0.9.1 (2012-06-17)
 ==========================
 
-* docutils/setup.py
+* setup.py
 
   - Fix [ 3527842 ]. Under Python 3, converted tests and tools were
     installed in the PYTHONPATH. Converted tests are now
     ``setup.py install`` under Python 3, remove the spurious
     ``test/`` and ``tools/`` directories in the site library root.
 
-* docutils/test/
+* test/
 
   - Make tests independent from the location of the ``test/`` directory.
   - Use converted sources (from the ``build/`` directory) for tests under
     Python 3.
 
-* docutils/tools/
+* tools/
 
   - Make tools compatible with both Python 2 and 3 without 2to3-conversion.
 
   - Fix [ 3525847 ]. Catch and report UnicodeEncodeError with
     ``locale == C`` and 8-bit char in path argument of `include` directive.
 
-* docutils/test/alltests.py
+* test/alltests.py
 
   - class `Tee`: catch UnicodeError when writing to "ascii" stream or
     file under Python 3.
   - Fix [ 3364658 ] (Change last file with Apache license to BSD-2-Clause)
     and [ 3395920 ] (correct copyright info for rst.el).
 
-* docutils/test/
+* test/
 
   -  Apply [ 3303733 ] and [ 3365041 ] to fix tests under Py3k.
 

docutils/docs/dev/todo.txt

            supports Zotero databases and CSL_ styles with Docutils with an
            ``xcite`` role.
 
-         .. _CSL: http://www.citationstyles.org/
-
+         * `sphinxcontrib-bibtex`_ Sphinx extension with "bibliography"
+           directive and "cite" role supporting BibTeX databases.
 
     * Automatically insert a "References" heading?
 
 .. _CrossTeX: http://www.cs.cornell.edu/people/egs/crosstex/
 .. _Pybtex:   http://pybtex.sourceforge.net/
+.. _CSL: http://www.citationstyles.org/
+.. _sphinxcontrib-bibtex: http://sphinxcontrib-bibtex.readthedocs.org/
 
 * _`Reference Merging`
 
 
 * _`Index Generation`
 
-* _`Beautify`
-
-  Convert quotes and dashes to typographically correct entities.
-  Sphinx does this with ``smartypants.py``.
-
-  Write a generic version that uses Unicode chars
-  (let the writer replace these if required).
-
-  Some arguments for "smart quotes" are collected in a `mail to
-  docutils-user by Jörg W. Mittag from 2006-03-13`__.
-
-  Also see the "... Or Not To Do?" list entry for
-  `Character Processing`_
-
-__ http://article.gmane.org/gmane.text.docutils.user/2765
-
-.. _Character Processing: rst/alternatives.html#character-processing
-
 
 HTML Writer
 ===========

docutils/docs/ref/transforms.txt

 
 .. contents::
 
+Transforms change the document tree in-place, add to the tree, or prune it.
+Transforms resolve references and footnote numbers, process interpreted
+text, and do other context-sensitive processing. Each transform is a
+subclass of ``docutils.transforms.Transform``.
 
-For background about transforms and the Transformer object, see `PEP
-258`_.
+There are `transforms added by components`_; others (e.g.
+``parts.Contents``) are added by the parser if a corresponding directive is
+found in the document.
+
+To add a transform, components (objects inheriting from
+``docutils.Component``, such as Readers, Parsers, Writers, Input, and Output)
+override the ``get_transforms()`` method of their base class. After the
+Reader has finished processing, the Publisher calls
+``Transformer.populate_from_components()`` with a list of components. All
+transforms returned by each component's ``get_transforms()`` method are
+stored in a `transformer object` attached to the document tree.
+
+
+For more about transforms and the Transformer object, see also `PEP
+258`_. (The ``default_transforms`` attribute of component classes mentioned
+there is deprecated. Use the ``get_transforms()`` method instead.)
 
 .. _PEP 258: ../peps/pep-0258.html#transformer
 
 Transforms Listed in Priority Order
 ===================================
 
+Transform classes each have a ``default_priority`` attribute which is used by
+the Transformer to apply transforms in order (low to high). The default
+priority can be overridden when adding transforms to the Transformer object.
+
+
 ==============================  ============================  ========
 Transform: module.Class         Added By                      Priority
 ==============================  ============================  ========
 
 universal.FilterMessages        Writer (w)                    870
 
+universal.SmartQuotes           Parser                        850
+
 universal.TestMessages          DocutilsTestSupport           880
 
 writer_aux.Compound             newlatex2e (w)                910
 
 writer_aux.Admonitions          html4css1 (w),                920
-                                newlatex2e (w)
+                                latex2e (w)
 
 misc.CallBack                   n/a                           990
 ==============================  ============================  ========
  800   899  very late
  900   999  very late (non-standard)
 ====  ====  ================================================
+
+
+Transforms added by components
+===============================
+
+
+readers.Reader:
+  | universal.Decorations,
+  | universal.ExposeInternals,
+  | universal.StripComments
+
+readers.ReReader:
+  None
+
+readers.standalone.Reader:
+  | references.Substitutions,
+  | references.PropagateTargets,
+  | frontmatter.DocTitle,
+  | frontmatter.SectionSubTitle,
+  | frontmatter.DocInfo,
+  | references.AnonymousHyperlinks,
+  | references.IndirectHyperlinks,
+  | references.Footnotes,
+  | references.ExternalTargets,
+  | references.InternalTargets,
+  | references.DanglingReferences,
+  | misc.Transitions
+
+readers.pep.Reader:
+  | references.Substitutions,
+  | references.PropagateTargets,
+  | references.AnonymousHyperlinks,
+  | references.IndirectHyperlinks,
+  | references.Footnotes,
+  | references.ExternalTargets,
+  | references.InternalTargets,
+  | references.DanglingReferences,
+  | misc.Transitions,
+  | peps.Headers,
+  | peps.Contents,
+  | peps.TargetNotes
+
+parsers.rst.Parser:
+  universal.SmartQuotes
+
+writers.Writer:
+  | universal.Messages,
+  | universal.FilterMessages,
+  | universal.StripClassesAndElements
+
+writers.UnfilteredWriter:
+  None
+
+writers.latex2e.Writer:
+  writer_aux.Admonitions
+
+writers.html4css1.Writer:
+  writer_aux.Admonitions
+
+writers.odf_odt.Writer:
+  removes references.DanglingReferences

docutils/docutils/parsers/rst/__init__.py

 import docutils.parsers
 import docutils.statemachine
 from docutils.parsers.rst import states
-from docutils import frontend, nodes
+from docutils import frontend, nodes, Component
+from docutils.transforms import universal
 
 
 class Parser(docutils.parsers.Parser):
           '"long", "short", or "none (no parsing)". Default is "short".',
           ['--syntax-highlight'],
           {'choices': ['long', 'short', 'none'],
-           'default': 'long', 'metavar': '<format>'}),))
+           'default': 'long', 'metavar': '<format>'}),
+         ('Change straight quotation marks to typographic form: '
+          'one of "yes", "no", "alt[ernative]" (default "no").',
+          ['--smart-quotes'],
+          {'default': False, 'validator': frontend.validate_ternary}),
+        ))
 
     config_section = 'restructuredtext parser'
     config_section_dependencies = ('parsers',)
         self.state_classes = states.state_classes
         self.inliner = inliner
 
+    def get_transforms(self):
+        return Component.get_transforms(self) + [
+            universal.SmartQuotes]
+
     def parse(self, inputstring, document):
         """Parse `inputstring` and populate `document`, a document tree."""
         self.setup_parse(inputstring, document)
         and the line number added.
 
         Preferably use the `debug`, `info`, `warning`, `error`, or `severe`
-        wrapper methods, e.g. ``self.error(message)`` to generate an 
+        wrapper methods, e.g. ``self.error(message)`` to generate an
         ERROR-level directive error.
         """
         return DirectiveError(level, message)
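
The new ``--smart-quotes`` option takes one of three kinds of value ("yes", "no", or "alt[ernative]") and names ``frontend.validate_ternary`` as its validator, whose body is not part of the hunks shown here. The sketch below is only an assumption about how such a three-way validator could behave: yes/no-like strings become booleans and anything else is passed through unchanged. The function name, the word lists, and the pass-through rule are all hypothetical.

```python
# Hypothetical sketch; this is NOT the code of docutils.frontend.validate_ternary.

TRUE_WORDS = {"1", "on", "yes", "true"}       # assumed spellings
FALSE_WORDS = {"0", "off", "no", "false", ""}  # assumed spellings


def validate_ternary_sketch(value):
    """Return True, False, or the original string for any other value."""
    if isinstance(value, bool):
        return value
    word = value.strip().lower()
    if word in TRUE_WORDS:
        return True
    if word in FALSE_WORDS:
        return False
    return value


if __name__ == "__main__":
    for raw in ("yes", "no", "alt", False):
        print(repr(raw), "->", repr(validate_ternary_sketch(raw)))
```

Under this assumption, a consumer can test the setting with ``is False`` (as the ``SmartQuotes`` transform in this commit does) and still distinguish a plain "yes" from a string such as "alt".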

docutils/docutils/transforms/universal.py

 import time
 from docutils import nodes, utils
 from docutils.transforms import TransformError, Transform
-
+from docutils.utils import smartquotes
 
 class Decorations(Transform):
 
                     node['classes'].remove(class_value)
                 if class_value in self.strip_elements:
                     return 1
+
+class SmartQuotes(Transform):
+
+    """
+    Replace ASCII quotation marks with typographic form.
+
+    Also replace multiple dashes with em-dashes and en-dashes.
+    """
+
+    default_priority = 850
+
+    def apply(self):
+        if self.document.settings.smart_quotes is False:
+            return
+        for node in self.document.traverse(nodes.Text):
+            # Leave literal text (inline literals, literal blocks) untouched.
+            if isinstance(node.parent,
+                          (nodes.FixedTextElement, nodes.literal)):
+                continue
+            # attr='2': educate quotes, backticks and ellipses; use
+            # "--" for en-dashes and "---" for em-dashes.
+            newtext = smartquotes.smartyPants(node.astext(), attr='2')
+            node.parent.replace(node, nodes.Text(newtext))
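
The ``apply()`` method above walks all text nodes, skips those whose parent is a literal element, and replaces the rest with a converted copy. The following is a minimal sketch of that pattern on a toy tree. ``Node``, ``Text``, ``iter_text``, ``old_school_dashes`` and ``smarten`` are hypothetical helpers written for this illustration and are not part of ``docutils.nodes``. Only the dash shorthand of mode "2" ("--" for en-dashes, "---" for em-dashes) is imitated; quotes are left alone.

```python
# Toy document tree; these classes are stand-ins, not docutils.nodes.

class Text:
    def __init__(self, data):
        self.data = data
        self.parent = None


class Node:
    def __init__(self, tag, children=()):
        self.tag = tag
        self.parent = None
        self.children = []
        for child in children:
            child.parent = self
            self.children.append(child)


def iter_text(node):
    """Yield every Text leaf below `node`, depth first."""
    for child in node.children:
        if isinstance(child, Text):
            yield child
        else:
            yield from iter_text(child)


def old_school_dashes(text):
    # Replace the longer run first so "---" is not consumed as "--" plus "-".
    return text.replace("---", "\u2014").replace("--", "\u2013")


SKIPPED_TAGS = {"literal", "literal_block"}


def smarten(root):
    # Collect the leaves first so the tree is not modified while iterating.
    for text in list(iter_text(root)):
        if text.parent.tag in SKIPPED_TAGS:
            continue
        replacement = Text(old_school_dashes(text.data))
        replacement.parent = text.parent
        siblings = text.parent.children
        siblings[siblings.index(text)] = replacement


if __name__ == "__main__":
    doc = Node("document", [
        Node("paragraph", [
            Text("pages 1--5 --- see "),
            Node("literal", [Text("a--b")]),
        ]),
    ])
    smarten(doc)
    for leaf in iter_text(doc):
        print(repr(leaf.data))
```

Collecting the text nodes into a list before replacing them is a design choice of this sketch; it avoids changing a child list while it is being traversed.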

docutils/docutils/utils/smartquotes.py

+#!/usr/bin/python
+# -*- coding: utf8 -*-
+
+# :Id: $Id$
+# :Copyright: © 2011 Günter Milde,
+#             original `SmartyPants`_: © 2003 John Gruber
+#             smartypants.py:          © 2004, 2007 Chad Miller
+# :License: Released under the terms of the `2-Clause BSD license`_, in short:
+#
+#    Copying and distribution of this file, with or without modification,
+#    are permitted in any medium without royalty provided the copyright
+#    notices and this notice are preserved.
+#    This file is offered as-is, without any warranty.
+#
+# .. _2-Clause BSD license: http://www.spdx.org/licenses/BSD-2-Clause
+
+
+r"""
+========================
+SmartyPants for Docutils
+========================
+
+Synopsis
+========
+
+Smart-quotes for Docutils.
+
+The original "SmartyPants" is a free web publishing plug-in for Movable Type,
+Blosxom, and BBEdit that easily translates plain ASCII punctuation characters
+into "smart" typographic punctuation characters.
+
+`smartypants.py` endeavours to be a functional port of
+SmartyPants to Python, for use with Pyblosxom_.
+
+`smartquotes.py` is an adaptation of SmartyPants to Docutils_. By using Unicode
+characters instead of HTML entities for typographic quotes, it works for any
+output format that supports Unicode.
+
+Authors
+=======
+
+`John Gruber`_ did all of the hard work of writing this software in Perl for
+`Movable Type`_ and almost all of this useful documentation.  `Chad Miller`_
+ported it to Python to use with Pyblosxom_.
+Adapted to Docutils_ by Günter Milde.
+
+Additional Credits
+==================
+
+Portions of the SmartyPants original work are based on Brad Choate's nifty
+MTRegex plug-in.  `Brad Choate`_ also contributed a few bits of source code to
+this plug-in.  Brad Choate is a fine hacker indeed.
+
+`Jeremy Hedley`_ and `Charles Wiltgen`_ deserve mention for exemplary beta
+testing of the original SmartyPants.
+
+`Rael Dornfest`_ ported SmartyPants to Blosxom.
+
+.. _Brad Choate: http://bradchoate.com/
+.. _Jeremy Hedley: http://antipixel.com/
+.. _Charles Wiltgen: http://playbacktime.com/
+.. _Rael Dornfest: http://raelity.org/
+
+
+Copyright and License
+=====================
+
+SmartyPants_ license (3-Clause BSD license):
+
+  Copyright (c) 2003 John Gruber (http://daringfireball.net/)
+  All rights reserved.
+
+  Redistribution and use in source and binary forms, with or without
+  modification, are permitted provided that the following conditions are
+  met:
+
+  * Redistributions of source code must retain the above copyright
+    notice, this list of conditions and the following disclaimer.
+
+  * Redistributions in binary form must reproduce the above copyright
+    notice, this list of conditions and the following disclaimer in
+    the documentation and/or other materials provided with the
+    distribution.
+
+  * Neither the name "SmartyPants" nor the names of its contributors
+    may be used to endorse or promote products derived from this
+    software without specific prior written permission.
+
+  This software is provided by the copyright holders and contributors
+  "as is" and any express or implied warranties, including, but not
+  limited to, the implied warranties of merchantability and fitness for
+  a particular purpose are disclaimed. In no event shall the copyright
+  owner or contributors be liable for any direct, indirect, incidental,
+  special, exemplary, or consequential damages (including, but not
+  limited to, procurement of substitute goods or services; loss of use,
+  data, or profits; or business interruption) however caused and on any
+  theory of liability, whether in contract, strict liability, or tort
+  (including negligence or otherwise) arising in any way out of the use
+  of this software, even if advised of the possibility of such damage.
+
+smartypants.py license (2-Clause BSD license):
+
+  smartypants.py is a derivative work of SmartyPants.
+
+  Redistribution and use in source and binary forms, with or without
+  modification, are permitted provided that the following conditions are
+  met:
+
+  * Redistributions of source code must retain the above copyright
+    notice, this list of conditions and the following disclaimer.
+
+  * Redistributions in binary form must reproduce the above copyright
+    notice, this list of conditions and the following disclaimer in
+    the documentation and/or other materials provided with the
+    distribution.
+
+  This software is provided by the copyright holders and contributors
+  "as is" and any express or implied warranties, including, but not
+  limited to, the implied warranties of merchantability and fitness for
+  a particular purpose are disclaimed. In no event shall the copyright
+  owner or contributors be liable for any direct, indirect, incidental,
+  special, exemplary, or consequential damages (including, but not
+  limited to, procurement of substitute goods or services; loss of use,
+  data, or profits; or business interruption) however caused and on any
+  theory of liability, whether in contract, strict liability, or tort
+  (including negligence or otherwise) arising in any way out of the use
+  of this software, even if advised of the possibility of such damage.
+
+.. _John Gruber: http://daringfireball.net/
+.. _Chad Miller: http://web.chad.org/
+
+.. _Pyblosxom: http://pyblosxom.bluesock.org/
+.. _SmartyPants: http://daringfireball.net/projects/smartypants/
+.. _Movable Type: http://www.movabletype.org/
+.. _2-Clause BSD license: http://www.spdx.org/licenses/BSD-2-Clause
+.. _Docutils: http://docutils.sf.net/
+
+Description
+===========
+
+SmartyPants can perform the following transformations:
+
+- Straight quotes ( " and ' ) into "curly" quote characters
+- Backtick-style quotes (\`\`like this'') into "curly" quote characters
+- Dashes (``--`` and ``---``) into en- and em-dash entities
+- Three consecutive dots (``...`` or ``. . .``) into an ellipsis entity
+
+This means you can write, edit, and save your posts using plain old
+ASCII straight quotes, plain dashes, and plain dots, but your published
+posts (and final HTML output) will appear with smart quotes, em-dashes,
+and proper ellipses.
+
+SmartyPants does not modify characters within ``<pre>``, ``<code>``, ``<kbd>``,
+``<math>`` or ``<script>`` tag blocks. Typically, these tags are used to
+display text where smart quotes and other "smart punctuation" would not be
+appropriate, such as source code or example markup.
+
+
+Backslash Escapes
+=================
+
+If you need to use literal straight quotes (or plain hyphens and
+periods), SmartyPants accepts the following backslash escape sequences
+to force non-smart punctuation. It does so by transforming the escape
+sequence into a character:
+
+========  =====  =========
+Escape    Value  Character
+========  =====  =========
+``\\\\``  &#92;  \\
+\\"       &#34;  "
+\\'       &#39;  '
+\\.       &#46;  .
+\\-       &#45;  \-
+\\`       &#96;  \`
+========  =====  =========
+
+This is useful, for example, when you want to use straight quotes as
+foot and inch marks: 6'2" tall; a 17" iMac.
+
+Options
+=======
+
+For Pyblosxom users, the ``smartypants_attributes`` attribute is where you
+specify configuration options.
+
+Numeric values are the easiest way to configure SmartyPants' behavior:
+
+"0"
+        Suppress all transformations. (Do nothing.)
+"1"
+        Performs default SmartyPants transformations: quotes (including
+        \`\`backticks'' -style), em-dashes, and ellipses. "``--``" (dash dash)
+        is used to signify an em-dash; there is no support for en-dashes.
+
+"2"
+        Same as smarty_pants="1", except that it uses the old-school typewriter
+        shorthand for dashes:  "``--``" (dash dash) for en-dashes, "``---``"
+        (dash dash dash)
+        for em-dashes.
+
+"3"
+        Same as smarty_pants="2", but inverts the shorthand for dashes:
+        "``--``" (dash dash) for em-dashes, and "``---``" (dash dash dash) for
+        en-dashes.
+
+"-1"
+        Stupefy mode. Reverses the SmartyPants transformation process, turning
+        the characters produced by SmartyPants into their ASCII equivalents.
+        E.g.  "“" is turned into a simple double-quote ("), "—" is
+        turned into two dashes, etc.
+
+
+The following single-character attribute values can be combined to toggle
+individual transformations from within the smarty_pants attribute. For
+example, to educate normal quotes and em-dashes, but not ellipses or
+\`\`backticks'' -style quotes:
+
+``py['smartypants_attributes'] = "qd"``
+
+"q"
+        Educates normal quote characters: (") and (').
+
+"b"
+        Educates \`\`backticks'' -style double quotes.
+
+"B"
+        Educates \`\`backticks'' -style double quotes and \`single' quotes.
+
+"d"
+        Educates em-dashes.
+
+"D"
+        Educates em-dashes and en-dashes, using old-school typewriter shorthand:
+        (dash dash) for en-dashes, (dash dash dash) for em-dashes.
+
+"i"
+        Educates em-dashes and en-dashes, using inverted old-school typewriter
+        shorthand: (dash dash) for em-dashes, (dash dash dash) for en-dashes.
+
+"e"
+        Educates ellipses.
+
+"w"
+        Translates any instance of ``&quot;`` into a normal double-quote character.
+        This should be of no interest to most people, but of particular interest
+        to anyone who writes their posts using Dreamweaver, as Dreamweaver
+        inexplicably uses this entity to represent a literal double-quote
+        character. SmartyPants only educates normal quotes, not entities (because
+        ordinarily, entities are used for the explicit purpose of representing the
+        specific character they represent). The "w" option must be used in
+        conjunction with one (or both) of the other quote options ("q" or "b").
+        Thus, if you wish to apply all SmartyPants transformations (quotes, en-
+        and em-dashes, and ellipses) and also translate ``&quot;`` entities into
+        regular quotes so SmartyPants can educate them, you should pass the
+        following to the smarty_pants attribute:
+
+        ``py['smartypants_attributes'] = "qDew"``
+
+
+Caveats
+=======
+
+Why You Might Not Want to Use Smart Quotes in Your Weblog
+---------------------------------------------------------
+
+For one thing, you might not care.
+
+Most normal, mentally stable individuals do not take notice of proper
+typographic punctuation. Many design and typography nerds, however, break
+out in a nasty rash when they encounter, say, a restaurant sign that uses
+a straight apostrophe to spell "Joe's".
+
+If you're the sort of person who just doesn't care, you might well want to
+continue not caring. Using straight quotes -- and sticking to the 7-bit
+ASCII character set in general -- is certainly a simpler way to live.
+
+Even if you *do* care about accurate typography, you still might want to
+think twice before educating the quote characters in your weblog. One side
+effect of publishing curly quote characters is that it makes your
+weblog a bit harder for others to quote from using copy-and-paste. What
+happens is that when someone copies text from your blog, the copied text
+contains the 8-bit curly quote characters (as well as the 8-bit characters
+for em-dashes and ellipses, if you use these options). These characters
+are not standard across different text encoding methods, which is why they
+need to be encoded as characters.
+
+People copying text from your weblog, however, may not notice that you're
+using curly quotes, and they'll go ahead and paste the unencoded 8-bit
+characters copied from their browser into an email message or their own
+weblog. When pasted as raw "smart quotes", these characters are likely to
+get mangled beyond recognition.
+
+That said, my own opinion is that any decent text editor or email client
+makes it easy to stupefy smart quote characters into their 7-bit
+equivalents, and I don't consider it my problem if you're using an
+indecent text editor or email client.
+
+
+Algorithmic Shortcomings
+------------------------
+
+One situation in which quotes will get curled the wrong way is when
+apostrophes are used at the start of leading contractions. For example:
+
+``'Twas the night before Christmas.``
+
+In the case above, SmartyPants will turn the apostrophe into an opening
+single-quote, when in fact it should be a closing one. I don't think
+this problem can be solved in the general case -- every word processor
+I've tried gets this wrong as well. In such cases, it's best to use the
+proper character for closing single-quotes (``’``) by hand.
+
+
+Version History
+===============
+
+1.6:    2010-08-26
+        - Adaptation to Docutils:
+          - Use Unicode instead of HTML entities,
+          - Remove code special to pyblosxom.
+
+1.5_1.6: Fri, 27 Jul 2007 07:06:40 -0400
+        - Fixed bug where blocks of precious unalterable text were instead
+          interpreted.  Thanks to Le Roux and Dirk van Oosterbosch.
+
+1.5_1.5: Sat, 13 Aug 2005 15:50:24 -0400
+        - Fix bogus magical quotation when there is no hint that the
+          user wants it, e.g., in "21st century".  Thanks to Nathan Hamblen.
+        - Be smarter about quotes before terminating numbers in an en-dash'ed
+          range.
+
+1.5_1.4: Thu, 10 Feb 2005 20:24:36 -0500
+        - Fix a date-processing bug, as reported by jacob childress.
+        - Begin a test-suite for ensuring correct output.
+        - Removed import of "string", since I didn't really need it.
+          (This was my first ever Python program.  Sue me!)
+
+1.5_1.3: Wed, 15 Sep 2004 18:25:58 -0400
+        - Abort processing if the flavour is in forbidden-list.  Default of
+          [ "rss" ]   (Idea of Wolfgang SCHNERRING.)
+        - Remove stray virgules from en-dashes.  Patch by Wolfgang SCHNERRING.
+
+1.5_1.2: Mon, 24 May 2004 08:14:54 -0400
+        - Some single quotes weren't replaced properly.  Diff-tesuji played
+          by Benjamin GEIGER.
+
+1.5_1.1: Sun, 14 Mar 2004 14:38:28 -0500
+        - Support upcoming pyblosxom 0.9 plugin verification feature.
+
+1.5_1.0: Tue, 09 Mar 2004 08:08:35 -0500
+        - Initial release
+"""
+
+default_smartypants_attr = "1"
+
+import re
+
+class smart(object):
+    """Smart quotes and dashes
+
+    TODO: internationalization, see e.g.
+    http://de.wikipedia.org/wiki/Anf%C3%BChrungszeichen#Andere_Sprachen
+    """
+    endash   = u'–' # "&#8211;" EN DASH
+    emdash   = u'—' # "&#8212;" EM DASH
+    lquote   = u'‘' # "&#8216;" LEFT SINGLE QUOTATION MARK
+    rquote   = u'’' # "&#8217;" RIGHT SINGLE QUOTATION MARK
+    #lquote  = u'‚' # "&#8218;" SINGLE LOW-9 QUOTATION MARK (German)
+    ldquote  = u'“' # "&#8220;" LEFT DOUBLE QUOTATION MARK
+    rdquote  = u'”' # "&#8221;" RIGHT DOUBLE QUOTATION MARK
+    #ldquote = u'„' # "&#8222;" DOUBLE LOW-9 QUOTATION MARK (German)
+    ellipsis = u'…' # "&#8230;" HORIZONTAL ELLIPSIS
+
+def smartyPants(text, attr=default_smartypants_attr):
+    convert_quot = False  # translate &quot; entities into normal quotes?
+
+    # Parse attributes:
+    # 0 : do nothing
+    # 1 : set all
+    # 2 : set all, using old school en- and em- dash shortcuts
+    # 3 : set all, using inverted old school en and em- dash shortcuts
+    #
+    # q : quotes
+    # b : backtick quotes (``double'' only)
+    # B : backtick quotes (``double'' and `single')
+    # d : dashes
+    # D : old school dashes
+    # i : inverted old school dashes
+    # e : ellipses
+    # w : convert &quot; entities to " for Dreamweaver users
+
+    skipped_tag_stack = []
+    do_dashes = "0"
+    do_backticks = "0"
+    do_quotes = "0"
+    do_ellipses = "0"
+    do_stupefy = "0"
+
+    if attr == "0":
+        # Do nothing.
+        return text
+    elif attr == "1":
+        do_quotes    = "1"
+        do_backticks = "1"
+        do_dashes    = "1"
+        do_ellipses  = "1"
+    elif attr == "2":
+        # Do everything, turn all options on, use old school dash shorthand.
+        do_quotes    = "1"
+        do_backticks = "1"
+        do_dashes    = "2"
+        do_ellipses  = "1"
+    elif attr == "3":
+        # Do everything, turn all options on, use inverted old school dash shorthand.
+        do_quotes    = "1"
+        do_backticks = "1"
+        do_dashes    = "3"
+        do_ellipses  = "1"
+    elif attr == "-1":
+        # Special "stupefy" mode.
+        do_stupefy   = "1"
+    else:
+        for c in attr:
+            if c == "q": do_quotes = "1"
+            elif c == "b": do_backticks = "1"
+            elif c == "B": do_backticks = "2"
+            elif c == "d": do_dashes = "1"
+            elif c == "D": do_dashes = "2"
+            elif c == "i": do_dashes = "3"
+            elif c == "e": do_ellipses = "1"
+            elif c == "w": convert_quot = "1"
+            else:
+                pass
+                # ignore unknown option
+
+    tokens = _tokenize(text)
+    result = []
+    in_pre = False
+
+    prev_token_last_char = ""
+    # This is a cheat, used to get some context
+    # for one-character tokens that consist of
+    # just a quote char. What we do is remember
+    # the last character of the previous text
+    # token, to use as context to curl single-
+    # character quote tokens correctly.
+
+    for cur_token in tokens:
+        t = cur_token[1]
+        last_char = t[-1:] # Remember last char of this token before processing.
+        if not in_pre:
+            oldstr = t
+            t = processEscapes(t)
+
+            if convert_quot:
+                t = re.sub('&quot;', '"', t)
+
+            if do_dashes != "0":
+                if do_dashes == "1":
+                    t = educateDashes(t)
+                if do_dashes == "2":
+                    t = educateDashesOldSchool(t)
+                if do_dashes == "3":
+                    t = educateDashesOldSchoolInverted(t)
+
+            if do_ellipses != "0":
+                t = educateEllipses(t)
+
+            # Note: backticks need to be processed before quotes.
+            if do_backticks != "0":
+                t = educateBackticks(t)
+
+            if do_backticks == "2":
+                t = educateSingleBackticks(t)
+
+            if do_quotes != "0":
+                if t == "'":
+                    # Special case: single-character ' token
+                    if re.match(r"\S", prev_token_last_char):
+                        t = smart.rquote
+                    else:
+                        t = smart.lquote
+                elif t == '"':
+                    # Special case: single-character " token
+                    if re.match(r"\S", prev_token_last_char):
+                        t = smart.rdquote
+                    else:
+                        t = smart.ldquote
+
+                else:
+                    # Normal case:
+                    t = educateQuotes(t)
+
+            if do_stupefy == "1":
+                t = stupefyEntities(t)
+
+        prev_token_last_char = last_char
+        result.append(t)
+
+    return "".join(result)
+
+
+def educateQuotes(str):
+    """
+    Parameter:  String.
+
+    Returns:        The string, with "educated" curly quote characters.
+
+    Example input:  "Isn't this fun?"
+    Example output: “Isn’t this fun?”
+    """
+
+    oldstr = str
+    punct_class = r"""[!"#\$\%'()*+,-.\/:;<=>?\@\[\\\]\^_`{|}~]"""
+
+    # Special case if the very first character is a quote
+    # followed by punctuation at a non-word-break. Close the quotes by brute force:
+    str = re.sub(r"""^'(?=%s\\B)""" % (punct_class,), smart.rquote, str)
+    str = re.sub(r"""^"(?=%s\\B)""" % (punct_class,), smart.rdquote, str)
+
+    # Special case for double sets of quotes, e.g.:
+    #   <p>He said, "'Quoted' words in a larger quote."</p>
+    str = re.sub(r""""'(?=\w)""", smart.ldquote+smart.lquote, str)
+    str = re.sub(r"""'"(?=\w)""", smart.lquote+smart.ldquote, str)
+
+    # Special case for decade abbreviations (the '80s):
+    str = re.sub(r"""\b'(?=\d{2}s)""", smart.rquote, str)
+
+    close_class = r"""[^\ \t\r\n\[\{\(\-]"""
+    dec_dashes = r"""&#8211;|&#8212;"""
+
+    # Get most opening single quotes:
+    opening_single_quotes_regex = re.compile(r"""
+                    (
+                            \s          |   # a whitespace char, or
+                            &nbsp;      |   # a non-breaking space entity, or
+                            --          |   # dashes, or
+                            &[mn]dash;  |   # named dash entities
+                            %s          |   # or decimal entities
+                            &\#x201[34];    # or hex
+                    )
+                    '                 # the quote
+                    (?=\w)            # followed by a word character
+                    """ % (dec_dashes,), re.VERBOSE)
+    str = opening_single_quotes_regex.sub(r'\1'+smart.lquote, str)
+
+    closing_single_quotes_regex = re.compile(r"""
+                    (%s)
+                    '
+                    (?!\s | s\b | \d)
+                    """ % (close_class,), re.VERBOSE)
+    str = closing_single_quotes_regex.sub(r'\1'+smart.rquote, str)
+
+    closing_single_quotes_regex = re.compile(r"""
+                    (%s)
+                    '
+                    (\s | s\b)
+                    """ % (close_class,), re.VERBOSE)
+    str = closing_single_quotes_regex.sub(r'\1%s\2' % smart.rquote, str)
+
+    # Any remaining single quotes should be opening ones:
+    str = re.sub(r"""'""", smart.lquote, str)
+
+    # Get most opening double quotes:
+    opening_double_quotes_regex = re.compile(r"""
+                    (
+                            \s          |   # a whitespace char, or
+                            &nbsp;      |   # a non-breaking space entity, or
+                            --          |   # dashes, or
+                            &[mn]dash;  |   # named dash entities
+                            %s          |   # or decimal entities
+                            &\#x201[34];    # or hex
+                    )
+                    "                 # the quote
+                    (?=\w)            # followed by a word character
+                    """ % (dec_dashes,), re.VERBOSE)
+    str = opening_double_quotes_regex.sub(r'\1'+smart.ldquote, str)
+
+    # Double closing quotes:
+    closing_double_quotes_regex = re.compile(r"""
+                    #(%s)?   # character that indicates the quote should be closing
+                    "
+                    (?=\s)
+                    """ % (close_class,), re.VERBOSE)
+    str = closing_double_quotes_regex.sub(smart.rdquote, str)
+
+    closing_double_quotes_regex = re.compile(r"""
+                    (%s)   # character that indicates the quote should be closing
+                    "
+                    """ % (close_class,), re.VERBOSE)
+    str = closing_double_quotes_regex.sub(r'\1'+smart.rdquote, str)
+
+    # Any remaining quotes should be opening ones.
+    str = re.sub(r'"', smart.ldquote, str)
+
+    return str
+
+
+def educateBackticks(str):
+    """
+    Parameter:  String.
+    Returns:    The string, with ``backticks'' -style double quotes
+                translated into HTML curly quote entities.
+    Example input:  ``Isn't this fun?''
+    Example output: “Isn't this fun?”
+    """
+
+    str = re.sub(r"""``""", smart.ldquote, str)
+    str = re.sub(r"""''""", smart.rdquote, str)
+    return str
+
+
+def educateSingleBackticks(str):
+    """
+    Parameter:  String.
+    Returns:    The string, with `backticks' -style single quotes
+                translated into HTML curly quote entities.
+
+    Example input:  `Isn't this fun?'
+    Example output: ‘Isn’t this fun?’
+    """
+
+    str = re.sub(r"""`""", smart.lquote, str)
+    str = re.sub(r"""'""", smart.rquote, str)
+    return str
+
+
+def educateDashes(str):
+    """
+    Parameter:  String.
+
+    Returns:    The string, with each instance of "--" translated to
+                an em-dash character, and each "---" translated to
+                an en-dash character.
+    """
+
+    str = re.sub(r"""---""", smart.endash, str) # en  (yes, backwards)
+    str = re.sub(r"""--""", smart.emdash, str) # em (yes, backwards)
+    return str
+
+
+def educateDashesOldSchool(str):
+    """
+    Parameter:  String.
+
+    Returns:    The string, with each instance of "--" translated to
+                an en-dash character, and each "---" translated to
+                an em-dash character.
+    """
+
+    str = re.sub(r"""---""", smart.emdash, str)    # em
+    str = re.sub(r"""--""", smart.endash, str)    # en
+    return str
+
+
+def educateDashesOldSchoolInverted(str):
+    """
+    Parameter:  String.
+
+    Returns:    The string, with each instance of "--" translated to
+                an em-dash character, and each "---" translated to
+                an en-dash character. Two reasons why: First, unlike the
+                en- and em-dash syntax supported by
+                educateDashesOldSchool(), it's compatible with existing
+                entries written before SmartyPants 1.1, back when "--" was
+                only used for em-dashes.  Second, em-dashes are more
+                common than en-dashes, and so it sort of makes sense that
+                the shortcut should be shorter to type. (Thanks to Aaron
+                Swartz for the idea.)
+    """
+    str = re.sub(r"""---""", smart.endash, str)    # en
+    str = re.sub(r"""--""", smart.emdash, str)    # em
+    return str
+
+
+
+def educateEllipses(str):
+    """
+    Parameter:  String.
+    Returns:    The string, with each instance of "..." translated to
+                an ellipsis character.
+
+    Example input:  Huh...?
+    Example output: Huh&#8230;?
+    """
+
+    str = re.sub(r"""\.\.\.""", smart.ellipsis, str)
+    str = re.sub(r"""\. \. \.""", smart.ellipsis, str)
+    return str
+
+
+def stupefyEntities(str):
+    """
+    Parameter:  String.
+    Returns:    The string, with each SmartyPants character translated to
+                its ASCII counterpart.
+
+    Example input:  “Hello — world.”
+    Example output: "Hello -- world."
+    """
+
+    str = re.sub(smart.endash, "-", str)  # en-dash
+    str = re.sub(smart.emdash, "--", str) # em-dash
+
+    str = re.sub(smart.lquote, "'", str)  # open single quote
+    str = re.sub(smart.rquote, "'", str)  # close single quote
+
+    str = re.sub(smart.ldquote, '"', str)  # open double quote
+    str = re.sub(smart.rdquote, '"', str)  # close double quote
+
+    str = re.sub(smart.ellipsis, '...', str)# ellipsis
+
+    return str
+
+
+def processEscapes(str):
+    r"""
+    Parameter:  String.
+    Returns:    The string, after processing the following backslash
+                escape sequences. This is useful if you want to force a "dumb"
+                quote or other character to appear.
+
+                Escape  Value
+                ------  -----
+                \\      &#92;
+                \"      &#34;
+                \'      &#39;
+                \.      &#46;
+                \-      &#45;
+                \`      &#96;
+    """
+    str = re.sub(r"""\\\\""", r"""&#92;""", str)
+    str = re.sub(r'''\\"''', r"""&#34;""", str)
+    str = re.sub(r"""\\'""", r"""&#39;""", str)
+    str = re.sub(r"""\\\.""", r"""&#46;""", str)
+    str = re.sub(r"""\\-""", r"""&#45;""", str)
+    str = re.sub(r"""\\`""", r"""&#96;""", str)
+
+    return str
+
+
+def _tokenize(str):
+    """
+    Parameter:  String containing HTML markup.
+    Returns:    Reference to an array of the tokens comprising the input
+                string. Each token is either a tag (possibly with nested
+                tags contained therein, such as <a href="<MTFoo>">), or a
+                run of text between tags. Each element of the array is a
+                two-element array; the first is either 'tag' or 'text';
+                the second is the actual value.
+
+    Based on the _tokenize() subroutine from Brad Choate's MTRegex plugin.
+        <http://www.bradchoate.com/past/mtregex.php>
+    """
+
+    pos = 0
+    length = len(str)
+    tokens = []
+
+    depth = 6
+    nested_tags = "|".join(['(?:<(?:[^<>]',] * depth) + (')*>)' * depth)
+    #match = r"""(?: <! ( -- .*? -- \s* )+ > ) |  # comments
+    #               (?: <\? .*? \?> ) |  # directives
+    #               %s  # nested tags       """ % (nested_tags,)
+    tag_soup = re.compile(r"""([^<]*)(<[^>]*>)""")
+
+    token_match = tag_soup.search(str)
+
+    previous_end = 0
+    while token_match is not None:
+        if token_match.group(1):
+            tokens.append(['text', token_match.group(1)])
+
+        tokens.append(['tag', token_match.group(2)])
+
+        previous_end = token_match.end()
+        token_match = tag_soup.search(str, token_match.end())
+
+    if previous_end < len(str):
+        tokens.append(['text', str[previous_end:]])
+
+    return tokens
+
+
+
+if __name__ == "__main__":
+
+    import locale
+
+    try:
+        locale.setlocale(locale.LC_ALL, '')
+    except locale.Error:
+        pass
+
+    from docutils.core import publish_string
+    docstring_html = publish_string(__doc__, writer_name='html')
+
+    print docstring_html
+
+
+    # Unit test output goes to stderr.  No worries.
+    import unittest
+    sp = smartyPants
+
+    class TestSmartypantsAllAttributes(unittest.TestCase):
+        # the default attribute is "1", which means "all".
+
+        def test_dates(self):
+            self.assertEqual(sp("1440-80's"), u"1440-80’s")
+            self.assertEqual(sp("1440-'80s"), u"1440-‘80s")
+            self.assertEqual(sp("1440---'80s"), u"1440–‘80s")
+            self.assertEqual(sp("1960s"), "1960s")  # no effect.
+            self.assertEqual(sp("1960's"), u"1960’s")
+            self.assertEqual(sp("one two '60s"), u"one two ‘60s")
+            self.assertEqual(sp("'60s"), u"‘60s")
+
+        def test_ordinal_numbers(self):
+            self.assertEqual(sp("21st century"), "21st century")  # no effect.
+            self.assertEqual(sp("3rd"), "3rd")  # no effect.
+
+        def test_educated_quotes(self):
+            self.assertEqual(sp('''"Isn't this fun?"'''), u'“Isn’t this fun?”')
+
+    unittest.main()
+
+
+
+
+__author__ = "Chad Miller <smartypantspy@chad.org>"
+__version__ = "1.5_1.6: Fri, 27 Jul 2007 07:06:40 -0400"
+__url__ = "http://wiki.chad.org/SmartyPantsPy"
+__description__ = "Smart-quotes, smart-ellipses, and smart-dashes for weblog entries in pyblosxom"

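For reference, the tokenizing step in `_tokenize()` above can be tried in isolation. The following is a minimal sketch, not the committed code: the function name `tokenize` and the tuple-based return value are assumptions made for illustration, while the regular expression is the same `tag_soup` pattern used in the diff. It splits a string into alternating `text` and `tag` tokens so that quote conversion can later be applied to the text runs only.

```python
import re

# Same pattern as tag_soup in the diff: a run of non-"<" characters
# followed by one "<...>" tag.
TAG_SOUP = re.compile(r"([^<]*)(<[^>]*>)")


def tokenize(text):
    """Split text into ('text', ...) and ('tag', ...) tokens.

    Hypothetical standalone variant of _tokenize(); it returns tuples
    instead of two-element lists.
    """
    tokens = []
    previous_end = 0
    for match in TAG_SOUP.finditer(text):
        if match.group(1):
            # Non-empty text that precedes the tag.
            tokens.append(("text", match.group(1)))
        tokens.append(("tag", match.group(2)))
        previous_end = match.end()
    if previous_end < len(text):
        # Trailing text after the last tag (or the whole string if no tag).
        tokens.append(("text", text[previous_end:]))
    return tokens


if __name__ == "__main__":
    for token in tokenize('<p>"Hi"</p> end'):
        print(token)
```

Because the pattern only recognizes flat `<...>` runs, a tag nested inside an attribute value (the `<a href="<MTFoo>">` case mentioned in the docstring) would be split at the first `>` in this sketch.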
docutils/test/test_transforms/test_smartquotes.py

+#!/usr/bin/env python
+# -*- coding: utf8 -*-
+
+# $Id$
+
+# :Copyright: © 2011 Günter Milde.
+# :License: Released under the terms of the `2-Clause BSD license`_, in short:
+#
+#    Copying and distribution of this file, with or without modification,
+#    are permitted in any medium without royalty provided the copyright
+#    notice and this notice are preserved.
+#    This file is offered as-is, without any warranty.
+#
+# .. _2-Clause BSD license: http://www.spdx.org/licenses/BSD-2-Clause
+
+"""
+Test module for universal.SmartQuotes transform.
+"""
+
+
+from __init__ import DocutilsTestSupport # must be imported before docutils
+from docutils.transforms.universal import SmartQuotes
+from docutils.parsers.rst import Parser
+
+def suite():
+    parser = Parser()
+    s = DocutilsTestSupport.TransformTestSuite(
+        parser, suite_settings={'smart_quotes': True})
+    s.generateTests(totest)
+    return s
+
+
+totest = {}
+
+totest['transitions'] = ((SmartQuotes,), [
+["""\
+Test "smart quotes", 'single smart quotes'
+-- and ---also long--- dashes.
+""",
+u"""\
+<document source="test data">
+    <paragraph>
+        Test “smart quotes”, ‘single smart quotes’
+        – and —also long— dashes.
+"""],
+])
+
+
+if __name__ == '__main__':
+    import unittest
+    unittest.main(defaultTest='suite')
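The expected output in the test above ("--" becoming an en dash and "---" an em dash) depends on replacing the longer pattern first, as `educateDashesOldSchool()` does. The sketch below illustrates that ordering under stated assumptions: `ENDASH` and `EMDASH` are hypothetical stand-ins for the `smart.endash` and `smart.emdash` attributes, and both function names are invented for this example.

```python
import re

# Hypothetical stand-ins for smart.endash and smart.emdash.
ENDASH = u"\u2013"
EMDASH = u"\u2014"


def dashes_longest_first(text):
    """Replace '---' with an em dash, then '--' with an en dash."""
    text = re.sub(r"---", EMDASH, text)
    text = re.sub(r"--", ENDASH, text)
    return text


def dashes_shortest_first(text):
    """Same mapping in the wrong order: '--' consumes part of every '---'."""
    text = re.sub(r"--", ENDASH, text)
    text = re.sub(r"---", EMDASH, text)  # never matches after the line above
    return text


if __name__ == "__main__":
    sample = "-- and ---also long--- dashes."
    print(dashes_longest_first(sample))
    print(dashes_shortest_first(sample))
```

With the shorter pattern handled first, every "---" turns into an en dash followed by a stray hyphen, which is why the three-hyphen substitution comes first in each of the dash functions.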