<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">


<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    
    <title>simplere &mdash; simplere 1.0.4 documentation</title>
    
    <link rel="stylesheet" href="_static/default.css" type="text/css" />
    <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
    
    <script type="text/javascript">
      var DOCUMENTATION_OPTIONS = {
        URL_ROOT:    '',
        VERSION:     '1.0.4',
        COLLAPSE_INDEX: false,
        FILE_SUFFIX: '.html',
        HAS_SOURCE:  true
      };
    </script>
    <script type="text/javascript" src="_static/jquery.js"></script>
    <script type="text/javascript" src="_static/underscore.js"></script>
    <script type="text/javascript" src="_static/doctools.js"></script>
    <link rel="top" title="simplere 1.0.4 documentation" href="#" /> 
  </head>
  <body>
    <div class="related">
      <h3>Navigation</h3>
      <ul>
        <li class="right" style="margin-right: 10px">
          <a href="genindex.html" title="General Index"
             accesskey="I">index</a></li>
        <li><a href="#">simplere 1.0.4 documentation</a> &raquo;</li> 
      </ul>
    </div>  

    <div class="document">
      <div class="documentwrapper">
        <div class="bodywrapper">
          <div class="body">
            
  <div class="section" id="simplere">
<h1>simplere<a class="headerlink" href="#simplere" title="Permalink to this headline"></a></h1>
<p>A simplified interface to Python&#8217;s regular expression (<tt class="docutils literal"><span class="pre">re</span></tt>)
string search that tries to eliminate steps and provide
simpler access to results. As a bonus, it also provides a compatible way to
access Unix glob searches.</p>
</div>
<div class="section" id="usage">
<h1>Usage<a class="headerlink" href="#usage" title="Permalink to this headline"></a></h1>
<p>Python regular expressions are powerful, but the language&#8217;s lack
of an <em>en passant</em> (in passing) assignment requires a preparatory
motion and then a test:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="kn">import</span> <span class="nn">re</span>

<span class="n">match</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="n">pattern</span><span class="p">,</span> <span class="n">some_string</span><span class="p">)</span>
<span class="k">if</span> <span class="n">match</span><span class="p">:</span>
    <span class="k">print</span> <span class="n">match</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
</pre></div>
</div>
<p>With <tt class="docutils literal"><span class="pre">simplere</span></tt>, you can do it in fewer steps:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="kn">from</span> <span class="nn">simplere</span> <span class="kn">import</span> <span class="o">*</span>

<span class="k">if</span> <span class="n">match</span> <span class="o">/</span> <span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="n">pattern</span><span class="p">,</span> <span class="n">some_string</span><span class="p">):</span>
    <span class="k">print</span> <span class="n">match</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
</pre></div>
</div>
</div>
<div class="section" id="motivation">
<h1>Motivation<a class="headerlink" href="#motivation" title="Permalink to this headline"></a></h1>
<p>In the simple examples above, &#8220;fewer steps&#8221; seems like a small
savings (3 lines to 2). While a 33% savings is a pretty good
optimization, is it really worth using another module and
a quirky <em>en passant</em> operator to get it?</p>
<p>In code this simple, maybe not. But real regex-based searching tends
to have multiple, cascading searches, and to be tightly interwoven
with complex pre-conditions, error-checking, and post-match formatting
or actions. It gets complicated fast. When multiple <tt class="docutils literal"><span class="pre">re</span></tt> matches
must be done, it consumes a lot of &#8220;vertical space&#8221; and often
threatens to push the number of lines a programmer is viewing at
any given moment beyond the number that can be easily held in working
memory. In that case, it proves valuable to condense what is logically
a single operation (&#8220;regular expression test&#8221;) into a single line
with its conditional <tt class="docutils literal"><span class="pre">if</span></tt>.</p>
<p>This is even more true for the &#8220;exploratory&#8221; phases of development,
before a program&#8217;s appropriate structure and best logical boundaries
have been established.  One can always &#8220;back out&#8221; the condensing <em>en
passant</em> operation in later production code, if desired.</p>
</div>
<div class="section" id="re-objects">
<h1>Re Objects<a class="headerlink" href="#re-objects" title="Permalink to this headline"></a></h1>
<p><tt class="docutils literal"><span class="pre">Re</span></tt> objects are <a class="reference external" href="http://en.wikipedia.org/wiki/Memoization">memoized</a> for efficiency, so they compile their
pattern just once, regardless of how many times they&#8217;re mentioned in a
program.</p>
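<p>As a rough illustration of the memoization idea (not the package&#8217;s actual
implementation), the sketch below caches one instance per pattern so the
compile step runs only once. The class name <tt class="docutils literal"><span class="pre">MemoizedPattern</span></tt> and its
<tt class="docutils literal"><span class="pre">_cache</span></tt> and <tt class="docutils literal"><span class="pre">regex</span></tt> attributes are hypothetical names chosen for this example:</p>

```python
import re


class MemoizedPattern(object):
    """Hypothetical stand-in for Re: one instance (and one compile) per pattern."""

    _cache = {}  # maps (pattern, flags) -> the instance already built for it

    def __new__(cls, pattern, flags=0):
        key = (pattern, flags)
        if key not in cls._cache:
            instance = super(MemoizedPattern, cls).__new__(cls)
            instance.regex = re.compile(pattern, flags)  # compiled only once
            cls._cache[key] = instance
        return cls._cache[key]


first = MemoizedPattern(r'\d+')
second = MemoizedPattern(r'\d+')
print(first is second)  # True: the second mention reuses the first object
```

<p>Keying the cache on both the pattern and its flags keeps two differently
flagged versions of the same pattern from colliding.</p>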
<p>Note that the <tt class="docutils literal"><span class="pre">in</span></tt> test turns the sense of the matching around (compared to
the standard <tt class="docutils literal"><span class="pre">re</span></tt> module). It asks &#8220;is the given string <em>in</em>
the set of items this pattern describes?&#8221; To be fancy, the
<tt class="docutils literal"><span class="pre">Re</span></tt> pattern is an intensionally
defined set (namely &#8220;all strings matching the pattern&#8221;). This order often makes
excellent sense when you have a clear intent for the test. For example, &#8220;is the
given string within the set of <em>all legitimate commands</em>?&#8221;</p>
<p>Second, the <tt class="docutils literal"><span class="pre">in</span></tt> test has the side effect of setting the underscore
name <tt class="docutils literal"><span class="pre">_</span></tt> to the result. Python doesn&#8217;t support <em>en passant</em> assignment&#8211;apparently,
no matter how hard you try, or how much introspection you use. This makes it
harder to both test and collect results in the same motion, even though that&#8217;s
often exactly appropriate. Collecting them in a class variable is a fallback
strategy (see the <em>En Passant</em> section below for a slicker one).</p>
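<p>One way such an <tt class="docutils literal"><span class="pre">in</span></tt> test could be built is with Python&#8217;s
<tt class="docutils literal"><span class="pre">__contains__</span></tt> hook, storing the latest result in a class variable.
This is a minimal sketch under that assumption; <tt class="docutils literal"><span class="pre">InPattern</span></tt> is a hypothetical
class, not part of <tt class="docutils literal"><span class="pre">simplere</span></tt>:</p>

```python
import re


class InPattern(object):
    """Hypothetical sketch of a pattern that supports `string in pattern`."""

    _ = None  # class variable holding the most recent search result

    def __init__(self, pattern):
        self.regex = re.compile(pattern)

    def __contains__(self, string):
        # `string in self` lands here: remember the result, then report success
        InPattern._ = self.regex.search(string)
        return InPattern._ is not None


if 'restart now' in InPattern(r'^(start|stop|restart)\b'):
    print(InPattern._.group(1))  # restart
```

<p>Because the result lives on the class, each new test overwrites the previous
one, which is why it is described as a fallback strategy.</p>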
<p>If you prefer the more traditional <tt class="docutils literal"><span class="pre">re</span></tt> calls:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="k">if</span> <span class="n">Re</span><span class="p">(</span><span class="n">pattern</span><span class="p">)</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="n">some_string</span><span class="p">):</span>
    <span class="k">print</span> <span class="n">Re</span><span class="o">.</span><span class="n">_</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
</pre></div>
</div>
<p><tt class="docutils literal"><span class="pre">Re</span></tt> works even better with named pattern components, which are exposed
as attributes of the returned object:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="n">person</span> <span class="o">=</span> <span class="s">&#39;John Smith 48&#39;</span>
<span class="k">if</span> <span class="n">person</span> <span class="ow">in</span> <span class="n">Re</span><span class="p">(</span><span class="s">r&#39;(?P&lt;name&gt;[\w\s]*)\s+(?P&lt;age&gt;\d+)&#39;</span><span class="p">):</span>
    <span class="k">print</span> <span class="n">Re</span><span class="o">.</span><span class="n">_</span><span class="o">.</span><span class="n">name</span><span class="p">,</span> <span class="s">&quot;is&quot;</span><span class="p">,</span> <span class="n">Re</span><span class="o">.</span><span class="n">_</span><span class="o">.</span><span class="n">age</span><span class="p">,</span> <span class="s">&quot;years old&quot;</span>
<span class="k">else</span><span class="p">:</span>
    <span class="k">print</span> <span class="s">&quot;don&#39;t understand &#39;{}&#39;&quot;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">person</span><span class="p">)</span>
</pre></div>
</div>
<p>One trick being used here is that the returned object is not a pure
<tt class="docutils literal"><span class="pre">_sre.SRE_Match</span></tt> that Python&#8217;s <tt class="docutils literal"><span class="pre">re</span></tt> module returns. Nor is it a subclass.
(That class <a class="reference external" href="http://stackoverflow.com/questions/4835352/subclassing-matchobject-in-python">appears to be unsubclassable</a>.)
Thus, regular expression matches return a proxy object that
exposes the match object&#8217;s numeric (positional) and
named groups through indices and attributes. If a named group has the same
name as a match object method or property, the named group takes precedence. In that
case, either change the name of the match group or access the underlying property thus:
<tt class="docutils literal"><span class="pre">x._match.property</span></tt></p>
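<p>The proxy idea can be sketched in a few lines. The class below is a
hypothetical illustration (the name <tt class="docutils literal"><span class="pre">MatchProxy</span></tt> is an assumption, and the real
<tt class="docutils literal"><span class="pre">simplere</span></tt> code may differ); it shows how a named group called
<tt class="docutils literal"><span class="pre">end</span></tt> shadows the match object&#8217;s own <tt class="docutils literal"><span class="pre">end()</span></tt> method:</p>

```python
import re


class MatchProxy(object):
    """Hypothetical proxy around a standard match object."""

    def __init__(self, match):
        self._match = match

    def __getitem__(self, index):
        # positional groups: proxy[1] is match.group(1)
        return self._match.group(index)

    def __getattr__(self, name):
        # called only when normal lookup fails; named groups are tried first
        groups = self._match.groupdict()
        if name in groups:
            return groups[name]
        # otherwise fall back to the wrapped match object's own attributes
        return getattr(self._match, name)


m = MatchProxy(re.search(r'(?P<name>\w+) (?P<end>\d+)', 'John 48'))
print(m[1])            # John
print(m.end)           # 48 (the named group, not the end() method)
print(m._match.end())  # 7 (the underlying method is still reachable)
```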
<p>It&#8217;s possible also to loop over the results:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="k">for</span> <span class="n">found</span> <span class="ow">in</span> <span class="n">Re</span><span class="p">(</span><span class="s">&#39;pattern (\w+)&#39;</span><span class="p">)</span><span class="o">.</span><span class="n">finditer</span><span class="p">(</span><span class="s">&#39;pattern is as pattern does&#39;</span><span class="p">):</span>
    <span class="k">print</span> <span class="n">found</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
</pre></div>
</div>
<p>Or collect them all in one fell swoop:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="n">found</span> <span class="o">=</span> <span class="n">Re</span><span class="p">(</span><span class="s">&#39;pattern (\w+)&#39;</span><span class="p">)</span><span class="o">.</span><span class="n">findall</span><span class="p">(</span><span class="s">&#39;pattern is as pattern does&#39;</span><span class="p">)</span>
</pre></div>
</div>
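<p>For reference, here is what the equivalent standard-library calls produce for the
same pattern and string. This uses plain <tt class="docutils literal"><span class="pre">re</span></tt> rather than <tt class="docutils literal"><span class="pre">Re</span></tt>, on the
assumption that the wrapped methods yield the same groups:</p>

```python
import re

text = 'pattern is as pattern does'
regex = re.compile(r'pattern (\w+)')

# finditer yields one match object per non-overlapping match
looped = [m.group(1) for m in regex.finditer(text)]

# findall returns the captured group of every match as a list of strings
collected = regex.findall(text)

print(looped)     # ['is', 'does']
print(collected)  # ['is', 'does']
```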
<p>Pretty much all of the methods and properties one can access from the standard
<tt class="docutils literal"><span class="pre">re</span></tt> module are available.</p>
</div>
<div class="section" id="bonus-globs">
<h1>Bonus: Globs<a class="headerlink" href="#bonus-globs" title="Permalink to this headline"></a></h1>
<p>Regular expressions are wonderfully powerful, but sometimes the simpler <a class="reference external" href="http://en.wikipedia.org/wiki/Glob_(programming)">Unix glob</a> works just fine. As a bonus,
<tt class="docutils literal"><span class="pre">simplere</span></tt> also provides simple glob access:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="k">if</span> <span class="s">&#39;globtastic&#39;</span> <span class="ow">in</span> <span class="n">Glob</span><span class="p">(</span><span class="s">&#39;glob*&#39;</span><span class="p">):</span>
    <span class="k">print</span> <span class="s">&quot;Yes! It is!&quot;</span>
<span class="k">else</span><span class="p">:</span>
    <span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s">&#39;YES IT IS&#39;</span><span class="p">)</span>
</pre></div>
</div>
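<p>A glob test of this shape can be approximated with the standard
<tt class="docutils literal"><span class="pre">fnmatch</span></tt> module. The sketch below is an assumption about how one might build it,
not a description of <tt class="docutils literal"><span class="pre">Glob</span></tt> itself; <tt class="docutils literal"><span class="pre">GlobSketch</span></tt> is a hypothetical name:</p>

```python
import fnmatch


class GlobSketch(object):
    """Hypothetical glob pattern that supports `string in pattern`."""

    def __init__(self, pattern):
        self.pattern = pattern

    def __contains__(self, string):
        # fnmatchcase compares case-sensitively on every platform
        return fnmatch.fnmatchcase(string, self.pattern)


print('globtastic' in GlobSketch('glob*'))  # True
print('blob' in GlobSketch('glob*'))        # False
```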
</div>
<div class="section" id="en-passant-under-the-covers">
<h1>En Passant, Under the Covers<a class="headerlink" href="#en-passant-under-the-covers" title="Permalink to this headline"></a></h1>
<p><tt class="docutils literal"><span class="pre">ReMatch</span></tt> objects
wrap Python&#8217;s native <tt class="docutils literal"><span class="pre">_sre.SRE_Match</span></tt> objects (the things that <tt class="docutils literal"><span class="pre">re</span></tt>
method calls return):</p>
<div class="highlight-python"><div class="highlight"><pre><span class="n">match</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s">r&#39;(?P&lt;word&gt;th.s)&#39;</span><span class="p">,</span> <span class="s">&#39;this is a string&#39;</span><span class="p">)</span>
<span class="n">match</span> <span class="o">=</span> <span class="n">ReMatch</span><span class="p">(</span><span class="n">match</span><span class="p">)</span>
<span class="k">if</span> <span class="n">match</span><span class="p">:</span>
    <span class="k">print</span> <span class="n">match</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>    <span class="c"># still works</span>
    <span class="k">print</span> <span class="n">match</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>          <span class="c"># same thing</span>
    <span class="k">print</span> <span class="n">match</span><span class="o">.</span><span class="n">word</span>        <span class="c"># same thing, with logical name</span>
</pre></div>
</div>
<p>But that&#8217;s a huge amount of boilerplate for a simple test, right? So <tt class="docutils literal"><span class="pre">simplere</span></tt>
provides an <em>en passant</em> operator that redefines the division operation and proxies the <tt class="docutils literal"><span class="pre">re</span></tt> result
on the fly to the pre-defined <tt class="docutils literal"><span class="pre">match</span></tt> object:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="k">if</span> <span class="n">match</span> <span class="o">/</span> <span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="s">r&#39;(?P&lt;word&gt;th.s)&#39;</span><span class="p">,</span> <span class="s">&#39;this is a string&#39;</span><span class="p">):</span>
    <span class="k">assert</span> <span class="n">match</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">==</span> <span class="s">&#39;this&#39;</span>
    <span class="k">assert</span> <span class="n">match</span><span class="o">.</span><span class="n">word</span> <span class="o">==</span> <span class="s">&#39;this&#39;</span>
    <span class="k">assert</span> <span class="n">match</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="o">==</span> <span class="s">&#39;this&#39;</span>
</pre></div>
</div>
<p>If the <tt class="docutils literal"><span class="pre">re</span></tt> operation fails, the resulting object is guaranteed to have
a <tt class="docutils literal"><span class="pre">False</span></tt>-like Boolean value, so that it will fall through conditional tests.</p>
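<p>To make the mechanism concrete, here is a minimal sketch of an object whose division
operator captures its right-hand operand and whose truth value reflects whether a
match was found. <tt class="docutils literal"><span class="pre">Holder</span></tt> is a hypothetical class written for this example;
the package&#8217;s own <tt class="docutils literal"><span class="pre">match</span></tt> object may be implemented differently:</p>

```python
import re


class Holder(object):
    """Hypothetical en passant holder: `holder / result` stores the result."""

    def __init__(self):
        self._match = None

    def __truediv__(self, other):
        # store whatever re.search() returned (a match or None), return self
        self._match = other
        return self

    __div__ = __truediv__  # Python 2 spelling of the division hook

    def __bool__(self):
        # a failed search leaves None behind, so the holder tests as False
        return self._match is not None

    __nonzero__ = __bool__  # Python 2 spelling of the truth-value hook

    def __getitem__(self, index):
        return self._match.group(index)


found = Holder()
if found / re.search(r'(th.s)', 'this is a string'):
    print(found[1])  # this
if not found / re.search(r'xyz', 'this is a string'):
    print('no match')  # the failed search falls through
```

<p>Returning the holder itself from the operator is what lets the same expression
serve as both the assignment and the condition.</p>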
<p>If you prefer the look of the less-than (<tt class="docutils literal"><span class="pre">&lt;</span></tt>) or less-than-or-equal (<tt class="docutils literal"><span class="pre">&lt;=</span></tt>),
as indicators that <tt class="docutils literal"><span class="pre">match</span></tt> takes the value of the following function call, they
are experimentally supported as aliases of the division operation (<tt class="docutils literal"><span class="pre">/</span></tt>).
You may define your
own match objects, and can use them on memoized <tt class="docutils literal"><span class="pre">Re</span></tt> objects too. Putting
a few of these optional things together:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="n">answer</span> <span class="o">=</span> <span class="n">Match</span><span class="p">()</span>   <span class="c"># need to do this just once</span>

<span class="k">if</span> <span class="n">answer</span> <span class="o">&lt;</span> <span class="n">Re</span><span class="p">(</span><span class="s">r&#39;(?P&lt;word&gt;th..)&#39;</span><span class="p">)</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="s">&#39;and that goes there&#39;</span><span class="p">):</span>
    <span class="k">assert</span> <span class="n">answer</span><span class="o">.</span><span class="n">word</span> <span class="o">==</span> <span class="s">&#39;that&#39;</span>
</pre></div>
</div>
</div>
<div class="section" id="notes">
<h1>Notes<a class="headerlink" href="#notes" title="Permalink to this headline"></a></h1>
<blockquote>
<div><ul class="simple">
<li>Automated multi-version testing is managed with the wonderful
<a class="reference external" href="http://pypi.python.org/pypi/pytest">pytest</a>
and <a class="reference external" href="http://pypi.python.org/pypi/tox">tox</a>. <tt class="docutils literal"><span class="pre">simplere</span></tt> is
successfully packaged for, and tested against, all late-model versions of
Python: 2.6, 2.7, 3.2, and 3.3, as well as PyPy 2.1 (based on 2.7.3).
Travis-CI testing has also commenced.</li>
<li><tt class="docutils literal"><span class="pre">simplere</span></tt> is one part of a larger effort to add intensional sets
to Python. The <a class="reference external" href="http://pypi.python.org/pypi/intensional">intensional</a>
package contains a parallel implementation of <tt class="docutils literal"><span class="pre">Re</span></tt>, among many other
things.</li>
<li>The author, <a class="reference external" href="mailto:jonathan&#46;eunice&#37;&#52;&#48;gmail&#46;com">Jonathan Eunice</a> (or
<a class="reference external" href="http://twitter.com/jeunice">&#64;jeunice on Twitter</a>),
welcomes your comments and suggestions.</li>
</ul>
</div></blockquote>
</div>
<div class="section" id="installation">
<h1>Installation<a class="headerlink" href="#installation" title="Permalink to this headline"></a></h1>
<p>To install the latest version:</p>
<div class="highlight-python"><pre>pip install -U simplere</pre>
</div>
<p>To <tt class="docutils literal"><span class="pre">easy_install</span></tt> under a specific Python version (3.3 in this example):</p>
<div class="highlight-python"><pre>python3.3 -m easy_install --upgrade simplere</pre>
</div>
<p>(You may need to prefix these with &#8220;sudo &#8221; to authorize installation.)</p>
</div>


          </div>
        </div>
      </div>
      <div class="sphinxsidebar">
        <div class="sphinxsidebarwrapper">
  <h3><a href="#">Table Of Contents</a></h3>
  <ul>
<li><a class="reference internal" href="#">simplere</a></li>
<li><a class="reference internal" href="#usage">Usage</a></li>
<li><a class="reference internal" href="#motivation">Motivation</a></li>
<li><a class="reference internal" href="#re-objects">Re Objects</a></li>
<li><a class="reference internal" href="#bonus-globs">Bonus: Globs</a></li>
<li><a class="reference internal" href="#en-passant-under-the-covers">En Passant, Under the Covers</a></li>
<li><a class="reference internal" href="#notes">Notes</a></li>
<li><a class="reference internal" href="#installation">Installation</a></li>
</ul>

  <h3>This Page</h3>
  <ul class="this-page-menu">
    <li><a href="_sources/index.txt"
           rel="nofollow">Show Source</a></li>
  </ul>
        </div>
      </div>
      <div class="clearer"></div>
    </div>
    <div class="footer">
        &copy; Copyright 2013, Jonathan Eunice.
      Created using <a href="http://sphinx.pocoo.org/">Sphinx</a> 1.1.3.
    </div>
  </body>
</html>