<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">


<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    
    <title>findBadAddresses Module &mdash; &#39;smtpErrorAnalysis&#39; &#39;0.1.0&#39; documentation</title>
    
    <link rel="stylesheet" href="_static/default.css" type="text/css" />
    <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
    
    <script type="text/javascript">
      var DOCUMENTATION_OPTIONS = {
        URL_ROOT:    '',
        VERSION:     '&#39;0.1.0&#39;',
        COLLAPSE_INDEX: false,
        FILE_SUFFIX: '.html',
        HAS_SOURCE:  true
      };
    </script>
    <script type="text/javascript" src="_static/jquery.js"></script>
    <script type="text/javascript" src="_static/underscore.js"></script>
    <script type="text/javascript" src="_static/doctools.js"></script>
    <link rel="top" title="&#39;smtpErrorAnalysis&#39; &#39;0.1.0&#39; documentation" href="index.html" />
    <link rel="next" title="regexEmailTester Module" href="regexEmailTester.html" />
    <link rel="prev" title="Welcome to ‘smtp-error-analysis’’s documentation!" href="index.html" /> 
  </head>
  <body>
    <div class="related">
      <h3>Navigation</h3>
      <ul>
        <li class="right" style="margin-right: 10px">
          <a href="genindex.html" title="General Index"
             accesskey="I">index</a></li>
        <li class="right" >
          <a href="py-modindex.html" title="Python Module Index"
             >modules</a> |</li>
        <li class="right" >
          <a href="regexEmailTester.html" title="regexEmailTester Module"
             accesskey="N">next</a> |</li>
        <li class="right" >
          <a href="index.html" title="Welcome to ‘smtp-error-analysis’’s documentation!"
             accesskey="P">previous</a> |</li>
        <li><a href="index.html">&#39;smtpErrorAnalysis&#39; &#39;0.1.0&#39; documentation</a> &raquo;</li> 
      </ul>
    </div>  

    <div class="document">
      <div class="documentwrapper">
        <div class="bodywrapper">
          <div class="body">
            
  <div class="section" id="module-findBadAddresses">
<span id="findbadaddresses-module"></span><h1>findBadAddresses Module<a class="headerlink" href="#module-findBadAddresses" title="Permalink to this headline"></a></h1>
<p>Parses a directory of email messages for &#8216;bounce messages&#8217;
and extracts from each &#8216;bounce message&#8217; the details needed to 
analyse the delivery problem.</p>
<p>The focus is on emails that bounced because the sender used an invalid
address:</p>
<div class="highlight-python"><pre>Usage: findBadAddresses.py [options]

findBadAddresses.py is used to parse a set of files  which represent the
'inbox' of an email account  and consider those email messages which are
'bounceback' emails sent by SMTP servers who have found it impossible to
deliver emails sent by the owner of the 'inbox'.   Command line options
specify the location of the 'inbox' and where output should be written to.

Options:
  -h, --help            show this help message and exit
  -i INBOX, --inbox=INBOX
                        Location of INBOX
  -o PATH, --outpath=PATH
                        PATH to output csv file
  -v, --verbose         Show each file processed</pre>
</div>
<dl class="exception">
<dt id="findBadAddresses.FindBadAddExcptn">
<em class="property">exception </em><tt class="descclassname">findBadAddresses.</tt><tt class="descname">FindBadAddExcptn</tt><big>(</big><em>value</em><big>)</big><a class="reference internal" href="_modules/findBadAddresses.html#FindBadAddExcptn"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#findBadAddresses.FindBadAddExcptn" title="Permalink to this definition"></a></dt>
<dd><p>Bases: <tt class="xref py py-class docutils literal"><span class="pre">exceptions.Exception</span></tt></p>
<p>Base class for errors in this script.</p>
</dd></dl>

<dl class="function">
<dt id="findBadAddresses.build_ignore_list">
<tt class="descclassname">findBadAddresses.</tt><tt class="descname">build_ignore_list</tt><big>(</big><big>)</big><a class="reference internal" href="_modules/findBadAddresses.html#build_ignore_list"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#findBadAddresses.build_ignore_list" title="Permalink to this definition"></a></dt>
<dd><p>Returns a hard-coded list of file names which will be ignored
in subsequent processing.</p>
<p>This function is not currently used, but it is left in place because it supports
the existing &#8216;ignore me&#8217; structure.</p>
</dd></dl>

<dl class="function">
<dt id="findBadAddresses.find_email">
<tt class="descclassname">findBadAddresses.</tt><tt class="descname">find_email</tt><big>(</big><em>instr</em><big>)</big><a class="reference internal" href="_modules/findBadAddresses.html#find_email"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#findBadAddresses.find_email" title="Permalink to this definition"></a></dt>
<dd><p>Searches a string for all the email addresses it contains.
We assume:</p>
<ul class="simple">
<li>At least one email address will be found</li>
<li>All addresses found will be identical</li>
</ul>
<p>If both assumptions hold, the email address found is returned.
Otherwise errors are raised.</p>
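<p>A minimal sketch of the behaviour described above, not the module&#8217;s actual implementation. The regular expression, the name <tt class="docutils literal"><span class="pre">find_email_sketch</span></tt> and the use of <tt class="docutils literal"><span class="pre">ValueError</span></tt> are assumptions made for illustration; the real function may match addresses differently and raise other exception types.</p>

```python
import re

# Deliberately simple pattern (an assumption); real addresses can be more varied.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def find_email_sketch(instr):
    """Return the single email address contained in instr.

    Hypothetical helper: raises ValueError when no address is found
    or when the addresses found are not all identical.
    """
    found = EMAIL_RE.findall(instr)
    if not found:
        raise ValueError("no email address found")
    if len(set(found)) != 1:
        raise ValueError("more than one distinct email address found")
    return found[0]
```

<p>For instance, under these assumptions <tt class="docutils literal"><span class="pre">find_email_sketch("rfc822;john.smith&#64;e.web")</span></tt> would return <tt class="docutils literal"><span class="pre">john.smith&#64;e.web</span></tt>.</p>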
</dd></dl>

<dl class="function">
<dt id="findBadAddresses.main">
<tt class="descclassname">findBadAddresses.</tt><tt class="descname">main</tt><big>(</big><big>)</big><a class="reference internal" href="_modules/findBadAddresses.html#main"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#findBadAddresses.main" title="Permalink to this definition"></a></dt>
<dd><p>The main() function</p>
</dd></dl>

<dl class="function">
<dt id="findBadAddresses.parse_args">
<tt class="descclassname">findBadAddresses.</tt><tt class="descname">parse_args</tt><big>(</big><big>)</big><a class="headerlink" href="#findBadAddresses.parse_args" title="Permalink to this definition"></a></dt>
<dd><p>Parses command line arguments using OptionParser.
Applies validation rules to the arguments and, if they pass,
returns them in a &#8216;dictionary like&#8217; object <tt class="docutils literal"><span class="pre">options</span></tt>.</p>
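<p>The following sketch shows one way the documented options could be declared with <tt class="docutils literal"><span class="pre">optparse.OptionParser</span></tt>. The option flags and help strings come from the usage text on this page; the <tt class="docutils literal"><span class="pre">dest</span></tt> names, the name <tt class="docutils literal"><span class="pre">parse_args_sketch</span></tt> and the validation rule (both paths are required) are assumptions, since the real rules are not described here.</p>

```python
from optparse import OptionParser


def parse_args_sketch(argv):
    """Parse argv (a list of strings) and return the options object."""
    parser = OptionParser(usage="%prog [options]")
    parser.add_option("-i", "--inbox", dest="inbox", metavar="INBOX",
                      help="Location of INBOX")
    parser.add_option("-o", "--outpath", dest="outpath", metavar="PATH",
                      help="PATH to output csv file")
    parser.add_option("-v", "--verbose", dest="verbose",
                      action="store_true", default=False,
                      help="Show each file processed")
    options, _args = parser.parse_args(argv)

    # Assumed validation rule: both locations must be supplied.
    if options.inbox is None or options.outpath is None:
        parser.error("both --inbox and --outpath are required")
    return options
```

<p>Calling <tt class="docutils literal"><span class="pre">parse_args_sketch(["-i",</span> <span class="pre">"inbox_dir",</span> <span class="pre">"-o",</span> <span class="pre">"out.csv"])</span></tt> would give an object whose attributes are read as <tt class="docutils literal"><span class="pre">options.inbox</span></tt> and <tt class="docutils literal"><span class="pre">options.outpath</span></tt>.</p>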
</dd></dl>

<dl class="function">
<dt id="findBadAddresses.parse_email_for_del_stat_part">
<tt class="descclassname">findBadAddresses.</tt><tt class="descname">parse_email_for_del_stat_part</tt><big>(</big><em>file_name</em>, <em>path_em_file</em>, <em>csv_dict_wrtr</em>, <em>options</em><big>)</big><a class="reference internal" href="_modules/findBadAddresses.html#parse_email_for_del_stat_part"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#findBadAddresses.parse_email_for_del_stat_part" title="Permalink to this definition"></a></dt>
<dd><p>Given the text of an SMTP &#8216;bounce message&#8217;, writes a CSV row 
whose columns match the headers in the global variable HDR_OUTPUT_COLS.</p>
<p>It does this by finding the &#8216;message/delivery-status&#8217; part of 
the entire email and parsing the headers.</p>
<p>A &#8216;message/delivery-status&#8217; part of a &#8216;bounce email&#8217; looks 
something like this:</p>
<div class="highlight-python"><pre>Content-Description: Delivery report
Content-Type: message/delivery-status

Reporting-MTA: dns; a.b.web              
X-Postfix-Queue-ID: 808F17F8080
X-Postfix-Sender: rfc822; someone@c.d.web
Arrival-Date: Tue,  8 May 2012 16:30:12 -0700 (PDT)

Final-Recipient: rfc822; john.smith@e.web
Original-Recipient: rfc822;john.smith@e.web
Action: failed
Status: 5.0.0
Remote-MTA: dns; smtp.e.web
Diagnostic-Code: smtp; 550 &lt;john.smith@e.web&gt;, Recipient unknown</pre>
</div>
<p>NB: This function makes many assumptions about the structure of the 
bounce message. They hold true for the large sample I have 
used in testing, but it seems likely that somewhere there are &#8216;bounce
messages&#8217; which follow different conventions. In particular, I suspect
that if the original email message were anything other than a two-part 
multipart email message, there might be problems.</p>
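<p>As a rough illustration of locating and reading the &#8216;message/delivery-status&#8217; part, the sketch below uses the standard library <tt class="docutils literal"><span class="pre">email</span></tt> package, which represents each header block of such a part as a nested message object. It is not the module&#8217;s actual code: the names <tt class="docutils literal"><span class="pre">SAMPLE_BOUNCE</span></tt> and <tt class="docutils literal"><span class="pre">delivery_status_fields</span></tt> are hypothetical, the sample message is abbreviated, and the step of writing a CSV row is omitted because the contents of HDR_OUTPUT_COLS are not shown on this page.</p>

```python
import email

# Abbreviated, made-up bounce message modelled on the example above.
SAMPLE_BOUNCE = "\n".join([
    "From: MAILER-DAEMON@a.b.web",
    "Subject: Undelivered Mail Returned to Sender",
    "MIME-Version: 1.0",
    'Content-Type: multipart/report; report-type=delivery-status; boundary="XYZ"',
    "",
    "--XYZ",
    "Content-Description: Notification",
    "Content-Type: text/plain",
    "",
    "Your message could not be delivered.",
    "",
    "--XYZ",
    "Content-Description: Delivery report",
    "Content-Type: message/delivery-status",
    "",
    "Reporting-MTA: dns; a.b.web",
    "Arrival-Date: Tue,  8 May 2012 16:30:12 -0700 (PDT)",
    "",
    "Final-Recipient: rfc822; john.smith@e.web",
    "Action: failed",
    "Status: 5.0.0",
    "Diagnostic-Code: smtp; 550 Recipient unknown",
    "",
    "--XYZ--",
    "",
])


def delivery_status_fields(raw_text):
    """Return the delivery-status headers as a dict, or None if absent."""
    msg = email.message_from_string(raw_text)
    for part in msg.walk():
        if part.get_content_type() != "message/delivery-status":
            continue
        fields = {}
        # Each blank-line-separated header block is a nested message object.
        for block in part.get_payload():
            for name, value in block.items():
                fields[name] = value.strip()
        return fields
    return None
```

<p>With the sample above, the returned dictionary would contain keys such as <tt class="docutils literal"><span class="pre">Final-Recipient</span></tt>, <tt class="docutils literal"><span class="pre">Action</span></tt> and <tt class="docutils literal"><span class="pre">Status</span></tt>, which could then be passed to a <tt class="docutils literal"><span class="pre">csv.DictWriter</span></tt>.</p>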
</dd></dl>

<dl class="function">
<dt id="findBadAddresses.remove_rfc_notation">
<tt class="descclassname">findBadAddresses.</tt><tt class="descname">remove_rfc_notation</tt><big>(</big><em>email_to_be_cleaned</em><big>)</big><a class="reference internal" href="_modules/findBadAddresses.html#remove_rfc_notation"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#findBadAddresses.remove_rfc_notation" title="Permalink to this definition"></a></dt>
<dd><p>Given a string which contains an email address in one of the two 
following formats</p>
<blockquote>
<div><ul class="simple">
<li><tt class="docutils literal"><span class="pre">a&#64;foo.bar</span></tt></li>
<li><tt class="docutils literal"><span class="pre">rfc:a&#64;foo.bar</span></tt></li>
</ul>
</div></blockquote>
<p>this function will return <tt class="docutils literal"><span class="pre">a&#64;foo.bar</span></tt></p>
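<p>One possible way to express this, given as a sketch rather than the module&#8217;s own code. The name <tt class="docutils literal"><span class="pre">remove_rfc_notation_sketch</span></tt> is hypothetical, and accepting an <tt class="docutils literal"><span class="pre">rfc822;</span></tt> prefix (as seen in delivery-status headers) in addition to the documented <tt class="docutils literal"><span class="pre">rfc:</span></tt> form is an assumption.</p>

```python
import re

# Optional "rfc" prefix, optional digits, then ":" or ";" (the ";" form is assumed).
RFC_PREFIX_RE = re.compile(r"^\s*rfc\d*\s*[:;]\s*")


def remove_rfc_notation_sketch(email_to_be_cleaned):
    """Strip a leading rfc-style prefix, if any, and surrounding whitespace."""
    return RFC_PREFIX_RE.sub("", email_to_be_cleaned).strip()
```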
</dd></dl>

<dl class="function">
<dt id="findBadAddresses.strip_line_feeds">
<tt class="descclassname">findBadAddresses.</tt><tt class="descname">strip_line_feeds</tt><big>(</big><em>string</em><big>)</big><a class="reference internal" href="_modules/findBadAddresses.html#strip_line_feeds"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#findBadAddresses.strip_line_feeds" title="Permalink to this definition"></a></dt>
<dd><p>Return the input string with CRLF
characters removed</p>
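<p>A one-line sketch of this behaviour, assuming that every carriage return and every line feed is removed individually (the real function may only remove CRLF pairs). The name <tt class="docutils literal"><span class="pre">strip_line_feeds_sketch</span></tt> is hypothetical.</p>

```python
def strip_line_feeds_sketch(string):
    """Return string with all CR and LF characters removed (assumed behaviour)."""
    return string.replace("\r", "").replace("\n", "")
```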
</dd></dl>

</div>


          </div>
        </div>
      </div>
      <div class="sphinxsidebar">
        <div class="sphinxsidebarwrapper">
  <h4>Previous topic</h4>
  <p class="topless"><a href="index.html"
                        title="previous chapter">Welcome to &#8216;smtp-error-analysis&#8217;&#8217;s documentation!</a></p>
  <h4>Next topic</h4>
  <p class="topless"><a href="regexEmailTester.html"
                        title="next chapter">regexEmailTester Module</a></p>
  <h3>This Page</h3>
  <ul class="this-page-menu">
    <li><a href="_sources/findBadAddresses.txt"
           rel="nofollow">Show Source</a></li>
  </ul>
        </div>
      </div>
      <div class="clearer"></div>
    </div>
    <div class="footer">
        &copy; Copyright 2012, Richard Shea.
      Created using <a href="http://sphinx.pocoo.org/">Sphinx</a> 1.1.3.
    </div>
  </body>
</html>