Richard Shea committed a030f1e

#3 resolution

Files changed (8)

dist/smtpErrorAnalysis-0.1.zip

Binary file modified.

doc/_build/doctrees/environment.pickle

Binary file modified.

doc/_build/doctrees/findBadAddresses.doctree

Binary file modified.

doc/_build/html/findBadAddresses.html

 and for those ‘bounce messages’ to be parsed for details which will 
 allow the problems to be analysed.</p>
 <p>Particular focus on emails bounced due to sender having used an invalid
-address</p>
+address:</p>
+<div class="highlight-python"><pre>Usage: findBadAddresses.py [options]
+
+findBadAddresses.py is used to parse a set of files which represent the
+'inbox' of an email account and consider those email messages which are
+'bounceback' emails sent by SMTP servers who have found it impossible to
+deliver emails sent by the owner of the 'inbox'. Command line options
+specify the location of the 'inbox' and where output should be written to.
+
+Options:
+  -h, --help            show this help message and exit
+  -i INBOX, --inbox=INBOX
+                        Location of INBOX
+  -o PATH, --outpath=PATH
+                        PATH to output csv file
+  -v, --verbose         Show each file processed</pre>
+</div>
 <dl class="exception">
 <dt id="findBadAddresses.FindBadAddExcptn">
 <em class="property">exception </em><tt class="descclassname">findBadAddresses.</tt><tt class="descname">FindBadAddExcptn</tt><big>(</big><em>value</em><big>)</big><a class="reference internal" href="_modules/findBadAddresses.html#FindBadAddExcptn"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#findBadAddresses.FindBadAddExcptn" title="Permalink to this definition">¶</a></dt>
 <dt id="findBadAddresses.main">
 <tt class="descclassname">findBadAddresses.</tt><tt class="descname">main</tt><big>(</big><big>)</big><a class="reference internal" href="_modules/findBadAddresses.html#main"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#findBadAddresses.main" title="Permalink to this definition">¶</a></dt>
 <dd><p>The main() function</p>
-<p>Needs work in order that the location of email files to be parsed
-and the location of output files may be specificed via command
-line params</p>
+</dd></dl>
+
+<dl class="function">
+<dt id="findBadAddresses.parse_args">
+<tt class="descclassname">findBadAddresses.</tt><tt class="descname">parse_args</tt><big>(</big><big>)</big><a class="headerlink" href="#findBadAddresses.parse_args" title="Permalink to this definition">¶</a></dt>
+<dd><p>Parses command line arguments using OptionParser.
+Applies validation rules to the arguments and then, if they are valid,
+returns them in a &#8216;dictionary-like&#8217; object <tt class="docutils literal"><span class="pre">options</span></tt>.</p>
 </dd></dl>
 
 <dl class="function">
 <dt id="findBadAddresses.parse_email_for_del_stat_part">
-<tt class="descclassname">findBadAddresses.</tt><tt class="descname">parse_email_for_del_stat_part</tt><big>(</big><em>file_name</em>, <em>path_em_file</em>, <em>csv_dict_wrtr</em><big>)</big><a class="reference internal" href="_modules/findBadAddresses.html#parse_email_for_del_stat_part"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#findBadAddresses.parse_email_for_del_stat_part" title="Permalink to this definition">¶</a></dt>
+<tt class="descclassname">findBadAddresses.</tt><tt class="descname">parse_email_for_del_stat_part</tt><big>(</big><em>file_name</em>, <em>path_em_file</em>, <em>csv_dict_wrtr</em>, <em>options</em><big>)</big><a class="reference internal" href="_modules/findBadAddresses.html#parse_email_for_del_stat_part"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#findBadAddresses.parse_email_for_del_stat_part" title="Permalink to this definition">¶</a></dt>
 <dd><p>Given the text of a SMTP &#8216;bounce message&#8217; writes a CSV row 
 to match the headers in the global variable HDR_OUTPUT_COLS.</p>
 <p>It does this by finding the &#8216;message/delivery-status&#8217; part of 

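The parse_email_for_del_stat_part() docstring describes finding the 'message/delivery-status' part of a bounce message and writing one CSV row whose columns match HDR_OUTPUT_COLS. A minimal Python 3 sketch of that approach using the standard email and csv modules follows; it is not the module's own code. The column list OUTPUT_COLS, the message SAMPLE_BOUNCE and the functions delivery_status_rows() and write_rows() are hypothetical, while the field names (Final-Recipient, Action, Status, Remote-MTA, Diagnostic-Code) are the per-recipient fields defined by RFC 3464.

```python
import csv
import email
import io

# Hypothetical output columns standing in for HDR_OUTPUT_COLS; the
# per-recipient field names are taken from RFC 3464.
OUTPUT_COLS = ['Address', 'Final-Recipient', 'Action', 'Status',
               'Remote-MTA', 'Diagnostic-Code']

# Hypothetical bounce message reporting a single failed recipient.
SAMPLE_BOUNCE = """\
From: MAILER-DAEMON@example.com
To: sender@example.com
Subject: Undelivered Mail Returned to Sender
MIME-Version: 1.0
Content-Type: multipart/report; report-type=delivery-status; boundary="BOUNDARY"

--BOUNDARY
Content-Type: text/plain

Delivery to the following recipient failed permanently.

--BOUNDARY
Content-Type: message/delivery-status

Reporting-MTA: dns; mail.example.com

Final-Recipient: rfc822; nobody@example.org
Action: failed
Status: 5.1.1
Diagnostic-Code: smtp; 550 5.1.1 User unknown

--BOUNDARY--
"""


def delivery_status_rows(raw_message):
    """Yield one dict per recipient described in the
    message/delivery-status part of a bounce message."""
    msg = email.message_from_string(raw_message)
    for part in msg.walk():
        if part.get_content_type() != 'message/delivery-status':
            continue
        # The email package parses each blank-line-separated header block
        # of this part into its own Message object. Block 0 holds the
        # per-message fields (Reporting-MTA, ...); every later block
        # describes one recipient.
        for block in part.get_payload()[1:]:
            row = {}
            for field in OUTPUT_COLS[1:]:
                value = block.get(field)
                if value is not None:
                    row[field] = value
            # 'rfc822; nobody@example.org' -> 'nobody@example.org'
            recipient = row.get('Final-Recipient', '')
            row['Address'] = recipient.split(';', 1)[-1].strip()
            yield row


def write_rows(rows, out_file):
    """Write a header row, then one CSV row per dict; columns missing
    from a dict are filled with 'N/A' via restval."""
    writer = csv.DictWriter(out_file, OUTPUT_COLS, restval='N/A',
                            dialect='excel')
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


if __name__ == '__main__':
    buf = io.StringIO()
    write_rows(delivery_status_rows(SAMPLE_BOUNCE), buf)
    print(buf.getvalue())
```

Fields absent from a recipient block, such as Remote-MTA in the sample, come out as 'N/A' through restval, the same default main() passes to csv.DictWriter. Under Python 3 a real output file should be opened in text mode with newline=''; the Python 2 code in this commit opens it with 'wb' instead.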
doc/_build/html/genindex.html

 <table style="width: 100%" class="indextable genindextable"><tr>
   <td style="width: 33%" valign="top"><dl>
       
+  <dt><a href="findBadAddresses.html#findBadAddresses.parse_args">parse_args() (in module findBadAddresses)</a>
+  </dt>
+
+  </dl></td>
+  <td style="width: 33%" valign="top"><dl>
+      
   <dt><a href="findBadAddresses.html#findBadAddresses.parse_email_for_del_stat_part">parse_email_for_del_stat_part() (in module findBadAddresses)</a>
   </dt>
 
doc/_build/html/objects.inv

File contents unchanged.

doc/_build/html/searchindex.js

-Search.setIndex({objects:{"":{findBadAddresses:[1,0,1,""],regexEmailTester:[2,0,1,""]},findBadAddresses:{strip_line_feeds:[1,1,1,""],FindBadAddExcptn:[1,2,1,""],build_ignore_list:[1,1,1,""],remove_rfc_notation:[1,1,1,""],find_email:[1,1,1,""],parse_email_for_del_stat_part:[1,1,1,""],main:[1,1,1,""]},regexEmailTester:{main:[2,1,1,""]}},terms:{recipi:1,all:1,code:1,text:1,global:1,find_email:1,follow:1,find:1,row:1,web:1,locat:1,"808f17f8080":1,smith:1,except:1,param:1,suspect:1,subsequ:1,analys:1,bounc:1,match:1,sourc:[1,2],"return":1,string:1,format:1,ident:1,report:1,bar:1,path_em_fil:1,findbadaddexcptn:1,somewher:1,like:1,specif:1,list:1,chuck:2,rfc:1,regexemailtest:[0,2],remove_rfc_not:1,contain:1,output:1,page:0,hard:1,sampl:1,multipart:1,fail:1,variabl:1,index:0,statu:1,someth:1,content:[0,1],larg:1,parse_email_for_del_stat_part:1,current:1,foo:1,email:1,file_nam:1,assumpt:1,were:1,given:1,base:1,crlf:1,found:1,tue:1,valu:1,search:[0,1],sender:1,pdt:1,queue:1,entir:1,place:1,action:1,onto:2,origin:1,via:1,instr:1,modul:[0,1,2],within:1,arriv:1,two:1,header:1,messag:1,assum:1,differ:1,convent:1,script:1,unknown:1,support:1,findbadaddress:[0,1],due:1,least:1,name:1,john:1,type:1,"final":1,"function":[1,2],main:[1,2],strip_line_fe:1,about:1,sort:1,part:1,pars:1,particular:1,line:1,hold:1,"true":1,than:1,those:1,made:1,input:1,csv:1,remot:1,might:1,remov:1,work:1,focu:1,structur:1,charact:1,postfix:1,"while":2,error:[0,1],problem:1,mta:1,document:0,address:1,deliveri:1,have:1,csv_dict_wrtr:1,look:1,process:1,smtp:[0,1],build_ignore_list:1,indic:0,diagnost:1,file:[1,2],tabl:0,need:[1,2],seem:1,date:1,welcom:0,want:2,detail:1,invalid:1,write:1,other:1,which:[1,2],test:1,ignor:1,eventu:2,email_to_be_clean:1,analysi:0,hdr_output_col:1,allow:1,rfc822:1,hang:2,someon:1,rais:1,mai:1,littl:1,"class":1,directori:1,descript:1,doe:1,command:1,thi:[1,2],order:1,left:1},objtypes:{"0":"py:module","1":"py:function","2":"py:exception"},titles:["Welcome to 
&#8216;smtp-error-analysis&#8217;&#8217;s documentation!","findBadAddresses Module","regexEmailTester Module"],objnames:{"0":["py","module","Python module"],"1":["py","function","Python function"],"2":["py","exception","Python exception"]},filenames:["index","findBadAddresses","regexEmailTester"]})
+Search.setIndex({objects:{"":{findBadAddresses:[1,0,1,""],regexEmailTester:[2,0,1,""]},findBadAddresses:{strip_line_feeds:[1,1,1,""],FindBadAddExcptn:[1,2,1,""],parse_args:[1,1,1,""],build_ignore_list:[1,1,1,""],remove_rfc_notation:[1,1,1,""],find_email:[1,1,1,""],parse_email_for_del_stat_part:[1,1,1,""],main:[1,1,1,""]},regexEmailTester:{main:[2,1,1,""]}},terms:{recipi:1,all:1,code:1,help:1,show:1,text:1,global:1,find_email:1,follow:1,find:1,row:1,web:1,locat:1,"808f17f8080":1,smith:1,except:1,param:[],should:1,other:1,suspect:1,subsequ:1,analys:1,bounc:1,match:1,them:1,sourc:[1,2],"return":1,string:1,format:1,bounceback:1,ident:1,report:1,bar:1,path_em_fil:1,findbadaddexcptn:1,somewher:1,like:1,specif:[],list:1,chuck:2,server:1,rfc:1,regexemailtest:[0,2],remove_rfc_not:1,contain:1,output:1,where:1,page:0,set:1,outpath:1,hard:1,sampl:1,multipart:1,fail:1,variabl:1,index:0,statu:1,someth:1,content:[0,1],written:1,larg:1,parse_email_for_del_stat_part:1,current:1,foo:1,email:1,file_nam:1,assumpt:1,who:1,each:1,usag:1,given:1,were:1,base:1,crlf:1,dictionari:1,found:1,path:1,tue:1,valu:1,search:[0,1],sender:1,pdt:1,queue:1,entir:1,place:1,action:1,onto:2,imposs:1,origin:1,via:[],instr:1,appli:1,modul:[0,1,2],within:1,arriv:1,two:1,header:1,inbox:1,owner:1,assum:1,differ:1,convent:1,script:1,unknown:1,support:1,findbadaddress:[0,1],due:1,messag:1,name:1,verbos:1,john:1,type:1,"final":1,deliveri:1,main:[1,2],option:1,tupl:[],optionpars:1,strip_line_fe:1,about:1,specifi:1,argument:1,sort:1,part:1,pars:1,particular:1,line:1,hold:1,"true":1,than:1,those:1,account:1,made:1,input:1,csv:1,remot:1,might:1,remov:1,work:[],focu:1,structur:1,charact:1,postfix:1,"while":2,respres:[],sent:1,error:[0,1],problem:1,mta:1,document:0,address:1,"function":[1,2],have:1,csv_dict_wrtr:1,look:1,process:1,deliv:1,smtp:[0,1],build_ignore_list:1,indic:0,repres:1,diagnost:1,exit:1,file:[1,2],tabl:0,need:2,seem:1,date:1,welcom:0,want:2,detail:1,invalid:1,write:1,valid:1,which:[1,2],test:1,ignor:1,
eventu:2,email_to_be_clean:1,analysi:0,hdr_output_col:1,allow:1,rfc822:1,object:1,hang:2,someon:1,rais:1,consid:1,mai:1,littl:1,"class":1,least:1,parse_arg:1,directori:1,descript:1,rule:1,doe:1,command:1,thi:[1,2],order:[],left:1},objtypes:{"0":"py:module","1":"py:function","2":"py:exception"},titles:["Welcome to &#8216;smtp-error-analysis&#8217;&#8217;s documentation!","findBadAddresses Module","regexEmailTester Module"],objnames:{"0":["py","module","Python module"],"1":["py","function","Python function"],"2":["py","exception","Python exception"]},filenames:["index","findBadAddresses","regexEmailTester"]})

smtpErrorAnalysis/findBadAddresses.py

 allow the problems to be analysed.
 
 Particular focus on emails bounced due to sender having used an invalid
-address
+address::
+
+    Usage: findBadAddresses.py [options]
+
+    findBadAddresses.py is used to parse a set of files which represent the
+    'inbox' of an email account and consider those email messages which are
+    'bounceback' emails sent by SMTP servers who have found it impossible to
+    deliver emails sent by the owner of the 'inbox'. Command line options
+    specify the location of the 'inbox' and where output should be written to.
+
+    Options:
+      -h, --help            show this help message and exit
+      -i INBOX, --inbox=INBOX
+                            Location of INBOX
+      -o PATH, --outpath=PATH
+                            PATH to output csv file
+      -v, --verbose         Show each file processed
 
 '''
 import os
 import csv
 import re
 import pprint
+from optparse import OptionParser
 ERR1 = "Found zero email addresses so don't know what to do" 
 ERR2 = "Found more than one email address so don't know what to do [%s]"
 HDR_OUTPUT_COLS = [ 'HUM-READ-EMAIL-ADDR',
     else:
         return l_em_to_be_clnd[1]
 
-def parse_email_for_del_stat_part(file_name, path_em_file, csv_dict_wrtr):
+def parse_email_for_del_stat_part(file_name, path_em_file, 
+                                    csv_dict_wrtr, options):
     '''
     Given the text of a SMTP 'bounce message' writes a CSV row 
     to match the headers in the global variable HDR_OUTPUT_COLS.
     part multipart email message there might be problems
 
     '''
-    print "About to process : %s" % file_name
+    if options.verbose:
+        print "About to process : %s" % file_name
     em_file = file(path_em_file)
     em_msg = email.message_from_string(em_file.read())
     try:
     lst = []
     return lst
 
+def parse_args():
+    '''
+    Parses command line arguments using OptionParser.
+    Applies validation rules to the arguments and then, if they are valid,
+    returns them in a 'dictionary-like' object ``options``.
+
+    '''
+    # optparse re-wraps the description, so embedded newlines only add
+    # stray spaces; adjacent string literals keep the spacing exact.
+    desc = ("%prog is used to parse a set of files which represent the "
+            "'inbox' of an email account and consider those email "
+            "messages which are 'bounceback' emails sent by SMTP servers "
+            "who have found it impossible to deliver emails sent by the "
+            "owner of the 'inbox'. Command line options specify the "
+            "location of the 'inbox' and where output should be written to.")
+
+    # OptionParser prefixes "Usage: " itself.
+    usage = "%prog [options]"
+
+    parser = OptionParser(description=desc, usage=usage)
+    parser.add_option(  "-i", "--inbox", action="store",  dest="inbox", 
+                        metavar="INBOX", help="Location of INBOX")
+    parser.add_option(  "-o", "--outpath", action="store", dest="outpath",
+                        metavar="PATH", help="PATH to output csv file")
+    parser.add_option(  "-v", "--verbose", action="store_true", 
+                        dest="verbose", help="Show each file processed")
+
+    (options, args) = parser.parse_args()
+
+    if (options.inbox is None) or (options.outpath is None):
+        parser.print_help()
+        parser.exit(-1)
+    elif not os.path.isdir(options.inbox):
+        parser.error('inbox location does not exist')
+    elif not os.path.isdir(os.path.dirname(os.path.abspath(options.outpath))):
+        parser.error('path to output location does not exist')
+
+    return options
+
 def main():
     '''
     The main() function
 
-    Needs work in order that the location of email files to be parsed
-    and the location of output files may be specificed via command
-    line params
     '''
-    lst_files_to_ignore = build_ignore_list() 
-    path = 'C:/usr/rshea/mytemp/20110609/NZLPProblemEmails-20120510/'
-    listing = os.listdir(path)
+
+    options = parse_args()
 
     #Create a csv.DictWriter to write output to
     csv_dict_wrtr = csv.DictWriter( \
-            open('NZLP-bademailaddresses-headers-20120510.csv', 'wb'), \
+            open(options.outpath, 'wb'), \
             HDR_OUTPUT_COLS, \
             restval='N/A', \
             dialect='excel')
     #Write the initial headers
     csv_dict_wrtr.writerow(dict(zip(HDR_OUTPUT_COLS, HDR_OUTPUT_COLS)))
 
+    lst_files_to_ignore = build_ignore_list() 
+
+    listing = os.listdir(options.inbox)
+
     #Process each file in turn
     for in_file_name in listing:
         if in_file_name in lst_files_to_ignore:
             pass
         else:
-            in_file_path = "%s/%s" % (path, in_file_name)
+            in_file_path = os.path.join(options.inbox, in_file_name)
             parse_email_for_del_stat_part(  in_file_name, 
                                             in_file_path, 
-                                            csv_dict_wrtr)
+                                            csv_dict_wrtr,
+                                            options)
 
 if __name__ == "__main__":
     main()
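parse_args() reads sys.argv directly, which makes its validation rules awkward to exercise on their own. The sketch below is a Python 3 variant written under that assumption, not the module's code: it adds a hypothetical argv parameter so a test can pass an explicit argument list, requires both --inbox and --outpath, and resolves a bare output file name against the current directory. inbox_paths() is likewise a hypothetical helper that mirrors the file loop in main().

```python
import os
from optparse import OptionParser


def parse_args(argv=None):
    """Sketch of findBadAddresses.parse_args() taking an explicit
    argument list (argv is hypothetical; None means sys.argv[1:])."""
    parser = OptionParser(usage="%prog [options]")
    parser.add_option("-i", "--inbox", dest="inbox", metavar="INBOX",
                      help="Location of INBOX")
    parser.add_option("-o", "--outpath", dest="outpath", metavar="PATH",
                      help="PATH to output csv file")
    parser.add_option("-v", "--verbose", action="store_true",
                      dest="verbose", default=False,
                      help="Show each file processed")
    options, _ = parser.parse_args(argv)

    # Require both options so no os.path call ever receives None.
    if options.inbox is None or options.outpath is None:
        parser.error("both --inbox and --outpath are required")
    if not os.path.isdir(options.inbox):
        parser.error("inbox location does not exist")
    # abspath() resolves a bare file name such as 'out.csv' against the
    # current directory, so its directory part is never ''.
    if not os.path.isdir(os.path.dirname(os.path.abspath(options.outpath))):
        parser.error("path to output location does not exist")
    return options


def inbox_paths(inbox, files_to_ignore=()):
    """Yield (name, full path) for every entry in inbox that is not in
    files_to_ignore (hypothetical helper mirroring main()'s loop)."""
    for name in sorted(os.listdir(inbox)):
        if name not in files_to_ignore:
            yield name, os.path.join(inbox, name)
```

Because OptionParser.error() prints the usage line to stderr and then calls sys.exit(2), a rejected command line such as parse_args(['-i', '.']) surfaces as SystemExit with code 2, which a test can catch.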