getlessmail.rb	Mail filter for use with getmail

Author: Chip Camden
Date:	April, 2010
Site:	http://chipstips.com/?tag=rbgetlessmail

Purpose:

Why write yet another email filter?  Because none of the ones I know
about give me full control over how rules are applied.  Getlessmail
puts the power of Ruby into your hands along with a simple EDSL for
manipulating mail messages.  Thus, you can define the order in which
rules are applied, modify messages or their headers, move or copy
messages to other mailboxes automatically, or anything else you can do
with Ruby.

Files included:

dot.getlessmail	Example user-specific ~/.getlessmail
filter_mbox.rb	Filter an entire mbox file using getlessmail.rb
getlessmail.rb	Ruby script to execute as a getmail external filter.
getmailrc	Example snippet of user-specific ~/.getmail/getmailrc
glmadd.rb	Command-line tool for adding .getlessmail entries
glmpipe.rb	Pipe filter for extracting sender and adding to .getlessmail
IPGeo.rb	Extract location for IP address
IPGeo.csv	Example IP Location database
IPGeoMail.rb	Extract location for IP address in Received: header fields

Getlessmail provides the ability to script email filtering in Ruby.  After
receiving an email to filter, getlessmail executes the file ~/.getlessmail
within its own lexical environment, using eval.  Thus, the following global
variables are available to your script:

$body		The complete body of the message, with attachments
$header		Contains all header fields, with newlines and whitespace
$maildir	Path to the mail directory, assuming mbox format

You can modify any of the above directly.  Modifying $body or $header
will change the content of the resulting message.  Changing $maildir will
affect any later use of the copy, move, or spam functions below.

Additionally, the following functions are available:

addfield(name, value) => string
	Adds a header field named name to the end of the header,
	with a value of value.  No wrapping occurs, so you must embed
	newline and whitespace yourself if required.  Returns the entire
	header.

block
	Exits the process immediately, disallowing the message.

body(pattern) => matchdata or nil or string
	Returns non-nil if the body contains pattern.  If pattern is
	nil, returns $body.

copy(mbox) => true
	Copies the message to the file named mbox in the directory
	specified by $maildir.  This assumes storage in mbox format.
	The file will be created if necessary, otherwise a boundary will
	be appended prior to the message.  This allows readers like mutt
	to see the file as a mailbox.

delfield(name) => string
	Deletes all header fields named name from the header, returning
	the entire new header.

field(name,pattern) => matchdata or nil or string
	Returns non-nil if the header field named name contains pattern.
	If pattern is nil, returns the text of the specified field.

getall(name) => array
	Returns the values of all header fields named name.

getfield(name) => string
	Returns the value associated with the first occurrence of the
	header field named name.  Newlines and whitespace are preserved,
	except for any initial whitespace following the colon after the
	field name.

keep
	Exits the process immediately, allowing the message to pass.

message => string
	Returns the entire message with header.  Modifying this result
	has no effect.

move(mbox)
	Copies the message to mbox (see copy), then exits immediately,
	blocking the message.

pipe(command) => true
	Pipes the message to command.

spam
	Moves the message to a file named "spam" (see move).  Exits
	immediately.
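
To make the header functions above more concrete, here is a minimal sketch of
how field lookup, addition, and deletion could work on a header held as one
string.  The names all_fields, first_field, add_field, and del_field are
hypothetical stand-ins, and getlessmail.rb's own implementation may well
differ.

```ruby
# Hypothetical helpers for illustration; not getlessmail.rb's actual code.
# The header is a single string; folded (continuation) lines start with
# a space or tab.
HEADER = "From: Alice <alice@example.com>\n" \
         "Received: from a.example.com\n\tby b.example.com\n" \
         "Received: from c.example.com\n" \
         "Subject: Hello\n"

# Values of every field called name, with folded lines preserved and the
# whitespace after the colon dropped (compare getall).
def all_fields(header, name)
  pattern = /^#{Regexp.escape(name)}:[ \t]*(.*(?:\n[ \t]+.*)*)/i
  header.scan(pattern).flatten
end

# Value of the first field called name, or nil (compare getfield).
def first_field(header, name)
  all_fields(header, name).first
end

# Append a field without any wrapping (compare addfield).
def add_field(header, name, value)
  header + "#{name}: #{value}\n"
end

# Remove every field called name, including folded lines (compare delfield).
def del_field(header, name)
  header.gsub(/^#{Regexp.escape(name)}:.*(?:\n[ \t]+.*)*\n?/i, "")
end

p all_fields(HEADER, "Received")
p first_field(HEADER, "subject")
```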

Any unrecognized verb is assumed to be the name of a header field.  If
called without arguments, the text of that header field is returned.  If
an argument is passed, it is treated as a pattern, and the method returns
a non-nil MatchData if the pattern matches, otherwise nil.  This allows
you to use statements such as:

spam if from '@example.com'

This works because 'from' is treated as a method that matches the "From"
header field.
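
One common way to get this behavior in Ruby is method_missing.  The sketch
below illustrates the idea only; the class HeaderDSL and its field_text
method are hypothetical, and I am not claiming that getlessmail.rb is
written this way.

```ruby
# Hypothetical class for illustration; not getlessmail.rb's own code.
class HeaderDSL
  def initialize(header)
    @header = header
  end

  # Text of the first field called name, or nil if it is absent.
  def field_text(name)
    m = @header.match(/^#{Regexp.escape(name)}:[ \t]*(.*)/i)
    m && m[1]
  end

  # Any unknown method name is looked up as a header field.  Without an
  # argument the field text is returned; with one, the result of the match.
  def method_missing(name, pattern = nil)
    text = field_text(name.to_s)
    return text if pattern.nil? || text.nil?
    pattern = Regexp.new(pattern, Regexp::IGNORECASE) if pattern.is_a?(String)
    pattern.match(text)
  end

  def respond_to_missing?(name, include_private = false)
    !field_text(name.to_s).nil? || super
  end
end

mail = HeaderDSL.new("From: Spammer <junk@example.com>\nSubject: Hi\n")
puts mail.from
puts "spam" if mail.from('@EXAMPLE.com')
```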

Where pattern is used as a parameter above, it is assumed to be one of two
things:

1.  A String containing a regular expression (without /'s) that will be treated as
case-insensitive.  You can, of course, dynamically alter case-sensitivity
by embedding (?-i) or (?-i:) in your pattern.  The Regexp::MULTILINE option
is NOT enabled, so if you wish for . to match newline, you must include the
m option dynamically with (?m) or (?m:).  All other options are also off
by default.

2.  A Regexp, in which case you can specify whatever options and patterns you want.
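
The two cases can be tried out in plain Ruby.  The helper to_regexp below is
a hypothetical name, and the conversion is only what the description above
implies, not a copy of getlessmail.rb's code.

```ruby
# Assumed conversion, inferred from the two cases described above.
def to_regexp(pattern)
  pattern.is_a?(Regexp) ? pattern : Regexp.new(pattern, Regexp::IGNORECASE)
end

text = "Viagra\nCIALIS"

p to_regexp("viagra").match?(text)             # String: case-insensitive
p to_regexp("(?-i)viagra").match?(text)        # (?-i) restores case sensitivity
p to_regexp("viagra.cialis").match?(text)      # . does not match newline
p to_regexp("(?m)viagra.cialis").match?(text)  # (?m) lets . match newline
p to_regexp(/viagra/).match?(text)             # Regexp: used exactly as given
```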

Note that you could therefore also write:

spam if from == 'spammer@example.com'

which would compare the entire "From:" field value as a string, rather than
as a case-insensitive regular expression.

As indicated above, some functions exit immediately using exit.  That's
how a getmail filter indicates whether to keep or remove the message.
Those functions are: keep, block, move, and spam.
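
As a rough illustration of the mechanism, the sketch below assumes getmail's
default Filter_external settings, in which exit status 0 keeps the message
and 99 or 100 drops it.  The names keep_message, block_message, and status_of
are hypothetical, and the statuses getlessmail.rb really uses are not
confirmed here.

```ruby
require "stringio"

KEEP_STATUS = 0    # assumed: getmail's default exitcodes_keep
DROP_STATUS = 99   # assumed: one of getmail's default exitcodes_drop values

# A kept message is written back out for getmail to deliver.
def keep_message(message, out = $stdout)
  out.write(message)
  exit KEEP_STATUS
end

# A blocked message produces no output, only the drop status.
def block_message
  exit DROP_STATUS
end

# exit raises SystemExit, so the status can be observed here without
# ending this demonstration.
def status_of
  yield
rescue SystemExit => e
  e.status
end

sink = StringIO.new
puts status_of { keep_message("Subject: hi\n\nbody\n", sink) }
puts status_of { block_message }
```

Because exit ends the process, no rule after the first one that fires is ever
evaluated.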

Thus, you can write a series of rules for getlessmail, and the first one
that qualifies will be applied, ignoring the rest.  See, for instance,
the example dot.getlessmail provided:

	addfield "X-GetLessMail", "inspected by getlessmail"
	keep if from "@(trusted|commercial|domains).com"
	keep if from "goodfriend@example.com"
	spam if from "@example.com|viagra|pfizer"
	spam if subject "viagra|cialis"

In all cases, a header field named "X-GetLessMail" will be added to
the message, with the content "inspected by getlessmail".  Then, if
the message is from anyone at trusted.com, commercial.com, or domains.com,
we'll keep it and exit.  If it's from goodfriend@example.com, we'll
keep it and exit.  Otherwise, if it's from anyone else at example.com,
or if the "From:" header contains "viagra" or "pfizer", then we'll move
it to spam and exit.  If the "Subject:" header contains either "viagra"
or "cialis", we'll spam it and exit.  Finally, if we get to the end of
the script, the message is accepted.

Configuring getlessmail as a getmail filter
-------------------------------------------

Add the snippet in the provided getmailrc to your ~/.getmail/getmailrc:

[filter]
type = Filter_external
path = /path/to/getlessmail.rb
arguments = ('username',)
unixfrom = true

Change the path to refer to where you have installed the
getlessmail.rb script.  Make sure getlessmail.rb is executable.
Replace 'username' with your username.  The unixfrom must be true,
or else the "From " headers will be missing if you copy or move the
message to another folder.  If you don't intend to use the copy or move
functions, then you don't really need unixfrom.  If getmail will only be
run under your account, you can omit the 'arguments' line entirely -- the
current user is assumed.

You can test the filter by sending yourself an email and then running
getmail.  If you include the "addfield" line from the example script,
you can examine the message headers to make sure that the filter
was applied.

You can also pipe content directly to getlessmail.rb to see what comes out
the other end.  Test out each of your rules to be sure it works the way you
think it does.


filter_mbox.rb -- Filter an entire mbox file using getlessmail.rb
-----------------------------------------------------------------

If you'd rather not hold up getmail while getlessmail does its filtering, you
could configure getmail to deliver to an "unfiltered" mailbox, then use the
supplied script filter_mbox.rb to filter the messages into your Inbox.

usage: filter_mbox.rb [-v] < mbox_in >> mbox_out

For example, here is my 'ckmail' script, which I run from cron every 15 minutes,
and can also run on demand:

#!/bin/sh
if ! /bin/pgrep -f $0
then
  /usr/local/bin/getmail $* -rgetmailrc
  ~/bin/filter_mbox.rb $* < ~/Mail/Unfiltered >> ~/Mail/Inbox
  > ~/Mail/Unfiltered
else
  echo "$0: already running" >&2
  exit 1
fi

This script first makes sure that it isn't already running, then it uses
getmail to deliver messages to a file called "~/Mail/Unfiltered" (specified
in the getmailrc file).  Next, it uses filter_mbox.rb to copy the good
messages to the Inbox, then wipes out the ~/Mail/Unfiltered file.
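
For readers unfamiliar with mbox files, each message begins with a line that
starts with "From " (the same line that unixfrom preserves).  Here is a
simplified sketch of splitting such a file into messages.  The method
split_mbox is hypothetical and is not taken from filter_mbox.rb; a real
parser would also handle escaped ">From " lines in message bodies.

```ruby
# Simplified mbox splitter, assuming every line that starts with "From "
# begins a new message.
def split_mbox(text)
  messages = []
  text.each_line do |line|
    messages << String.new if line.start_with?("From ") || messages.empty?
    messages.last << line
  end
  messages
end

SAMPLE = "From a@example.com Mon Jan  1 00:00:00 2010\n" \
         "Subject: one\n\nfirst\n\n" \
         "From b@example.com Mon Jan  1 00:00:01 2010\n" \
         "Subject: two\n\nsecond\n"

split_mbox(SAMPLE).each { |msg| puts msg.lines.first }
```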


glmadd.rb -- Command-line tool
------------------------------

Rather than manipulating the .getlessmail file directly, you may want to use
glmadd.rb -- especially when the process is automated.  This script offers
the following command line syntax:

usage: glmadd.rb [address] [-(a|k|K|s|S)]

-a = ask for option
-k = keep if from this address
-K = keep if from the address' domain
-s = spam if from this address
-S = spam if from the address' domain

If you leave off the address (i.e., call this script with no arguments), it
will ask for the email address to be entered on stdin.

When you use the -a switch (or omit all switches), glmadd.rb will ask you to
enter k, K, s, S, or i which represent "keep account", "Keep domain",
"spam account", "Spam domain", or "ignore", respectively.

Example:

	glmadd.rb spammer@spamfactory.org -S

adds the following rule to your ~/.getlessmail script:

	spam if from /\b@spamfactory\.org\b/i
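
A rule of that shape can be built with Regexp.escape.  The sketch below is a
guess at the logic for the domain switches only: domain_rule is a
hypothetical name, and the -K output is assumed by analogy with the -S
example above.

```ruby
# Hypothetical reconstruction; only the -S output is confirmed above.
def domain_rule(address, switch)
  verb = { "-K" => "keep", "-S" => "spam" }.fetch(switch)
  domain = address[/@.*\z/]   # e.g. "@spamfactory.org"
  "#{verb} if from /\\b#{Regexp.escape(domain)}\\b/i"
end

puts domain_rule("spammer@spamfactory.org", "-S")
```

Escaping the domain keeps the dot from matching any character, so the rule
does not also catch a domain such as spamfactoryXorg.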


glmpipe.rb -- Pipe filter for extracting sender
-----------------------------------------------

With some Mail User Agents (notably mutt), you have the option to pipe a
message through a shell command.  The script glmpipe.rb is designed for
that purpose.  It extracts the sender's email address and then uses
glmadd.rb to add it to the .getlessmail filter.

usage: glmpipe.rb [-(a|k|K|s|S)]

Each of the switches has the same meaning as for glmadd.rb, to which it
will be passed.

You can then create macros to automate the process of blacklisting or
whitelisting the sender's email address or domain.  For instance, I define
the following in my .muttrc:

macro index M "|glmpipe.rb -" "Add sender to .getlessmail"
macro pager M "|glmpipe.rb -" "Add sender to .getlessmail"

This waits for me to type the desired switch and press Enter.  However,
you could also map individual keys to individual functions.  For example:

macro pager \Ct "|glmpipe.rb -s\n" "Spam sender"

would map ^T to nail the sender as a spammer and ask questions later (or
rather, not at all).


IPGeo.rb -- Extract IP and IP location
--------------------------------------
This little module contains a class with two methods:

IPGeo.get_ip(string) => int

Scans string for a dotted IP address and returns an int representing that
address.

IPGeo.locate(int) => [owner, city, st, country, email] or nil

Takes an IP address as an integer and finds the owner information in a
colon-delimited database (/usr/local/share/IPGeo/IPGeo.csv), returning
an array of owner information, or nil if the IP is a reserved local IP or
is not in the database.

The file IPGeo.csv is an example of this database, downloaded from
http://linuxbox.co.uk/ip-address-whois-database.php
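
To show what an integer representation of a dotted address can look like,
the sketch below packs the four octets into one number, most significant
octet first.  The name dotted_ip_to_int is hypothetical, and I am assuming
this conventional encoding; IPGeo.get_ip may behave differently in detail.

```ruby
# Assumes the usual big-endian packing of the four octets into one integer.
def dotted_ip_to_int(string)
  m = string.match(/\b(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\b/)
  return nil unless m
  octets = m.captures.map(&:to_i)
  return nil if octets.any? { |octet| octet > 255 }
  octets.inject(0) { |acc, octet| (acc << 8) | octet }
end

p dotted_ip_to_int("Received: from mail (192.168.1.1) by mx")  # => 3232235777
```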


IPGeoMail.rb -- Extract IP Location from "Received:" header fields
------------------------------------------------------------------
This module contains one method:

locate_from => [owner, city, st, country, email] or nil

This method is intended to be called from within a .getlessmail script
to scan all "Received:" header fields and return the IP Location
information of the innermost (last) such field for which the information
can be obtained.  This will be the closest to the originator of the
message.
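
The search order described above can be sketched as follows.  Here
locate_last_received is a hypothetical name, and the lookup argument is a
stand-in for IPGeo.locate that takes the dotted address as a string, so this
shows the scanning logic under those assumptions rather than the module's
real code.

```ruby
# Sketch only: lookup is a hypothetical stand-in for IPGeo.locate.
def locate_last_received(header, lookup)
  received = header.scan(/^Received:.*(?:\n[ \t]+.*)*/i)
  # Walk from the last Received: field (closest to the originator) backwards
  # until one of them yields location information.
  received.reverse_each do |field|
    ip = field[/\b\d{1,3}(?:\.\d{1,3}){3}\b/]
    info = ip && lookup.call(ip)
    return info if info
  end
  nil
end

# Made-up data: the private address has no entry, so the search falls back
# to the previous Received: field.
DATABASE = { "203.0.113.5" => ["Example ISP", "Springfield", "XX", "US", "abuse@example.com"] }
HEADERS = "Received: from relay (203.0.113.5)\n\tby mx.example.com\n" \
          "Received: from desktop (10.0.0.1)\n" \
          "Subject: Hello\n"

p locate_last_received(HEADERS, ->(ip) { DATABASE[ip] })
```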