E-mail filtering for spam fighting
We've reached the final section of part 3! The last thing we're going to do here is set up some quick e-mail filters with Sieve, the Dovecot server-side mail filtering plug-in. Sieve lets you set up filters based on message headers, the contents of the message, or really anything at all—it's very flexible, but the scripting language it uses (defined in IETF RFC 5228) is a little opaque if you're not a scripting pro.
There are two very important reasons why we're bothering with Sieve: 1) We're going to use it to actually do something with messages that SpamAssassin marks as spam, and 2) we're going to use it to actually do something with messages ClamAV marks as infected with viruses. It's also extremely awesome to use it to quickly set up filters for e-mail you don't want to receive. Forgot to use a throwaway address to register from somestupidbusiness.com's website? That's OK: add a quick Sieve rule and mail from *@somestupidbusiness.com is sent to the trash—or silently destroyed.
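As a rough sketch of what such a throwaway rule might look like (this assumes the "envelope" and "fileinto" extensions are available and that the receiving user has a folder named "Trash", neither of which is spelled out above):

```sieve
require ["envelope", "fileinto"];

# Match any envelope sender whose domain is exactly somestupidbusiness.com.
if envelope :domain :is "from" "somestupidbusiness.com" {
    fileinto "Trash";   # replace with "discard;" to silently destroy it instead
    stop;               # no need to run further rules on this message
}
```

The :domain address part compares only what follows the @ sign, which is one way to express the *@somestupidbusiness.com idea without a wildcard.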
The good news about Sieve is that you don't have to install it; or, rather, it's already installed as part of the big mail stack package we installed back in part one. If you run a quick dovecot -n to check, you'll see several settings referring to "sieve" in the output.
Sieve looks for filters in a number of places. Each virtual user can have their own filter rules under their virtual mail home directory (/var/mail/vmail/yourdomain/username/sieve), and there are also global filters, kept in the sieve-before and sieve-after directories. Any filters in sieve-before are executed prior to per-user filters, and filters in sieve-after are run after.
I put my global filters in sieve-before, which is where we're going to handle spam and virus filtering, as well as a few other things. Go ahead and create both the before and after directories and make sure they're owned by the "vmail" virtual mail user.
Now it's time to make some filters! We're currently marking but not doing anything with spam and virus-laden e-mails—both our SpamAssassin milter and our ClamAV milter are setting notes in the message headers but are otherwise passing the messages on for delivery without complaint, so let's build some filters to fix that.
Pop open a file for editing in your newly created sieve-before directory. You can call it something like masterfilter.sieve or really anything you like, so long as it ends in .sieve. Here are the contents of my primary filter, which you can modify as you like:
There are quite a few things going on in here. The very first thing we have to do is include the various Sieve modules that our filter is going to require—in this case, we're loading up "envelope", "fileinto", "imap4flags", and "regex"—there are quite a few modules in all, but these are the ones we need.
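For reference, loading those four extensions takes a single require statement. This is a sketch of the standard RFC 5228 syntax rather than a copy of my file; the one hard rule is that require has to come before any other command in the script.

```sieve
# Declare every extension the script uses, up front.
# Tests and actions from an extension that isn't listed here
# will cause the script to fail to compile.
require ["envelope", "fileinto", "imap4flags", "regex"];
```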
It's very important to understand that like a lot of rule-based programs, Sieve runs your filters sequentially, from top to bottom. In other words, where you place your rules in the file matters!
For example, I have a rule to discard messages without proper message IDs because it's yet another way to cut down on spam. However, Comcast's billing notifications don't have correct IDs (way to go, Comcast). So, the very first thing that the filter above does is check to see if a message is coming in from Comcast (or from TXU, my electricity company).
That first rule isn't just a simple check, either: it uses an "OR" type test. The if anyof test triggers if any of the listed conditions are true and then takes the specified action. For this rule, I'm checking if the envelope sender either contains "@alerts.comcast.net" or exactly matches "email@example.com". You can filter on any part of a message, but the "envelope sender" is actually added by the remote MTA rather than by the remote mail user agent (MUA), so it's more likely to be a reliable indicator of who and where the e-mail is actually from.
The filter also specifies an action, in this case fileinto "Bills", which means the message is delivered into the "Bills" folder for the receiving user. There are other actions possible, too, and we'll touch on those in just a moment.
The last thing in the initial filter is a stop, which tells Sieve that if a message matches one of these conditions, it doesn't have to bother running the rest of the tests on it.
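Putting those pieces together, a rule of this shape might look something like the following. Treat it as a minimal sketch built from the description above, not a verbatim copy of my filter; the two addresses and the "Bills" folder are the ones mentioned in the text.

```sieve
require ["envelope", "fileinto"];

# "OR" test: fires if ANY of the listed conditions is true.
if anyof (envelope :contains "from" "@alerts.comcast.net",
          envelope :is "from" "email@example.com") {
    fileinto "Bills";   # deliver into the user's "Bills" folder
    stop;               # skip every rule below this one
}
```

Because Sieve works from top to bottom, placing this rule first means a matching message is filed and finished before any stricter rule further down gets a chance to discard it.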
Our second filter is one of two spam traps, keying off of the spam "score" assigned by SpamAssassin and written into the message's headers. By default, anything that rates over a 5 is considered spam, but some messages are really, really spammy—it's not uncommon for some spam to rate 15 or even 50 and above, depending on the number of SpamAssassin rules it triggers. Lower-scoring spam might actually be misidentified ham (i.e., legit messages), so we don't want to automatically discard everything that SpamAssassin marks. On the other hand, past a certain rating, a message is almost certainly for-real spam and we don't need to keep it around.
Our second filter checks messages for a header labeled "X-Spam-Level," and if that header contains at least ten asterisks (one for each spam rating level), we don't even bother filing the message; we discard it. This causes Sieve to silently delete the message, vanishing it into the bit bucket without sending a note back to the sender. The action will be logged in /var/log/mail.log, but other than that, you'll never hear about it.
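A hedged sketch of a high-score spam trap along these lines follows. The header test and the discard action are both part of base Sieve, so no extensions are needed; the threshold of ten asterisks comes from the description above.

```sieve
# X-Spam-Level holds one asterisk per point of spam score, so a
# substring match on ten asterisks means "score of 10 or more".
if header :contains "X-Spam-Level" "**********" {
    discard;   # drop silently; no bounce goes back to the sender
    stop;
}
```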
Next, a filter to take care of Twitter's spam. Try as I might, I can't get them to leave me alone, so this filter will silently throw away their advertising crap while leaving password reset notifications alone (they come from a different envelope sender).
Below that is a filter to discard messages with malformed message IDs. This takes care of eliminating some spam, but it might also destroy legitimate messages from folks with stupid MUAs or MTAs (older versions of Exchange, I'm looking at you). It's an optional filter, but I'm leaving it in both because I use it, and also because it demonstrates a regular expression-based filter statement. Sieve will happily use regexes as long as they're POSIX extended regular expressions; the "regex" extension is specified in its own IETF draft (draft-murchison-sieve-regex) rather than in RFC 5228 itself.
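To give a feel for the syntax, here's a minimal sketch of a regex-based message ID check. The pattern is an illustrative assumption (it only asks for something@something followed by a dot), so tune it to taste before trusting it with real mail.

```sieve
require ["regex"];

# Discard anything whose Message-ID doesn't look like local@host.tld.
# A backslash must be doubled inside a Sieve string, so "\\." reaches
# the regex engine as \. (a literal dot).
# Note: a message with no Message-ID header at all also fails the
# test, so it gets discarded too.
if not header :regex "message-id" ".*@.*\\." {
    discard;
    stop;
}
```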
Lastly, a compound filter to take care of low-scoring spam and virus-laden e-mail. I want to hang on to any piece of spam that rates lower than 10 because it might be mislabeled ham; for messages like that, you'd want to rescue them and run sa-learn on them (first to un-learn them, then to re-learn them as ham) to make sure SpamAssassin learns from its mistake. I also want to hang on to messages that ClamAV tags as infected because it's always possible a friend or family member will unknowingly send a virus; keeping the message around means I can at least let them know that they have a problem.
The filter checks to see if a message has either an X-Spam-Level of 5 or more or a ClamAV "infected" header, then files it into the "Junk" folder (and marks it read with setflag "\\Seen") if it has the spam header, and into the "Infected" folder if not. Doing it this way also means that spam e-mails with viruses will get treated as spam; only infected messages SpamAssassin hasn't marked as spam will get filed into the "Infected" folder (which you'll have to create manually since I didn't include it in the instructions).
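A compound rule like that could be sketched as follows. Two assumptions are baked in, so check them against your own headers: that the ClamAV milter writes an "X-Virus-Status" header containing the word "Infected", and that five asterisks in X-Spam-Level is the spam threshold you want.

```sieve
require ["fileinto", "imap4flags"];

if anyof (header :contains "X-Spam-Level" "*****",
          header :contains "X-Virus-Status" "Infected") {
    if header :contains "X-Spam-Level" "*****" {
        # Spam (infected or not): mark as read, then file it.
        # The flag is set first because fileinto applies whatever
        # flags are in effect at the moment it runs.
        setflag "\\Seen";
        fileinto "Junk";
    } else {
        # Infected, but not flagged as spam by SpamAssassin.
        fileinto "Infected";
    }
    stop;
}
```

Messages scoring 10 or more also match the five-asterisk test, so a rule like this only sees them if an earlier discard rule hasn't already thrown them away.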
Sieve filters can be extremely rich and complex, and I've barely scratched the surface of what they can do. There are some excellent Sieve example pages out there with tons of great tutorials and filters to crib from.
If you want to test out your rules on some messages to see exactly what's happening, you can use the sieve-test command, with some verbose options on the end of it. For example, running my ruleset against an e-mail from Comcast I have in my inbox yields the following:
Once you have your Sieve filter set just the right way, you'll need to run the sievec command on it to transform it into a binary file. This is so that the Sieve Dovecot plug-in can parse it more quickly (and it will also check your syntax and tell you if you have any errors). This is done automatically for Sieve filter files stored in local user directories, but you'll need to do it manually for filter files you place in sieve-before and sieve-after. The command is simple:
If you do this as root, make sure to then chown the resulting .svbin file to the "vmail" user.
And that's part 3! At this point, your e-mail server is essentially production-ready. It should still be firewalled off from the rest of the world, but if you want to unblock the four relevant ports (SMTP, SMTPS, IMAP, and IMAPS), then your server is live. However, you might want to hold off on that until part 4, because we're going to toss in some rate limiting protection there with iptables.
If you're testing on your LAN, before you open things up to the world you should consider quickly adding your account to your local mail application to see if everything is working correctly. You'd do it just like you do any mail account—in the app of your choice, just add an account and supply the appropriate e-mail address and password.
In part 4, we'll tackle getting webmail set up with all the complications that brings. We'll also drop some additional security restrictions in place, both for your e-mail server in general and also quite a bit of stuff specifically around webmail (including two-factor authentication, application-specific passwords, and certificate-based logins). Part 4 will be the final one in our series, and you'll wind up with a nice little server that should be able to easily handle your mail while standing up to the big bad Internet.