We haven't connected OpenDKIM to Postfix yet because we're going to
save that step until the end and connect all of the milters at the same
time. Now it's time to turn our attention to the second milter we're
going to install: spamass-milter, which will provide us with our
anti-spam solution, SpamAssassin.
First, install the milter. The package will install SpamAssassin and also
the daemonized version of SpamAssassin that we'll actually be using. At
the same time, we're also going to install the Razor and Pyzor
anti-spam add-ons, along with an updated perl library that SpamAssassin
needs in order to properly interpret DKIM headers (which it needs to be
able to do as part of its evaluation of a message's spamminess):
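On a Debian or Ubuntu system, the install step might look something like the following. The package names are an assumption based on the Debian/Ubuntu repositories rather than something spelled out above, so check them against your release before running anything.

```shell
# Assumed Debian/Ubuntu package names; confirm with `apt-cache search` first.
# spamass-milter should pull in SpamAssassin and spamd as dependencies.
sudo apt-get update
sudo apt-get install spamass-milter pyzor razor libmail-dkim-perl
```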
There are a number of different ways to set up and run SpamAssassin.
The simplest, which we're not doing here, is to ignore all the fancy
daemons and have your mail transfer agent (MTA) actually relay messages
through SpamAssassin before sending them to Dovecot for delivery. There
is some value in this approach, and we'll talk about it briefly in part 4
as part of our wrap-up. What we're going to do, though, is have Postfix
slip e-mail to spamd via a unix socket. The spamd daemon
runs in the background and keeps worker processes ready to receive and
check e-mail for spam. It's mostly preconfigured by our milter install,
and it's marginally faster and marginally less resource-intensive than
relaying—you don't have to run through SpamAssassin's startup process
every time you receive a new e-mail this way, and if your server is
really busy, this can have an impact.
In its default configuration, spamd
creates separate spam filter preferences and Bayesian databases for
each local mailbox. You may or may not want this, depending on what you
want to do with your server; if each inbox is used for a different
purpose, then broadly trying to keep a single spam database might worsen
SpamAssassin's detection rate. In this guide, we're going to tell
SpamAssassin to not create individual user preferences and databases and instead go with a single global setup.
User accounts and config files
spamass-milter gets us most of the way to where we
need to be with SpamAssassin, but we have to lay some additional
groundwork to get it production-ready. The package creates a
"spamass-milter" user to run the actual milter process, but it doesn't
create a user for spamd, which runs as root. We don't want that, so we
need to create a "spamd" user to give spamd
a context under which to execute. We also need to add the existing
spamass-milter user to the new spamd user's group, because
spamass-milter needs to be able to write to the unix socket that the
spamd user will be creating:
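A minimal sketch of those two steps, assuming Debian's adduser tool (which also creates a matching "spamd" group by default) and assuming we want the new account's home directory to be the helper-app directory used later on:

```shell
# Create a locked-down account for spamd: no password, no login shell.
# The home directory is an assumption that matches /var/lib/spamassassin.
sudo adduser --shell /bin/false --home /var/lib/spamassassin \
    --disabled-password --disabled-login --gecos "" spamd

# Add the existing milter account to the spamd group so it can
# write to the socket that spamd creates.
sudo usermod -a -G spamd spamass-milter
```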
There are several config files we will be concerning ourselves with.
Several are for the core SpamAssassin tool itself, and then there's one
controlling settings specific to the milter. First, open
/etc/default/spamassassin for editing. It
already has a few things in it, but we're going to change all of the
defaults with the exception of the pidfile variable. Change the file so
that it looks like this:
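The original listing isn't reproduced here, so treat the following as a rough sketch that matches the description which follows rather than the definitive file. The variable names SAHOME and SAGLOBALCFGPATH are illustrative, the socket path under Postfix's chroot is an assumption, and the pidfile value is the assumed packaged default. Newer systemd-based releases may ignore some of these variables.

```shell
# /etc/default/spamassassin (sketch; names and paths marked below are assumptions)

# Helper-app home directory and global config directory (illustrative names)
SAHOME="/var/lib/spamassassin"
SAGLOBALCFGPATH="/etc/spamassassin"

# Enable the spamd daemon
ENABLED=1

# -x disables per-user config files; -u/-g set the account spamd runs as;
# the --socket* flags define the unix socket (path is an assumption)
OPTIONS="-x --max-children 5 --helper-home-dir ${SAHOME} -u spamd -g spamd --siteconfigpath ${SAGLOBALCFGPATH} --socketpath=/var/spool/postfix/spamassassin/spamd.sock --socketowner=spamd --socketgroup=spamd --socketmode=0660"

# Left at the packaged default (assumed value)
PIDFILE="/var/run/spamd.pid"

# Turn on the automatic rule updates
CRON=1
```

The file is sourced by the init script as plain shell, which is why ordinary variable expansion works inside the options string.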
The first two items are variables, which we'll use to define our SpamAssassin helper application home directory (at
/var/lib/spamassassin), and the directory which holds SpamAssassin's global configuration. Next, we actually enable the
spamd daemon. The options
section lays out a bunch of stuff: we disable per-user configuration
files, specify the UID and GID under which SpamAssassin should run,
specify our global config file path (via variable), set the unix socket
on which spamd will listen for incoming mail to be passed
to it from Postfix, and a few other things. We're also turning on
SpamAssassin's automatic rule updates.
We have to manually create the socket directory, too, so let's create it and set its owner:
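Something along these lines should do it. The directory shown is an assumption; it has to match whatever socket path you gave spamd in /etc/default/spamassassin, and it needs to live inside Postfix's chroot.

```shell
# Assumed socket directory inside Postfix's chroot jail.
sudo mkdir /var/spool/postfix/spamassassin

# Hand the directory to the spamd user so it can create its socket there.
sudo chown spamd:root /var/spool/postfix/spamassassin/
```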
Next, open the
spamass-milter config file.
There should only be a single uncommented line in there, the "options"
line, which we want to modify so that it looks like this:
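As a sketch, the modified line might look like this. The file is assumed to live at /etc/default/spamass-milter on Debian/Ubuntu, the -m flag is my assumption for the "don't mess with the headers" option, and the socket path is the same assumed location used for spamd.

```shell
# /etc/default/spamass-milter (assumed path)
# -u  account context for the milter (already present by default)
# -i  ignore mail originating from these networks (here, the server itself)
# -m  assumed flag: don't modify Subject/Content-Type headers or the body
# -I  ignore mail from authenticated SMTP senders
# --  everything after this is handed to spamc; --socket points at spamd
OPTIONS="-u spamass-milter -i 127.0.0.1 -m -I -- --socket=/var/spool/postfix/spamassassin/spamd.sock"
```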
The "-u" option to specify the milter's account context is already
there, but we're adding several things, mostly to tell SpamAssassin what
messages it can ignore (i.e., messages going to and from LAN hosts). The
-i 127.0.0.1 option tells the milter to ignore messages originating from our server;
-I ignores messages from authenticated SMTP senders. One more
option keeps SpamAssassin from messing around too much with the message
headers. The last bit with all the dashes is there to tell the milter
where it should be sending mail for spam scanning. In this case, we point
it at the socket we configured a moment ago for spamd.
Now we're going to make a couple of changes in the core SpamAssassin config files, which have been set up in
/etc/spamassassin by the main
spamass-milter package installation. We're going to edit two of the files.
First, open up
/etc/spamassassin/init.pre and add the following line at the bottom:
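One way to append the line without opening an editor is sketched below. The plugin name is the standard TextCat module that ships with SpamAssassin; the use of tee to append is just a convenience and not necessarily how you'd prefer to edit the file.

```shell
# Append the TextCat plugin loader to the bottom of init.pre.
echo "loadplugin Mail::SpamAssassin::Plugin::TextCat" | sudo tee -a /etc/spamassassin/init.pre
```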
This will cause the TextCat plugin to be loaded by SpamAssassin when it spins up, which is required for one of the config options we're going to enable in the next file.
Next, open /etc/spamassassin/local.cf and add these lines at the top, just under the first comment block:
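The original lines aren't reproduced here, so this is only a sketch of what such a block could contain. The option names are real SpamAssassin settings, but the specific paths and the choice of "en" are assumptions you should adjust.

```
# Single global Bayes database for all virtual users (path is an assumption)
bayes_path /var/lib/spamassassin/.spamassassin/bayes

# Optional: score mail outside these languages and character sets as more
# suspicious (ok_languages relies on the TextCat plugin)
ok_languages en
ok_locales en

# Where the Pyzor and Razor helpers keep their settings (assumed locations)
pyzor_options --homedir /var/lib/spamassassin/.pyzor
razor_config /var/lib/spamassassin/.razor/razor-agent.conf
```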
As noted in the beginning of this section, we're going to keep a
single Bayesian database and token set for all of our virtual mail
users; this global option is set here.
Below that are two options you may or may not want to set, depending on your goals (and your location). They
tell SpamAssassin to activate some additional filtering rules that will
trigger on e-mails that don't match the listed languages and that don't
match the listed character sets. SpamAssassin isn't always going to be
able to recognize the language of the e-mails it scans. You can
obviously modify these settings to taste (maybe you only want to receive
Italian e-mails!), or leave them out entirely; however, if you don't
plan on ever needing to receive e-mails in anything other than your
native language, these options can help cut down on foreign language
spam. The full list of languages is available in the TextCat documentation, and the available locale list is in the main SpamAssassin config file docs.
The last two bits are for Pyzor and Razor, which we installed at the beginning of our SpamAssassin adventure but haven't configured yet. We'll get to that in just a moment.
Any time you edit SpamAssassin configuration files, it's always a good idea to check the files for errors with SpamAssassin's lint option. Go ahead and run that now, like this:
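The check itself is a one-liner. Running it through sudo is an assumption on my part, so that it can read everything under /etc/spamassassin.

```shell
# Parse all SpamAssassin config files and report any problems.
sudo spamassassin --lint
```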
If you get just a new line and no other output, then everything's fine.
Now is also a good time to run a manual rule update,
which will grab the latest SpamAssassin rules. This will be done
automatically by SpamAssassin, but we want to do it manually for the
first time so that we can make sure it stores everything in the right
place. If you're running as root, you'll want to also fix the ownership
on the rules so that the spamd user can modify them:
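A sketch of the manual update, assuming the sa-update tool that ships with SpamAssassin and assuming the rules land under /var/lib/spamassassin:

```shell
# Fetch the current rule set by hand the first time.
sudo sa-update

# sa-update exits non-zero when there was nothing new to download,
# so look at the directory rather than relying on the exit code.
ls -l /var/lib/spamassassin

# The rules were written as root, so hand them over to spamd.
sudo chown -R spamd:spamd /var/lib/spamassassin
```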
Now that we have all of this set, we need to actually bounce both the
processes to turn all of our configuration stuff on—but first, we must
create the global Bayesian filter directory we just set in
local.cf. So create the directory, then fix its ownership so that it belongs to the spamd user and group.
The last thing before we restart the two services is to make sure that the SpamAssassin milter user can write data to the SpamAssassin socket, which we'll do by swizzling its group membership appropriately; we'll then restart the services:
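Put together, those steps might look like this. The Bayes directory is an assumption that has to match the path you set in local.cf, and the service names are the ones I'd expect on Debian/Ubuntu of this vintage; newer releases may name them differently.

```shell
# Directory for the global Bayes files (must match bayes_path in local.cf).
sudo mkdir -p /var/lib/spamassassin/.spamassassin
sudo chown spamd:spamd /var/lib/spamassassin/.spamassassin

# Make sure the milter account is in the spamd group
# (harmless if it is already a member).
sudo usermod -a -G spamd spamass-milter

# Restart both services so the new settings take effect.
sudo service spamassassin restart
sudo service spamass-milter restart
```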
Razor2 and Pyzor
Before we can call our SpamAssassin setup job complete, we need to get Razor2 and Pyzor working. These two separate applications will function as add-ons to SpamAssassin. They broaden your spam detection powers by checking the hashes of incoming messages against separate sets of ever-expanding spam databases. They're essentially configure-and-forget, so it's worth taking the time to get them working now.
First, we need to create several directories underneath
/var/lib/spamassassin, since that's where all of our SpamAssassin helper app settings are going to live:
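For example, assuming hidden per-tool directories (the names here are my choice, not something the tools require):

```shell
# One settings directory each for Razor and Pyzor (assumed names).
sudo mkdir -p /var/lib/spamassassin/.razor /var/lib/spamassassin/.pyzor
```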
First, Pyzor. Per the main Pyzor setup documentation, there's just a single command to run to get Pyzor to bootstrap itself:
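Assuming a Pyzor release that still provides the discover subcommand (later versions may not), and assuming the home directory created for it under /var/lib/spamassassin, the bootstrap command would be roughly:

```shell
# Ask Pyzor to fetch its server list into the assumed home directory.
sudo pyzor --homedir /var/lib/spamassassin/.pyzor discover
```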
Next, Razor. There are a few commands to run to get Razor properly registered and operational:
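A hedged sketch using razor-admin, again pointing at an assumed home directory under /var/lib/spamassassin. The order shown is one reasonable sequence, not necessarily the only one that works.

```shell
# Create Razor's config, register an identity, then refresh the server list.
sudo razor-admin -home=/var/lib/spamassassin/.razor -create
sudo razor-admin -home=/var/lib/spamassassin/.razor -register
sudo razor-admin -home=/var/lib/spamassassin/.razor -discover
```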
We also need to edit
/var/lib/spamassassin/.razor/razor-agent.conf and explicitly tell Razor what directory its config files live in. Add this line to the file:
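The setting in question is Razor's razorhome. One way to append it, assuming the directory from the earlier steps:

```shell
# Tell Razor where its own config directory lives.
echo "razorhome = /var/lib/spamassassin/.razor" | sudo tee -a /var/lib/spamassassin/.razor/razor-agent.conf
```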
And that does it for SpamAssassin! The very last things to do are to ensure that the spamd user and group own everything in
/var/lib/spamassassin and then restart SpamAssassin.
SpamAssassin care and feeding
SpamAssassin as we have it configured is going to be pretty good at stopping spam right out of the box (especially when coupled with our extremely aggressive Postfix config). However, the key to making SpamAssassin really shine is training it. Right now, SpamAssassin is like the T-800 in Terminator 2 right after they set his chip to read/write: it's ready to start learning, and you have to be John Connor.
SpamAssassin learns in two ways: on its own, and with your help via the
sa-learn tool. The self-learning is automatic and you don't need to do anything—just look at the "autolearn"
portion of the "X-Spam-Status:" headers on incoming e-mails and you'll
see at the end whether or not SpamAssassin found anything to learn from.
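If you'd rather not eyeball headers, a small sketch like this pulls the autolearn value out of a status line. The header below is made up for illustration, and real headers are usually folded across several lines, so you'd need to unfold them first.

```shell
# Illustrative header only: the score, rule names, and version are made up.
header='X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.3.2'

# Keep only the value that follows "autolearn=".
autolearn=$(printf '%s\n' "$header" | sed -n 's/.*autolearn=\([a-z]*\).*/\1/p')
echo "autolearn: $autolearn"
```

With the sample header above, this prints "autolearn: ham", meaning the message was learned as legitimate mail.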
However, at least at first, you'll want to take an active hand in
training SpamAssassin's Bayesian database. This is a relatively simple
process: get a bunch of e-mail that you know for sure isn't spam—like
your inbox—and run
sa-learn on it, like this:
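For instance, with a placeholder path standing in for wherever your known-good mail lives:

```shell
# /path/to/Maildir/cur is a placeholder; point this at mail you know is good.
sudo sa-learn --ham /path/to/Maildir/cur
```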
SpamAssassin's Bayesian filter will go to work trying to understand the types of messages that you normally want to keep.
But you need to train SpamAssassin on spam, too. The process is the same, except instead of using the
--ham flag and pointing to a directory full of good mail, you use the
--spam flag and point it at a directory you know contains nothing but spam:
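Again with a placeholder path:

```shell
# /path/to/spam/folder is a placeholder; it should contain nothing but spam.
sudo sa-learn --spam /path/to/spam/folder
```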
The learning processes are most effective when you have a corpus of
about 1,000 messages of both spam and ham to train from. In fact, by
default SpamAssassin won't even start using its Bayesian filters until it
has learned at least 200 pieces of spam (and 200 of ham). If you're not going to be
dumping in your mail from somewhere else, then it might take a bit of
time to amass 200 pieces of spam, let alone 1,000. However, at least for
spam, you can always "borrow" some from another e-mail account. For
example, at this exact moment I have 340 spam e-mails sitting in my main
Gmail account's Junk folder. You can export messages from any Gmail
folder (including Junk) using Google's Takeout feature.
So simply request an archive of your Gmail junk folder, download it,
and uncompress it somewhere on your server (you can put it anywhere),
and then run
sa-learn --spam on it. When the process is
over, you can delete the downloaded folder. (And remember, if you do
this as root, you'll want to make sure that the spamd user ends up
owning all the files created in
/var/lib/spamassassin/.spamassassin when you're done.)
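Before feeding a folder to sa-learn, it can be handy to check whether it clears the 200-message minimum. Here's a small sketch that counts files in a directory; count_messages is a hypothetical helper name, and it assumes one message per file (as in a Maildir). An mbox archive packs many messages into a single file, so this count won't work there, and sa-learn needs its --mbox flag for that format.

```shell
# Count the messages in a directory (one file per message, as in Maildir).
# Usage: count_messages /path/to/spam/folder
count_messages() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Demo with a throwaway directory holding three fake messages.
demo_dir=$(mktemp -d)
touch "$demo_dir/msg1" "$demo_dir/msg2" "$demo_dir/msg3"

n=$(count_messages "$demo_dir")
if [ "$n" -ge 200 ]; then
    echo "$n messages: enough to activate the Bayes filter"
else
    echo "$n messages: below the 200-message minimum"
fi

rm -rf "$demo_dir"
```

Once you've trained, running sa-learn with --dump magic should show how many spam and ham messages the database has actually learned.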
Before we leave the topic, it's also important to understand that sa-learn
can easily learn spam as ham and vice versa if you're not careful with
it. That's why you should only run it on collections of mail that you've
carefully vetted. Otherwise, it might let spam through or mark mail as
spam that shouldn't be. If you ever accidentally train it wrong, you can use the
--forget option on specific pieces of mail or directories full of mail to make it un-learn those incorrectly marked messages.
Virus scanning with ClamAV
Virus scanning is an optional component. On one hand, if you intend to use your mail server to provide inboxes to less-computer-savvy family members, there's likely quite a bit of value in having something set up to scan both incoming and outgoing e-mails for malicious attachments. On the other hand, ClamAV is a memory hog and most (but not all) of the things it's going to catch will also be caught by SpamAssassin, since the majority of bad attachments you're going to receive are almost certainly going to be spammy in nature.
Still, the only thing you have to lose in running ClamAV is RAM, and RAM is pretty cheap. If you want to leave it out or substitute in a different virus scanner, feel free. We're going to install it, though.
Just like OpenDKIM and SpamAssassin, we're going to install ClamAV as a milter so that Postfix can communicate with it via unix sockets in the pre-delivery stage, before it hits Dovecot for delivery. Along with ClamAV's milter package, we also need to install a whole bunch of command line decompression utilities in order to give ClamAV the ability to scan inside of compressed attachments. Run the following command to get everything installed:
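The exact package list isn't reproduced here, so the following is a conservative sketch. The package names are assumptions based on the Debian/Ubuntu repositories, and the unpackers shown are only a subset; add whichever other archive tools your release offers.

```shell
# ClamAV milter and daemon, plus an assumed subset of decompression tools
# that let ClamAV look inside compressed attachments.
sudo apt-get install clamav-milter clamav-daemon \
    arj bzip2 cabextract cpio file gzip unzip zip
```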
We're going to make two changes to the ClamAV milter's configuration file. Open
/etc/clamav/clamav-milter.conf for editing and find the following three lines:
Comment them out (by putting a "#" at the beginning of the lines) and replace them with these two lines instead:
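The original lines aren't shown, so this is a guess at what the replacements could look like. The description that follows touches on three behaviours (socket location, delivering infected mail, and logging), so the sketch covers all three. The directive names are real clamav-milter.conf settings, but the socket path and the logging choice are assumptions.

```
# Socket inside Postfix's chroot jail (assumed path)
MilterSocket /var/spool/postfix/clamav/clamav-milter.ctl

# Deliver infected mail instead of holding it
OnInfected Accept

# Log via syslog's mail facility so entries land with Postfix's logs
LogSyslog true
LogFacility LOG_MAIL
```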
The first option changes where the milter creates the socket on which it listens for e-mail from Postfix. As with the other milters, its socket needs to be inside of Postfix's chroot jail. And as with the other milters, we need to create the directory we're specifying and change its owner appropriately:
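A sketch, assuming the socket directory sits under Postfix's chroot at the path below and that the milter runs as the clamav user:

```shell
# Assumed socket directory inside Postfix's chroot jail.
sudo mkdir /var/spool/postfix/clamav

# Let the clamav user create its socket there.
sudo chown clamav:root /var/spool/postfix/clamav/
```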
The next option allows messages with infected payloads to actually be delivered instead of holding them in Postfix's queue. The last tells ClamAV to use Postfix's log files, which helps to minimize the number of places you have to look when hunting down problems.
Why do we want infected mail delivered, though? There are two main reasons. The biggest reason is that we're going to rely on Sieve as our One True Filter for both spam and viruses. Rather than having SpamAssassin rejecting spam and ClamAV hiding viruses, we'll funnel all the filtering through that central tool, which simplifies management and gives you tons of flexibility. How, then, do we set up Sieve? We'll get to that in a few sections.
First, though, we need to finish ClamAV's setup—which, mercifully,
we're almost done with because there's a lot less to do than with
SpamAssassin. Open /etc/default/clamav-milter and uncomment the following line:
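On Debian/Ubuntu the line in question is, as far as I can tell, the socket group setting. Uncommented, it would read:

```shell
# /etc/default/clamav-milter
# Give the postfix group read/write access to the milter's socket.
SOCKET_RWGROUP=postfix
```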
We need this option in place so that Postfix can write data to the ClamAV milter's socket. After this option is set, restart both the ClamAV milter and also the main ClamAV daemon:
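Assuming the usual Debian/Ubuntu service names, the restarts would be:

```shell
# Restart the milter and the main scanning daemon.
sudo service clamav-milter restart
sudo service clamav-daemon restart
```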
The ClamAV milter package thoughtfully includes freshclam,
which automatically keeps ClamAV's virus definition files up to date.
The last thing you'll want to take a look at before we finish with
ClamAV is the
freshclam log file, to make sure that it has
downloaded the latest update from the ClamAV servers. It might take
several minutes after installation to properly update, but if it's
working properly, you'll see something like this in