Cryptography Lessons
Tracy R Reed

Spam Filtering and Internet brokenness

Every now and then I fire off a lengthy rant to someone which, all too
often, is read only by me and the other person. Sometimes I get to
inflict my rants on a mailing list, but only sometimes. Now that
I have a webpage where I can post stuff like this, I am going to be
copying and pasting any interesting rants here as well. Today I sent an
email to a webserver administrator complaining about how they give the
user the option of using SSL or not, with the justification being that
SSL is slower. I think this is silly, so I sent them an email suggesting
they just make SSL the default and make things simpler, since any speed
difference is negligible, especially on modern hardware. My email
was bounced back! Apparently their mail server uses a DNS-based block
list (often called an RBL or Realtime Blackhole List), which is a rather
controversial setup. I emailed them from a different account, to which
they replied. My follow-up with them is as follows:

On Sun, Jan 09, 2005 at 09:03:32AM -0800, David Timm spake thusly:
> 1. We block about 85% of our incoming email at the client level. This has
> saved thousands of dollars in bandwidth costs and stops most of the big
> volume of Spam -- zombie machines sending messages from subscriber networks.
> It has been extremely effective for us and I think it is a really good idea.
> I know there are two major camps in this war -- one that accepts then scans
> the email and the other that blocks first. We choose the latter. I have
> white listed your server, which should eliminate any further problem for
> you. If you do have any further problem, you can also send messages to us
> via the web at: or click 'contact
> technical support' at the bottom of any schedule master page.

I do this too. However, to block solely on the basis of someone's IP being
in a list is a very bad idea. I learned this lesson the hard way:

Summary: An otherwise respectable RBL shut down in a silly way and caused
mail servers all over the world to start bouncing ALL mail.
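
For context, a DNS-based block list is normally queried by reversing the
octets of the connecting IPv4 address, appending the list's zone, and doing
an ordinary DNS lookup; any answer means "listed". A minimal sketch of that
lookup follows. The zone `dnsbl.example.org` is a placeholder, not a real
list, and the function names are made up for illustration.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name used to ask a block list about an IPv4 address."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the zone returns an address for the query name."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except OSError:
        # No such name (or the lookup failed): treat as "not listed".
        return False


# 192.0.2.1 is a documentation address; the zone is a placeholder.
print(dnsbl_query_name("192.0.2.1", "dnsbl.example.org"))
```

A server that rejects mail on `is_listed` alone depends entirely on the list
operator: if the zone ever starts answering every query, every sender looks
listed and all mail bounces.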

There are actually three camps in this war: The two you mentioned plus
those who use something like spamassassin.  The proper way to block spam
(IMO) is to use something like spamassassin which calculates a score based
on a number of factors including whether the IP appears in a list. For
example, the email you sent to me scored like so:

X-Spam-Status: No, hits=-4.9 tagged_above=-999.0 required=5.0 tests=BAYES_00

So it did not have any spam-like qualities at all. Your IP wasn't on any
lists, the content of the email did not look like spam according to the
Bayesian filter, and there was no other funny business going on. It
actually had a few positive things going for it which made the spam score
negative. I have my system configured such that a score of 5 is required
to be labelled as spam and sorted into the junk folder. I never outright
reject mail because false positives (such as when I emailed you) can be
injurious to a business. But a really spammy email looks like this:

X-Spam-Status: Yes, hits=38.6 tagged_above=-999.0 required=5.0 tests=BAYES_99,

This email has many words in common with spam according to the Bayesian
filter, the date was screwy, the MUA was forged, the RCVD line was forged,
etc. PLUS it was listed in a number of lists such as Dynablock and
SORBS. Every so often I go through my junk folder and casually glance over
the emails to make sure there are no false positives. I have found 3 in
the past year and a half, and they were from a very suspiciously configured
mail client, so it's debatable, but overall that is an astoundingly good
rate. I get perhaps a couple of actual spam emails through to my inbox, with
about 300 being blocked per day, for an accuracy rate of 99.6% and a much
lower false positive rate.
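
The difference between scoring and outright blocking can be sketched in a
few lines. This is only an illustration of the idea: the rule names and
weights below are invented and are not SpamAssassin's real rules or scores.
Only the threshold of 5.0 comes from the headers above.

```python
REQUIRED_SCORE = 5.0  # the "required=5.0" threshold from the headers above

# Hypothetical rule weights, for illustration only.
RULE_SCORES = {
    "bayes_looks_like_ham": -2.5,
    "bayes_looks_like_spam": 3.5,
    "listed_in_dnsbl": 2.5,
    "forged_received_header": 3.0,
    "invalid_date": 1.5,
}


def spam_score(matched_rules):
    """Sum the weights of every rule the message matched."""
    return sum(RULE_SCORES[rule] for rule in matched_rules)


def classify(matched_rules):
    """Return ("junk" or "inbox", score); nothing is ever rejected."""
    score = spam_score(matched_rules)
    folder = "junk" if score >= REQUIRED_SCORE else "inbox"
    return folder, score


# A block list hit alone is not enough to cross the threshold...
print(classify(["listed_in_dnsbl"]))
# ...but combined with other evidence it is.
print(classify(["listed_in_dnsbl", "bayes_looks_like_spam",
                "forged_received_header"]))
```

With weights like these, a sender who merely lands on a list still reaches
the inbox, which is the whole point of treating a listing as one factor
among many.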

Last week's spam stats:

[root@copilotconsulting log]# grep "Yes," maillog.1 | wc -l
[root@copilotconsulting log]# grep "No," maillog.1 | wc -l

So 20830 emails, of which 7938 were spam. I have a lower spam ratio than
most because I am on a LOT of legitimate but very high traffic mailing
lists (such as the linux-kernel mailing list) which boosts the amount of
good traffic I get. So I have the best of both worlds: No arbitrary lists,
very little spam or false positives.
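
The arithmetic behind those stats, as a small sketch. The two totals are the
ones quoted above; the helper names are made up, and the counting function
simply mirrors the two greps by looking for "Yes," and "No," in each log
line.

```python
def count_verdicts(lines):
    """Count lines containing "Yes," and "No,", like the two greps above."""
    spam = sum(1 for line in lines if "Yes," in line)
    ham = sum(1 for line in lines if "No," in line)
    return spam, ham


def spam_ratio(spam, total):
    """Fraction of all scanned mail that was tagged as spam."""
    return spam / total


# Totals quoted in the text: 7938 spam out of 20830 messages.
print(f"{spam_ratio(7938, 20830):.1%}")  # prints 38.1%
```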

> 2.  SSL is slower (google 'ssl performance') but my concern might be a bit
> dated with today's newer hardware.  I'm sure you are aware that the browser
> and server both take a performance hit.  It may be small enough to default
> to ssl now.  I'll consider that.   Thanks for calling it to our attention.

Certainly there is some sort of performance hit when using SSL, but I have
administered an SSL-enabled web server since 1997 (it was Netscape Server
back then, although I have been all Apache for years now) and even back
then I didn't notice any significant performance hit using the hardware and
browsers of the day. Even doing simple RSA operations with PGP was pretty
quick. And of course the time to run a symmetric encryption algorithm such
as that used by SSL on something as small as a webpage is minuscule as
well.  We really need to encourage a culture of computer security and
secure defaults if we ever expect to improve the current miserable
computer security situation.

Tracy Reed
This message is cryptographically signed for your protection.

On Sun, Jan 09, 2005 at 10:41:21PM -0800, David Timm spake thusly:
> Hi Terry,  I'll look into Spamassassin -- I've been meaning to, and your
> message gives me another push, thanks. I've been reluctant to use anything

You are welcome. FWIW I use the killer combination of
postfix+amavisd-new+spamassassin+clamav for my spam and virus scanning
needs. I also use something called my_rules_du_jour (which I think may be
an add-on, not sure if it came with spamassassin or not) run from a
cronjob nightly to keep my spamassassin rules up to date. clamav also has
a daemon which definitely does come with it called freshclam (heh) which
keeps the virus definitions up to date. I never would have thought I would
see a good open source virus scanner, just because of how boring it would
be to keep the definitions up to date, but the clamav guys do an impressive
job.
I don't recall if this is the exact howto I used to set it up but it was
pretty easy:

Tracy Reed
This message is cryptographically signed for your protection.