Federal Linux

Spam Filtering and Internet brokenness

Tracy R Reed

Every now and then I fire off a lengthy rant to someone, which all too often is read only by me and the other person.

Sometimes I get to inflict my rants on a mailing list, but only sometimes. Now that I have a webpage where I can post stuff like this, I am going to copy and paste any interesting rants here as well.

Today I sent an email to a webserver administrator complaining that they give the user the option of using SSL or not, with the justification being that SSL is slower. I think this is silly, so I suggested they just make SSL the default and make things simpler, since any speed difference is negligible, especially on modern hardware. My email was bounced back! Apparently their mail server uses a DNS-based block list (often called an RBL or Realtime Blackhole List), which is a rather controversial setup. I emailed them from a different account, to which they replied. My followup with them is as follows:

On Sun, Jan 09, 2005 at 09:03:32AM -0800, David Timm spake thusly:
> 1. We block about 85% of our incoming email at the client level. This has
> saved thousands of dollars in bandwidth costs and stops most of the big
> volume of Spam — zombie machines sending messages from subscriber networks.
> It has been extremely effective for us and I think it is a really good idea.
> I know there are two major camps in this war — one that accepts then scans
> the email and the other that blocks first. We choose the latter. I have
> white listed your server, which should eliminate any further problem for
> you. If you do have any further problem, you can also send messages to us
> via the web at: http://answer.timesync.com?action=contact or click ‘contact
> technical support’ at the bottom of any schedule master page.

I do this too. However, blocking solely on the basis of someone’s IP being in a list is a very bad idea. I learned this lesson the hard way:

http://www.e2ksecurity.com/archives/001028.html

Summary: An otherwise respectable RBL shut down in a silly way and caused mail servers all over the world to start bouncing ALL mail.
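For readers who have not seen how a DNS-based list is consulted, here is a minimal Python sketch. It assumes the usual DNSBL convention: the IPv4 octets are reversed, the list's zone is appended, and any A record in the answer means "listed". The zone `dnsbl.example.org` and both function names are hypothetical placeholders, not anything from the exchange above.

```python
import ipaddress
import socket

# Hypothetical zone for illustration only; substitute a real list's zone.
EXAMPLE_ZONE = "dnsbl.example.org"


def dnsbl_query_name(ip: str, zone: str = EXAMPLE_ZONE) -> str:
    """Build the DNS name a DNSBL client looks up for an IPv4 address."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str = EXAMPLE_ZONE) -> bool:
    """Return True if the zone answers with an A record for this address."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # No record means "not listed"; a failed lookup also lands here.
        return False
```

Note what the sketch implies: if a zone ever started answering every query with a record, `is_listed` would return True for every sender, and a server that rejects on that answer alone would bounce everything.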

There are actually three camps in this war: The two you mentioned plus those who use something like spamassassin. The proper way to block spam (IMO) is to use something like spamassassin which calculates a score based on a number of factors including whether the IP appears in a list. For example, the email you sent to me scored like so:

X-Spam-Status: No, hits=-4.9 tagged_above=-999.0 required=5.0 tests=BAYES_00

So it did not have any spam-like qualities at all. Your IP wasn’t on any lists, the content of the email did not look like spam according to the Bayesian filter, and there was no other funny business going on. It actually had a few positive things going for it, which made the spam score negative. I have my system configured such that a score of 5 is required for a message to be labelled as spam and sorted into the junk folder. I never outright reject mail because false positives (such as when I emailed you) can be injurious to a business. But a really spammy email looks like this:

X-Spam-Status: Yes, hits=38.6 tagged_above=-999.0 required=5.0 tests=BAYES_99,
BigEvilList_92, DATE_IN_PAST_96_XX, DATE_SPAMWARE_Y2K, DCC_CHECK,
FORGED_MUA_OUTLOOK, FORGED_RCVD_NET_HELO, HTML_90_100, HTML_IMAGE_ONLY_02,
HTML_MESSAGE, KOREAN_UCE_SUBJECT, MIME_HTML_ONLY, MIME_HTML_ONLY_MULTI,
MISSING_MIMEOLE, NORMAL_HTTP_TO_IP, RCVD_IN_DYNABLOCK, RCVD_IN_SORBS,
SUBJ_ILLEGAL_CHARS

This email has many words in common with spam according to the Bayesian filter, the date was screwy, the MUA was forged, the RCVD line was forged, etc. PLUS it was listed in a number of lists such as Dynablock and SORBS.

Every so often I go through my junk folder and casually glance over the emails to make sure there are no false positives. I have found 3 in the past year and a half, and they came from a very suspiciously configured mail client, so it’s debatable, but overall that is an astoundingly good rate. Perhaps a couple of actual spam emails get through to my inbox while about 300 are blocked per day, for an accuracy rate of 99.6% and a much lower false positive rate.
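To make the two headers above easier to compare, here is a rough Python sketch that pulls the verdict, score, threshold, and test names out of an `X-Spam-Status` value in the format shown. The function name is hypothetical, and treating the `RCVD_IN_` prefix as the marker for DNS list tests is an assumption based on the test names in this example, not a complete classification.

```python
import re


def parse_spam_status(header: str) -> dict:
    """Parse an X-Spam-Status header value in the format shown above."""
    flat = " ".join(header.split())  # undo header line folding
    match = re.match(
        r"(?:X-Spam-Status:\s*)?(Yes|No),\s*hits=(-?[\d.]+)\s+.*?"
        r"required=(-?[\d.]+)\s+tests=(.*)$",
        flat,
    )
    if match is None:
        raise ValueError("unrecognized X-Spam-Status format")
    verdict, hits, required, tests = match.groups()
    names = [t.strip() for t in tests.split(",") if t.strip()]
    return {
        "is_spam": verdict == "Yes",
        "hits": float(hits),
        "required": float(required),
        # Assumption: RCVD_IN_* names are the DNS list lookups.
        "list_tests": [n for n in names if n.startswith("RCVD_IN_")],
        "other_tests": [n for n in names if not n.startswith("RCVD_IN_")],
    }
```

Run against the spammy header, this reports two list-based tests (`RCVD_IN_DYNABLOCK`, `RCVD_IN_SORBS`) next to sixteen others, which is the point: list membership is one signal among many rather than the whole verdict.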

Last week’s spam stats:

[root@copilotconsulting log]# grep "Yes," maillog.1 | wc -l
  7938
[root@copilotconsulting log]# grep "No," maillog.1 | wc -l
 12892

So that is 20830 emails, of which 7938 were spam. I have a lower spam ratio than most because I am on a LOT of legitimate but very high traffic mailing lists (such as the linux-kernel mailing list), which boosts the amount of good traffic I get. So I have the best of both worlds: No arbitrary lists, very little spam or false positives.
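The arithmetic behind those counts can be sketched in a few lines of Python. The function names are hypothetical, and the counting mirrors the two greps above by matching any line that contains "Yes," or "No,", so it inherits the same looseness.

```python
def count_verdicts(log_lines):
    """Count lines containing "Yes," and "No,", like the two greps above."""
    spam = sum(1 for line in log_lines if "Yes," in line)
    ham = sum(1 for line in log_lines if "No," in line)
    return spam, ham


def spam_ratio(spam_count: int, ham_count: int) -> float:
    """Fraction of all counted messages that were tagged as spam."""
    total = spam_count + ham_count
    if total == 0:
        raise ValueError("no messages counted")
    return spam_count / total


# Counts taken from the grep output above: 7938 + 12892 = 20830 messages.
ratio = spam_ratio(7938, 12892)
print(f"{ratio:.1%} of last week's mail was tagged as spam")
```

With the numbers from the log, the ratio works out to roughly 38%.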

> 2. SSL is slower (google ‘ssl performance’) but my concern might be a bit
> dated with today’s newer hardware. I’m sure you are aware that the browser
> and server both take a performance hit. It may be small enough to default
> to ssl now. I’ll consider that. Thanks for calling it to our attention.

Certainly there is some sort of performance hit when using SSL, but I have administered an SSL-enabled web server since 1997 (it was Netscape Server back then, although I have been all Apache for years now), and even back then I didn’t notice any significant performance hit with the hardware and browsers of the day. Even doing simple RSA operations with PGP was pretty quick. And of course the time to run a symmetric encryption algorithm such as those used by SSL over something as small as a webpage is minuscule as well. We really need to encourage a culture of computer security and secure defaults if we ever expect to improve the current miserable computer security situation.

— Tracy Reed http://copilotcom.com This message is cryptographically signed for your protection. Info: http://copilotconsulting.com/sig

On Sun, Jan 09, 2005 at 10:41:21PM -0800, David Timm spake thusly:
> Hi Terry, I’ll look into Spamassassin — I’ve been meaning to, and your
> message gives me another push, thanks. I’ve been reluctant to use anything

You are welcome. FWIW I use the killer combination of postfix+amavisd-new+spamassassin+clamav for my spam and virus scanning needs. I also run something called my_rules_du_jour (which I think may be an add-on; I’m not sure whether it came with spamassassin) from a nightly cronjob to keep my spamassassin rules up to date. clamav also has a daemon called freshclam (heh), which definitely does come with it, that keeps the virus definitions up to date. I never would have thought I would see a good open source virus scanner, just because of how boring it would be to keep the definitions up to date, but the clamav guys do an impressive job.

I don’t recall if this is the exact howto I used to set it up but it was pretty easy:

http://mail.x-si.org/articles/av.html

— Tracy Reed http://copilotcom.com This message is cryptographically signed for your protection. Info: http://copilotconsulting.com/sig