Today I fundamentally changed the way that I cope with spam.
On Sat 02/03/07, I put a “Contact Clock Tower Law Group” contact form on my blog. Previously the contact form lived only on the Clock Tower Law Group (CTLG) website. But my blog gets much more traffic (about 100 times more) than the CTLG website, and it also has more articles. So rather than hope that people would find an article, hope that they’d click through to the CTLG website, and then hope that they’d fill out the contact form, I decided to put the contact form on the blog itself.
Immediately after launching the new contact form, we started getting inundated with spam from the form (FormMail spam). The CTLG website has only about 20 pages in Google’s database, while my weblog has about 2000. I think it’s safe to say that FormMail spam increased by a couple of orders of magnitude (i.e. in proportion to each site’s traffic). To make matters worse, none of the spam was being flagged as spam by SpamAssassin, because SpamAssassin primarily looks at the email headers, and email messages that come from NMS FormMail have similar headers (since they all “originate” from my server).
On Mon 02/12/07, I added a new custom rule (a “meta rule”) to SpamAssassin’s configuration file (/usr/local/etc/mail/spamassassin/local.cf) on the server:
describe _MY_FROM_FORMMAIL Is a FormMail submission
header   _MY_FROM_FORMMAIL X-Mailer =~ /NMS FormMail/

describe _MY_HAS_LINKS Has HTML links
rawbody  _MY_HAS_LINKS /a href/

describe MY_FROM_FORMMAIL_HAS_LINKS Forms with HTML are likely spam
meta     MY_FROM_FORMMAIL_HAS_LINKS (_MY_FROM_FORMMAIL && _MY_HAS_LINKS)
score    MY_FROM_FORMMAIL_HAS_LINKS 17.0
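To make the logic of the meta rule concrete, here is a stand-alone shell sketch of what it matches: a message is flagged only when both sub-tests hit, mirroring (_MY_FROM_FORMMAIL && _MY_HAS_LINKS). This is only an approximation for illustration, not how SpamAssassin itself evaluates rules, and the sample message is invented.

```shell
# Create a sample message (invented for illustration).
msgfile=$(mktemp)
cat > "$msgfile" <<'EOF'
X-Mailer: NMS FormMail
Subject: test submission

<a href="http://example.com/">buy stuff</a>
EOF

result="ham"
# First grep mirrors the header test (X-Mailer =~ /NMS FormMail/);
# second grep mirrors the rawbody test (/a href/). Both must match.
if grep -q '^X-Mailer:.*NMS FormMail' "$msgfile" && grep -q 'a href' "$msgfile"
then
    result="spam"
fi
rm -f "$msgfile"
echo "$result"
```

A plain FormMail submission with no HTML links would fall through and stay "ham", which is the whole point of requiring both sub-tests.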
The new rule (which looks for HTML email submitted via NMS FormMail) now correctly marks FormMail spam as such, prepending “[SA]” to the Subject line of each FormMail spam.
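The post doesn’t show the line that does the Subject tagging; in SpamAssassin’s local.cf, a subject-rewriting directive along these lines would produce that behavior (hedged, since the actual configuration isn’t shown):

```
rewrite_header Subject [SA]
```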
But simply marking FormMail spam as such turned out not to be enough. All email that gets submitted to the form goes to all of us at CTLG and also to a special mailbox that integrates with our FileMaker database. In this way, we can automatically populate the database with information about those who are interested in our services. But now I had a new problem. Database spam. Since all email was being delivered to all of us and to the database, I ended up checking all of the spam (for “false positives”) twice. Once in Eudora, once in FileMaker.
So I decided to fundamentally change the way that I process spam.
Previously, I let SpamAssassin simply mark each message (prepending “[SA]” to the Subject line) and deliver it as normal. Eudora’s filters plus my own screening did the rest. Another option is to mark the email as spam but to leave it on the server (for example, in /var/mail/spam). So that’s what I chose to do.
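The post doesn’t say how flagged mail gets diverted to the server-side mailbox. A common approach at the time was a procmail recipe matching the header SpamAssassin adds to flagged messages; a sketch, assuming procmail is in the delivery path:

```
:0:
* ^X-Spam-Flag: YES
/var/mail/spam
```

The trailing colon on `:0:` tells procmail to use a lockfile, which matters when delivering to a shared mbox file.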
This is a fundamental shift in my thinking and a milestone day for me. Today, I am more concerned about minimizing the time that I spend plowing through spam looking for false positives than I am about losing a legitimate email to my spam filters.
In one way, it’s a victory for spam, because I have finally become overwhelmed by it. In another way, it’s a defeat for spam, because now I’ll not see 99% of it. Depends how you look at it. And I don’t want to look at it, thank you very much.
Now my spam happily resides on the server. I now, of course, had a new problem. How (and, I suppose, whether) to check for false positives. Every now and again, somebody asks if I got a particular message. I’ll search my local email, and, sure enough, it’s been incorrectly marked as spam. It doesn’t make sense to simply periodically download the spam from the server with a POP mail client – that would only reintroduce all of the previous problems. I don’t want to send it to Gmail or Yahoo Mail because 1) I don’t pay them and 2) I don’t trust them. So I decided to install a webmail application on my server so that I could periodically check the spam mailbox for false positives, while leaving the spam on the server.
On the evening of Mon 02/12/07, I tested two open source webmail programs: SquirrelMail and OpenWebMail.
- SquirrelMail requires PHP and an IMAP server. SquirrelMail integrates simply with multiple domain names (virtual hosts) and directly modifies Apache’s config file (httpd.conf). SquirrelMail works well as a text mail reader, not so well as an HTML mail reader. The UI feels very web 1.0 (if you know what I mean).
- OpenWebMail requires an Apache web server (with CGI) and Perl. OpenWebMail supports multiple domain names, but it is tricky to get working properly. OpenWebMail works well as an HTML mail reader, not so well as a text mail reader. The UI uses JavaScript and feels like it is trying to be a viable replacement for MS Outlook (and similar email clients).
SquirrelMail is more popular than OpenWebMail, according to Google Trends, but both appear to be decreasing in popularity. I speculate that this is due to free hosted webmail services such as Gmail and Yahoo Mail, both of which are increasing in popularity. Plus there’s that whole “free” thing.
Now I can use OpenWebMail to access the spam email on the server (to check for false positives). Periodically, I’ll have to delete the spam or else the server’s hard disk will fill up with spam. As you may have gathered, you can also use this same interface to check your own email. If you read email from the OpenWebMail interface, it will be marked as “read” when you download it to Eudora (or whatever mail client you use). This is also a good way to access email when you are out of the office and/or when VNC-over-SSH isn’t working.
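The post doesn’t say how the periodic deletion is done. One approach would be a cron job that archives the spool and then empties it; a safe-to-run sketch (the paths here are stand-ins – on the server described in the post, SPOOL would be /var/mail/spam):

```shell
# Stand-in paths so this sketch can run anywhere; in real use these
# would point at the actual spam spool and an archive directory.
SPOOL="${SPOOL:-/tmp/spam-demo.mbox}"
ARCHIVE_DIR="${ARCHIVE_DIR:-/tmp/spam-archive}"

mkdir -p "$ARCHIVE_DIR"
touch "$SPOOL"   # stand-in for the real spool file

# Copy the spool to a dated archive, then truncate it in place.
# Truncating (rather than deleting) preserves the spool file's
# ownership and permissions for the mail delivery agent.
cp "$SPOOL" "$ARCHIVE_DIR/spam-$(date +%Y%m%d).mbox"
: > "$SPOOL"
```

Archiving before truncating keeps a window in which older false positives can still be recovered; the archives themselves can be rotated away later.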
So far, so good.
See also:
http://www.intuitive.com/spam-assassin-rule-help.html
What if you add a CAPTCHA (http://en.wikipedia.org/wiki/Captcha) to your web form? There is a lively discussion on adapting the initial CAPTCHA form (distorted text) to make it more hack-resistant now that there is a body of attack techniques, and I believe they still remain a viable option. The Gimpy CAPTCHA uses multiple overlapping words, which really confounds the bots. hehe.
Greetings Peter,
My concern with CAPTCHA implementations is that they can frustrate legitimate users. So I’ve been trying to focus on solutions that are transparent to users and frustrating to bots, spammers, and the like.
Regards,
Erik