January 15, 2007
Thwarting comment spam
There are a variety of approaches to combatting blog spam. Each has its pros and cons, and none of them is perfect. Three common approaches that I've seen:
1. CAPTCHAs can be effective, but spammers are getting better at finding ways around them. Generally speaking, the better a CAPTCHA is at keeping out spammers, the harder it is for your users to decipher.
2. Bayesian algorithms, keyword filters, and other types of content analysis are good, but spammers are getting smarter. Blog spam is becoming increasingly on-topic, which makes it harder to filter out and requires more manual work.
3. Requiring a login or subscription to a service is a good approach in theory, but in practice spammers have no problem setting up bogus accounts. Then there's the fact that most people don't like creating accounts just to post comments.
Recently I decided to test out a new approach to dealing with comment spam. It occurred to me that most spammers don't necessarily crawl my blog at the same moment they spam it. It seems more likely that they have bots that crawl the web to find comment forms (or even subscribe to services that do this) and then systematically spam those forms some time later.
So I set up a simple PHP script that ties into Apache's mod_rewrite to manage the URLs for my blog. Basically, the URL for the comment form changes every day. The new URL contains an effectively random 32-character string, making it practically impossible to guess the correct URL without actually going to the blog. The end result has been an 80% reduction in comment spam. Combined with MT's junk filters, it has proven to be extremely effective.
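To illustrate the idea, here is a minimal sketch in Python. It is not the PHP script described above. It assumes the daily string is derived from a secret key and the current date, and the names SECRET, daily_token, comment_form_path, is_current_path, and the /comments/ path layout are all hypothetical.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical secret key; it should be long, random, and kept off the public site.
SECRET = b"change-me"


def daily_token(day: date, secret: bytes = SECRET) -> str:
    """Return a 32-character hex token that stays the same for one calendar day."""
    digest = hmac.new(secret, day.isoformat().encode("ascii"), hashlib.sha256)
    return digest.hexdigest()[:32]


def comment_form_path(day: date, secret: bytes = SECRET) -> str:
    """Build the comment-form URL path for the given day (path layout is assumed)."""
    return f"/comments/{daily_token(day, secret)}"


def is_current_path(path: str, today: date, grace_days: int = 0,
                    secret: bytes = SECRET) -> bool:
    """Accept a path only if it matches today's token or one from a recent grace day."""
    for offset in range(grace_days + 1):
        expected = comment_form_path(today - timedelta(days=offset), secret)
        # Constant-time comparison avoids leaking how much of the token matched.
        if hmac.compare_digest(path.encode("utf-8"), expected.encode("utf-8")):
            return True
    return False


if __name__ == "__main__":
    today = date.today()
    path = comment_form_path(today)
    print(path, is_current_path(path, today))
```

A rewrite rule or front controller would call something like is_current_path before showing or accepting the form, and reject stale URLs. Setting grace_days to 1 is one way to avoid rejecting a reader who opened the page just before the daily change.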
The next step will be to adjust the frequency of the URL changes to see how that affects the remaining 20%. If it ends up making a big enough dent, I'll make the source public to give people an additional weapon in the war on spam.