Preventing email harvesting

April 10, 2004

After reading about Web Accessibility over at A List Apart, I started thinking about how great CSS is and what it makes possible.

That gave me an idea for preventing email harvesting on your website(s). I'm sure almost everyone has learned firsthand what happens when your email address is published on the web. And while there are many existing solutions designed to keep spammers from getting your address, they all seem insufficient in one way or another.

To name a few:

  • Use JavaScript to dynamically insert the email
    Bad for two reasons: it requires that your users have JavaScript enabled, and unless you are obfuscating the email in the source, most harvesters will still grab it.
  • Use an image instead
    Bad for three reasons: you must first create an image, the address cannot be copied and pasted into a mail client, and people using alternative browsers (screen readers, Lynx, many PDAs) will not see it.
  • Use spaces and words to obfuscate the email (e.g. user AT domain DOT com)
    Bad for two reasons: it cannot be directly copied and pasted, and many harvesters are smart enough to recognize that as an email address.
  • Use the CSS3 content property to replace an image of the address with text (see CSS3 Spamkiller)
    While it is future compatible, it has three problems: you must create an image and put the email address in your stylesheet, the address cannot be copied and pasted into a mail client unless you are using Opera (or another CSS3 browser), and people using alternative browsers (screen readers, Lynx, many PDAs) will not see it unless their browser supports CSS3.

My idea is very simple. Suppose you want to protect the email address jon@jenseng.com. You simply write the email as jon<span class='obfuscate'> bounce</span>@jenseng.com and add .obfuscate{display:none;} to your stylesheet.
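As a rough sketch, the markup described above could be generated by a small helper. The function name `obfuscateEmail` and the constant `obfuscateRule` are made up for this illustration; only the `obfuscate` class, the `bounce` filler and the CSS rule come from the post. The helper assumes the address and filler contain no characters that need HTML escaping.

```javascript
// Build the obfuscated markup for an address (hypothetical helper).
// `decoy` is the hidden filler text; "bounce" matches the example in the post.
function obfuscateEmail(address, decoy = "bounce") {
  const at = address.indexOf("@");
  if (at < 1) {
    throw new Error("not an email address: " + address);
  }
  const user = address.slice(0, at);
  const domain = address.slice(at + 1);
  // The leading space inside the span separates the real user name from the decoy.
  return user + "<span class='obfuscate'> " + decoy + "</span>@" + domain;
}

// The stylesheet rule that hides the decoy from human readers.
const obfuscateRule = ".obfuscate{display:none;}";

console.log(obfuscateEmail("jon@jenseng.com"));
// prints: jon<span class='obfuscate'> bounce</span>@jenseng.com
```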

This will work in all CSS-capable browsers, including most screen readers, as they will respect display:none. The end result is a visible, selectable text email that automated email harvesters will miss. If the harvester gets anything, it would be bounce@jenseng.com, since the whitespace separates jon from the bogus address.
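To see why a harvester would end up with the bogus address, here is a deliberately naive harvester, sketched under assumptions: real harvesters are not documented here, the two function names are invented for illustration, and the pattern is a simplification rather than a full address grammar. One variant strips tags before matching, the other scans the raw source.

```javascript
// Simplified address pattern (an assumption, not a complete email grammar).
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Harvester variant 1: drop all tags, then look for address-shaped text.
function harvestFromText(html) {
  const text = html.replace(/<[^>]*>/g, "");
  return text.match(EMAIL_PATTERN) || [];
}

// Harvester variant 2: scan the raw page source without touching the tags.
function harvestFromSource(html) {
  return html.match(EMAIL_PATTERN) || [];
}

const markup =
  "Write to jon<span class='obfuscate'> bounce</span>@jenseng.com today.";

console.log(harvestFromText(markup));   // finds only bounce@jenseng.com
console.log(harvestFromSource(markup)); // finds nothing: "@" is preceded by ">"
```

Neither variant recovers jon@jenseng.com, because neither applies the stylesheet before matching.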

This is by no means an end-all solution, as harvesters could be trained to look for this particular scenario if it enters widespread use. But using random text or placing the span in the domain would be enough to be an effective deterrent. If everyone had their own unique version of this fix, it would not be worthwhile for spammers to check for all possibilities.
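One possible way to vary the trick, sketched with invented names (`randomDecoy`, `obfuscateDomain`): generate random filler text and hide it inside the domain rather than after the user name. The exact placement, with the hidden text right after the @ and ending in a dot, is an assumption for illustration; any position that breaks the address for a harvester would do.

```javascript
// Random lowercase filler, so that no two sites share the same decoy text.
function randomDecoy(length = 6) {
  const letters = "abcdefghijklmnopqrstuvwxyz";
  let out = "";
  for (let i = 0; i < length; i++) {
    out += letters[Math.floor(Math.random() * letters.length)];
  }
  return out;
}

// Variant that hides the decoy inside the domain (hypothetical helper).
// With the span hidden, readers still see the real address; a harvester
// that strips tags sees a bogus subdomain instead.
function obfuscateDomain(address, decoy = randomDecoy()) {
  const at = address.indexOf("@");
  if (at < 1) {
    throw new Error("not an email address: " + address);
  }
  return (
    address.slice(0, at + 1) +
    "<span class='obfuscate'>" + decoy + ".</span>" +
    address.slice(at + 1)
  );
}

console.log(obfuscateDomain("jon@jenseng.com"));
```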

The unknown is Jaws, the most popular of the screen readers. According to this article, Jaws will render elements that have the property display:none or visibility:hidden. But if you read the Jaws FAQ, it says they support display:hidden and visibility:none and do not render such elements. Yes, you read that correctly. It seems they've got them mixed up. I've sent them an email to see if that is really the case or if it is merely a typo. If it is true, it makes for an interesting dilemma: how does one hide stuff from Jaws without screwing over everyone else? Maybe there's a Jaws CSS filter out there...

Update 2004-07-16: Jaws has since fixed its documentation. It does in fact correctly apply these properties.

Another issue arises if the browser doesn't support CSS or has it turned off. One workaround would be to use some JavaScript to dynamically delete all elements of class obfuscate. But if they don't have JavaScript...
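A minimal sketch of that JavaScript workaround, assuming a browser that supports `querySelectorAll` and `Element.remove()` (both postdate this post, so older browsers would need `parentNode.removeChild` instead). The function name `removeObfuscated` is invented for illustration.

```javascript
// Remove every element carrying the decoy class.
// `root` is anything exposing querySelectorAll, normally `document`.
function removeObfuscated(root, className = "obfuscate") {
  const decoys = Array.from(root.querySelectorAll("." + className));
  for (const el of decoys) {
    el.remove();
  }
  return decoys.length; // how many decoys were removed
}

// In a browser, run once the DOM has been parsed.
if (typeof document !== "undefined") {
  document.addEventListener("DOMContentLoaded", () => {
    removeObfuscated(document);
  });
}
```

Since the decoy spans are gone from the page once this runs, the visible address no longer depends on the stylesheet being applied.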

Posted by jon at April 10, 2004 8:59 PM

Comments

Interesting technique - thanks for sharing this, Jon!

Someone else was also thinking along the same lines about six months earlier, although they didn't explain it as elegantly and fully as you did:

http://archivist.incutio.com/viewlist/css-discuss/30768

For an overview of some other ways to help prevent spammers from harvesting email addresses, including a discussion of the pros and cons of using JavaScript-based methods, please see:

http://istpub.berkeley.edu:4201/bcc/Winter2003/feat.spamharvest.html

Posted by: Aron Roberts at June 28, 2006 11:33 AM

@Aron

Thanks for the commentary and the links. Turns out my technique isn't quite as bulletproof as I initially thought. While it does well against email harvesters, one drawback in many browsers is that when you copy/paste the address, it includes the hidden text as well. Perhaps a workaround would be to delete the hidden span on page load using JavaScript.

Posted by: jon at July 3, 2006 7:21 PM

Yeah - that is an issue, but use an img tag instead of span and it works like a charm.

Posted by: Jacques at September 12, 2009 9:04 AM