Hiding Email Addresses From Robots with JavaScript


Spammers use automated programs to search the internet for email addresses to harvest. There are several techniques for hiding email addresses from these spam “robots”, collectively known as address munging. Although all of these techniques can be beaten, some rather easily, you probably still want to use them because the average harvester is not designed to handle them. Sites with exposed email addresses outnumber those that protect theirs, and as long as there are easy nuts, the spammers are not going to invest the resources and time to crack the hard nuts. (Think Darwin’s Finches.)

The simplest technique is to add characters to the email address that most humans will recognize and remove or replace to form the valid address:

jason [at] theargonauts [dot] com
NOSPAMjason@theargonauts.com

This is a very effective technique; however, it requires user intervention, and the most popular variants can still be interpreted with little effort by a robot. If user intervention is not a concern, then one nearly flawless technique is to simply display the email address in an image. In my opinion this should only be used as a last resort because it has several drawbacks: no click-to-email ability, difficulty in copying the email address, increased download size, etc.
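To illustrate how little effort the text-based munging above takes to undo, here is a minimal sketch of what a harvester might do. The function name `demunge` and the exact patterns it handles are my own assumptions for illustration; a real harvester would likely recognize more variants.

```javascript
// Hypothetical helper (not from any real harvester): undo the two visible
// munging patterns shown above.
function demunge(text) {
	return text
		// "[at]" or "(at)", with optional surrounding spaces, becomes "@"
		.replace(/\s*[\[(]\s*at\s*[\])]\s*/gi, '@')
		// "[dot]" or "(dot)", with optional surrounding spaces, becomes "."
		.replace(/\s*[\[(]\s*dot\s*[\])]\s*/gi, '.')
		// strip the conventional NOSPAM marker
		.replace(/NOSPAM/g, '');
}

console.log(demunge('jason [at] theargonauts [dot] com'));
console.log(demunge('NOSPAMjason@theargonauts.com'));
```

Both calls should recover the plain address, which is why these patterns only stop the laziest robots.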

Transparent munging techniques allow users to click and use email addresses directly. The simplest technique encodes the email address using HTML entities:

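&#106;&#97;&#115;&#111;&#110;&#64;&#116;&#104;&#101;&#97;&#114;&#103;&#111;&#110;&#97;&#117;&#116;&#115;&#46;&#99;&#111;&#109;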
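In the page source, an entity-encoded address is a run of numeric character references that the browser renders as the plain address. A minimal sketch of the encoding, and of how trivially it reverses, might look like this. The names `toHtmlEntities` and `fromHtmlEntities` are hypothetical, and I am assuming decimal references; hexadecimal ones work the same way.

```javascript
// Hypothetical helper: write every character as a decimal character reference.
function toHtmlEntities(address) {
	return Array.from(address, function (ch) {
		return '&#' + ch.codePointAt(0) + ';';
	}).join('');
}

// Hypothetical helper: what a robot would do to reverse the encoding.
function fromHtmlEntities(text) {
	return text.replace(/&#(\d+);/g, function (match, code) {
		return String.fromCodePoint(Number(code));
	});
}

var encoded = toHtmlEntities('jason@theargonauts.com');
console.log(encoded);
console.log(fromHtmlEntities(encoded));
```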

Obviously, this is not that hard to crack because everyone knows the entity-to-character translation rules ahead of time. However, we can use JavaScript to make things difficult to understand but easy to use. The logic here is to encrypt email addresses in webpages and use JavaScript to make browsers decrypt them. The advantage is that to humans visiting your site with JavaScript enabled, the email address behaves like a normal mailto: link. However, robots won’t understand it because they don’t implement JavaScript. Using a simple rot13 cipher would produce the following address:

wnfba@gurnetbanhgf.pbz

From this, JavaScript can be used to tell the browser that the above email address is really jason@theargonauts.com. Although generic harvesters will probably never implement the complexities of JavaScript, they can be programmed to decode the email addresses without JavaScript once they learn the algorithms. However, as long as each site implements its own encode and decode algorithms, it is uneconomical for robots to specialize and hard for them to be comprehensive.
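For reference, a rot13 decoder is only a few lines of JavaScript. This is a minimal sketch; the function name `rot13` is my own choice, and because rot13 is its own inverse the same function both encodes and decodes.

```javascript
// Rotate each ASCII letter 13 places, leaving "@", "." and digits untouched.
function rot13(text) {
	return text.replace(/[a-z]/gi, function (ch) {
		var base = ch <= 'Z' ? 65 : 97; // 65 is 'A', 97 is 'a'
		return String.fromCharCode((ch.charCodeAt(0) - base + 13) % 26 + base);
	});
}

console.log(rot13('wnfba@gurnetbanhgf.pbz')); // jason@theargonauts.com
```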

I use the following JavaScript framework to decode email addresses on my blogs:

  1. Find all links with a mailto: specification and for each:
  2. Match an encoded address or return
  3. Decode the address
  4. Split the decoded string into the real address and an optional url text
  5. Set the link location to the real email address.
  6. Set the inner html text if it was specified.

Using the tricks of the jQuery library and the assumption that email addresses are encoded as hexadecimal numbers (characters 0-9 and a-f), we have the following framework. You will just need to supply your own decryption routine.

$(function(){
	// Quote the attribute value: a colon is not valid in an unquoted
	// CSS attribute value.
	$('a[href^="mailto:"]').each(function(i) {
		// Only touch links whose address is a lowercase hex string.
		var xr = this.href.match(/^mailto:([0-9a-f]+)$/);
		if(!xr) return;
		var x = xr[1]; // encoded address
		var s = ''; // decoded address
		/*
		Here be decoding of 'x' into 's'.
		*/
		xr = s.split(/\|/);  // optionally split real address from url text
		this.href = "mailto:" + xr[0];
		if(xr.length > 1) $(this).html(xr[1]);
	});
});
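As an illustration of what a decryption routine could look like, here is a minimal sketch of one possible scheme. Everything in it is an assumption of mine for illustration, not necessarily what this site uses: the name `decodeAddress`, and the scheme itself, in which each pair of hex digits is one byte, the first byte is a seed, and every later byte XORed with the byte before it yields one character.

```javascript
// Hypothetical decoder for an assumed chained-XOR scheme.
function decodeAddress(x) {
	var s = '';
	var prev = parseInt(x.slice(0, 2), 16); // the first byte is the seed
	for (var i = 2; i + 1 < x.length; i += 2) {
		var cur = parseInt(x.slice(i, i + 2), 16);
		s += String.fromCharCode(prev ^ cur); // previous byte XOR current byte
		prev = cur;
	}
	return s;
}

// '2a4b3755' holds seed 0x2a followed by three encoded characters.
console.log(decodeAddress('2a4b3755').split(/\|/)); // [ 'a', 'b' ]
```

With a routine like this, the placeholder comment in the framework would become `s = decodeAddress(x);`.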

Here is an example of an encoded email address used on this page that is decoded by the script above:

  <a href="mailto:7d0f6a0f6b2b4f2a583d4f3a5739582c592b4a6411625d0e7b19731675013c781d3d6f0a780d60400e6f1b6e1c7d5d1f731c7b">Email Me</a>

Very cryptic, and yet the page still validates (unlike other JavaScript email-hiding implementations) while the user hardly ever notices that anything is up. Of course, if you decide to use this technique you will need a way to encode your email addresses in the first place. I wrote a Movable Type plugin to produce my encoded email addresses. Another option would be to set up a form on your website that would process addresses and return the proper encoding.
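The encoding side can be sketched in JavaScript as well. This is not my Movable Type plugin; it is a minimal illustration under assumptions of my own: the names `encodeAddress` and `toHex` are hypothetical, the input is plain ASCII, and the scheme is a simple chained XOR in which a seed byte is written first and each following byte is the previous byte XORed with the next character. The output uses only the characters 0-9 and a-f, so it fits the hex pattern the framework matches.

```javascript
// Hypothetical helper: format one byte as two lowercase hex digits.
function toHex(n) {
	return (n < 16 ? '0' : '') + n.toString(16);
}

// Hypothetical encoder for an assumed chained-XOR scheme (ASCII input only).
// 'text' is the optional link text, joined to the address with "|".
function encodeAddress(address, text, seed) {
	var s = text ? address + '|' + text : address;
	var prev = seed & 0xff;
	var out = toHex(prev); // the seed byte leads the output
	for (var i = 0; i < s.length; i++) {
		prev = (prev ^ s.charCodeAt(i)) & 0xff;
		out += toHex(prev);
	}
	return out;
}

console.log(encodeAddress('jason@theargonauts.com', 'Email Me', 0x2a));
```

Picking a different seed for each link means the same address never produces the same string twice.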


1 Comment

Can you show working sample?

Thank you, Dmitry

About this Entry

This page contains a single entry by Reed A. Cartwright published on February 25, 2008 5:07 PM.
