I haven’t posted in a while. Right now I actually have less time to post than I’ve had in a long time, but for some reason the urge to post gets stronger the later I stay up, the more I want to sleep, and the more things I have to do. Go figure.

Anyway, I’ve been meaning to post this for a while. I’ve found two nearly foolproof ways to block comment spam from a website, and I thought I’d outline them both briefly (and hope desperately that writing about them doesn’t jinx them and resume the flood of hundreds of comments every few days that has totally stopped since I implemented them).

The first is the simplest and easiest to implement. It’s what I’m using on jordanandjaime.com: I don’t let search engines find it. Since I don’t really care if Google indexes j&j, the easiest way to prevent comment spam is to just forbid all search engines from indexing the site. If the search engines don’t know about the page, the comment spammers usually don’t either. There are certainly exceptions, but the site has had incredibly few spam comments posted since I put it up. To do this yourself, see my robots.txt file (just put it in the / directory of your site), or google robots.txt for more info.
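For reference, a robots.txt that asks every crawler to stay out of the entire site usually looks like the two lines below. This is the generic form, not necessarily identical to the file linked above, and it only works against crawlers that choose to honor it.

```
User-agent: *
Disallow: /
```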

The second requires a little bit more skill. It’s unfortunately one of those defenses that only works as long as not everybody implements the same thing, but I highly doubt a slew of bloggers are going to suddenly start reading this and emulating me. ;-) Plus, I’m sure others have long since had this idea and implemented it elsewhere. If you look at the source code of wantingseed now on a page where you can comment, you’ll notice a bit of JavaScript like:

<script type='text/javascript'>
// The form tag is split into fragments so the submit URL never appears intact in the page source.
var first = '<for';
var second = 'm metho';
var third = 'd="pos';
var fourth = 't" actio';
var fifth = 'n="http://wantingseed.com/movabletype/mtc.cgi" name="com';
var sixth = 'ments_form" onsub';
var seventh = 'mit="if (this.bakecoo';
var eighth = 'kie[0].checked) reme';
var ninth = 'mberMe(this)">';
document.write(first + second + third + fourth + fifth + sixth + seventh + eighth + ninth);
</script>

In short, it’s just some JavaScript that writes out the form tag, including the URL where the comments get submitted, once the page is in the browser, so the URL never appears intact in the page source. There’s good news and bad news about this particular hack.

The good news is that it’s possible to make this as convoluted as you like while keeping it totally automatic. JavaScript is a reasonably powerful language (thanks, AJAX, for proving that), and you can represent the above code in a practically unlimited number of ways, which makes writing a generic bot to crack it extremely hard.
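To give a feel for how easily the disguise can be varied, here is a minimal sketch of one alternative: storing the form tag as shifted character codes instead of string fragments. The function names, the offset of 7, and the example.com URL are all made up for illustration; this is not what wantingseed actually runs.

```javascript
// Turn a string into an array of character codes, each shifted by `offset`.
function encode(text, offset) {
  var codes = [];
  for (var i = 0; i < text.length; i++) {
    codes.push(text.charCodeAt(i) + offset);
  }
  return codes;
}

// Reverse encode(): shift each code back and rebuild the string.
function decode(codes, offset) {
  var out = "";
  for (var i = 0; i < codes.length; i++) {
    out += String.fromCharCode(codes[i] - offset);
  }
  return out;
}

// Hypothetical form tag; the URL is a placeholder, not a real endpoint.
var formTag = '<form method="post" action="http://example.com/comments.cgi">';

// Only this array of numbers would need to appear in the page source.
var hidden = encode(formTag, 7);

// In a browser you would then emit the tag with:
// document.write(decode(hidden, 7));
```

You would run encode() once, offline, and paste the resulting numbers into the page. Changing the offset or swapping in a different transformation gives a new disguise with no extra work per comment.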

The bad news is that all it takes is one smart human to crack it once (and in this case, “crack” is a bit too strong of a verb; look at it cross-eyed and it about breaks) and feed the appropriate URL into their comment bots, and away they go again. Hiding a URL is ultimately not a very good way to do it. The next version, therefore, should logically use entirely AJAX methods to upload comments directly via xmlhttp. Comment spam bots would have to be much more complex than they already are. I’m sure they will be eventually, and by then it will be time to move on to yet another better, brighter idea.
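A rough sketch of what that xmlhttp approach might look like is below. It is only an illustration under assumptions: the field names (author, text) and the example.com URL are placeholders rather than Movable Type’s real ones, and the createXhr parameter exists purely so the function can be tried outside a browser.

```javascript
// Encode a plain object as an application/x-www-form-urlencoded body.
function encodeFields(fields) {
  return Object.keys(fields)
    .map(function (key) {
      return encodeURIComponent(key) + "=" + encodeURIComponent(fields[key]);
    })
    .join("&");
}

// Post a comment with no <form> tag in the page at all.
// createXhr is optional; when omitted, the browser's XMLHttpRequest is used.
function postComment(url, fields, createXhr) {
  var xhr = createXhr ? createXhr() : new XMLHttpRequest();
  xhr.open("POST", url, true); // true = asynchronous request
  xhr.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
  xhr.send(encodeFields(fields));
  return xhr;
}

// Hypothetical usage in a browser (placeholder URL and field names):
// postComment("http://example.com/comments.cgi",
//             { author: "Jo Bloggs", text: "Nice post" });
```

The submit URL still has to live somewhere in the script, so this raises the bar for bots rather than removing the weakness; it would presumably be combined with the kind of obfuscation described above.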

2 Responses to “Killing Comment Spam Dead”

    I’ve never gotten comment spam on my page, despite being indexed by all the search engines. I think this is because I don’t use a popular blog engine like Movable Type. I’m not sure your method will actually work, since many of the comment spam bots just use the web API for the blog software. I can send a correctly formatted POST to your web server without ever visiting your page.

    Sure, you could, but since I’ve renamed the form that it uses to submit the comment, you’d have to find out that name, and for that you have to either decode the JavaScript, sniff the network, or do some more intensive investigation than the comment spammers usually put into it. I originally just renamed the file that comments are submitted to, but the spammers were quick enough to follow that change. Fortunately, the addition of the JavaScript obfuscation works well.

    Of course, your point that it’s the popular blog engines being targeted is very true: the only reason my mechanism works is because it’s NOT a default protection mechanism of any of the major engines. As soon as it became one, it would be automated around.