The Spam Diaries

News and musings about the fight against spam.
 by Edward Falk

Monday, February 06, 2006

Approaches to fighting blog spam

Blog spam (sometimes known as comment spam), for those not familiar with it, consists of comments, pings, and trackbacks added to popular blogs in order to "leech" some of the high page rank from those blogs. The idea is that when the search engines crawl the blog, they'll pick up the links in the comments, and the linked sites will have their page rank boosted.

A few months ago, another blog picked up the story of the lawsuit against me. This week, I heard back from that blogger, informing me that, ironically, the months-old post had attracted spam of its own. A comment was added that consisted of nothing but a link to an affiliate shopping page at worldcinemadvd.com. If you go to the advertised site, there's nothing there but Google ads. (See my companion article about worldcinemadvd.com and cloaking.)

So what techniques are there to deal with this? If worldcinemadvd.com had been a legitimate site, the first step would have been to complain to its administrators about the actions of their affiliate. In this case, there's no point.

We can find out who the service provider for worldcinemadvd.com is (traceroute indicates that it's liquidweb.com) and complain. We can also complain to their registrar, godaddy.com, which has a good anti-spam policy. We'll see how this approach works.

Perhaps the simplest approach to preventing blog spam is to require the poster to solve a captcha puzzle. Blogger supports this, as far as I know, and I suspect most blog software does too.

You can always set comment moderation to require approval for each comment, but that has a number of disadvantages, not the least of which is the increased workload in maintaining the blog.

There are also automated tools to deal with the problem. Most blogging software has options to control the moderation of comments. One of the features offered by blog moderation software is the ability to blacklist specific domains so that comments coming from those domains or linking to them are automatically deleted. The particular blog in question is powered by Movable Type, which includes a feature called MT-Blacklist, a collaborative moderation system for Movable Type.
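
To make the domain-blacklist idea concrete, here is a minimal sketch in Python. It is not how MT-Blacklist or any particular blog engine does it; the names `BLACKLIST`, `linked_domains`, `is_blacklisted`, and `should_reject` are made up for illustration, and the only blacklist entry is the domain from the example above.

```python
import re
from urllib.parse import urlparse

# Hypothetical local blacklist; the single entry is the domain discussed above.
BLACKLIST = {"worldcinemadvd.com"}

# Rough pattern for http/https links in comment text (plain text or HTML).
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def linked_domains(comment_text):
    """Return the set of lowercased host names linked from a comment."""
    hosts = set()
    for url in URL_PATTERN.findall(comment_text):
        host = urlparse(url).hostname
        if host:
            # Drop a trailing dot so "example.com." matches "example.com".
            hosts.add(host.rstrip(".").lower())
    return hosts


def is_blacklisted(host, blacklist=BLACKLIST):
    """True if host is a blacklisted domain or a subdomain of one."""
    return any(host == bad or host.endswith("." + bad) for bad in blacklist)


def should_reject(comment_text, blacklist=BLACKLIST):
    """True if any link in the comment points at a blacklisted domain."""
    return any(is_blacklisted(h, blacklist) for h in linked_domains(comment_text))
```

A comment handler could call `should_reject(text)` before publishing and delete or hold the comment when it returns true. Matching on subdomains as well as the bare domain is a design choice in this sketch, so that a spammer can't dodge the list just by adding `www.` or another prefix.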

Originally developed by Jay Allen as a plugin for Movable Type, MT-Blacklist aggregates the individual blacklists created by bloggers. The principle is similar to the communal moderation employed by Slashdot, Craigslist and other online communities: if enough bloggers flag a domain as bad, the MT-Blacklist software adds the domain to a global blacklist that applies to all bloggers making use of the service.
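
The communal part can be sketched in a few lines of Python. This is only an illustration of the "enough bloggers flag a domain" idea under my own assumptions: the report format (pairs of blogger and domain), the function name `build_global_blacklist`, and the threshold of three are all hypothetical, and I don't know what rule MT-Blacklist actually applies.

```python
from collections import defaultdict


def build_global_blacklist(reports, threshold=3):
    """Return the domains flagged by at least `threshold` distinct bloggers.

    `reports` is an iterable of (blogger_id, domain) pairs. Both the data
    shape and the default threshold are assumptions made for this sketch.
    """
    flaggers = defaultdict(set)
    for blogger, domain in reports:
        # A set per domain means repeat flags from one blogger count once.
        flaggers[domain.strip().lower()].add(blogger)
    return {domain for domain, who in flaggers.items() if len(who) >= threshold}
```

Counting distinct bloggers rather than raw flags matters here: otherwise one person could push any domain onto the global list just by flagging it repeatedly.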

MT-Blacklist was enough of a success that Movable Type hired Jay Allen and incorporated MT-Blacklist into Movable Type version 3. This has had an enormous impact on Movable Type, virtually eliminating blog spam.

The approaches to fighting blog spam, then, are threefold. First, more automated tools such as MT-Blacklist should be developed. Perhaps SPEWS or one of the other DNSBL providers could be convinced to include databases for known blog spammers, or perhaps a more global version of MT-Blacklist could be developed which could be used by multiple blog engines.
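
For readers who haven't used a DNSBL, here is a rough Python sketch of what a domain lookup against such a list could look like. It follows the usual DNSBL convention of querying the domain prepended to the list's zone, where an answer in 127.0.0.0/8 means "listed" and a failed lookup means "not listed". The zone `blogspam.dnsbl.example` is invented for illustration; no such blog-spam list exists that I know of, which is the point of the suggestion above.

```python
import socket

# Hypothetical DNSBL zone for blog spammers; not a real service.
ZONE = "blogspam.dnsbl.example"


def query_name(domain, zone=ZONE):
    """Build the DNS name to look up: <domain>.<zone>."""
    return f"{domain.strip().rstrip('.').lower()}.{zone}"


def is_listed(domain, zone=ZONE, resolve=socket.gethostbyname):
    """True if the list returns a 127.x.x.x answer for the domain.

    `resolve` can be swapped out so the logic is testable without
    touching the network.
    """
    try:
        answer = resolve(query_name(domain, zone))
    except OSError:
        # Lookup failures (including "no such name") are treated as not listed.
        return False
    return answer.startswith("127.")
```

Blog software could run `is_listed(host)` on every domain linked from a new comment. Because the check is just a DNS query, the same list could serve any blog engine, which is what makes the DNSBL model attractive for this.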

Second, service providers need to be educated that blog spam is as unacceptable as other forms of spam, and to develop policies against it. This will be an uphill battle, as no service provider wants to take actions against a paying customer.

Third, and most importantly, the search engine companies must stop indexing links from within blog comments. They should also adopt a practice of dropping spam sites from their crawl when they are discovered, and terminate advertising partnerships with such sites.
