Hand Fighting Referrer Spammers

Wikipedia defines referer spam as “a kind of search engine-targeted spam. The technique involves making repeated web site requests using a fake referer url pointing to a spam-advertised site. Sites that publicize their referer statistics will then also link to the spammer’s site. This benefits the spammer because of the free link, and also gives the spammer’s site improved search engine link placement due to link-counting algorithms that search engines use.”

Tell me about it.

While working feverishly to release the new and improved version of blogs4God, I looked at the referrer logs only to see literally thousands of incoming references from ‘poker-this’ or ‘hair-loss-that’. Basically, they are stealing my bandwidth so they can later steal your money. No, thank you.

There are several techniques for dealing with this. For example, in my March ’03 post entitled “How to block spambots, ban spybots, and tell unwanted robots to go to …”, I reference an article by fellow Apex resident Mark Pilgrim on how to block such visits using the .htaccess file. An excellent approach – but I’m greedy and want to block such ne’er-do-wells at an even lower level: say, at my APF firewall, so that all the sites on this server are protected, as well as my bandwidth.

I’m currently experimenting with different techniques, but I think that by next week I’ll have a bash shell script that wraps up the following ‘hand’ techniques I’m using to explore a solution.

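# Collect the client IP (first field) of every logged request that mentions "poker"
grep "poker" /usr/var/logs/access_log | perl -l -a -n -e 'print $F[0]' >> poker.txt
# De-duplicate and append to the APF deny list (plain -u: adding -n would merge distinct IPs)
sort -u poker.txt >> /etc/apf/deny_hosts.rules
/etc/init.d/apf restart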

Pretty simple – and pretty easy to run by hand for now … at least until I’m satisfied that I’ve worked out all the kinks, created an input file of bad-guy terms to drive the grep, and made sure (with a notification to me) that the firewall is correctly and safely restarted.
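
For the curious, here is a rough sketch of how the terms-file idea might look once it is scripted. It is not the finished script: the function names and the file arguments are placeholders I made up for illustration. Like the hand version, it matches a term anywhere in the log line rather than only in the referer field, and it assumes the terms file has one term per line with no blank lines (a blank line would match every request).

```shell
#!/bin/bash
# Sketch only: function names and file arguments are hypothetical.

# Print the unique client IPs (first log field) of every request line
# that contains any of the terms listed, one per line, in the terms file.
# Matching is case-insensitive and treats each term as a fixed string.
extract_bad_ips() {
    local terms_file="$1" log_file="$2"
    grep -i -F -f "$terms_file" "$log_file" | awk '{print $1}' | sort -u
}

# Read IPs on stdin and append only those not already in the deny file,
# so repeated runs do not pile up duplicates. Prints how many were added.
append_new_ips() {
    local deny_file="$1" ip added=0
    while read -r ip; do
        if ! grep -q -x -F "$ip" "$deny_file" 2>/dev/null; then
            echo "$ip" >> "$deny_file"
            added=$((added + 1))
        fi
    done
    echo "$added"
}
```

By hand, that would be something like: extract_bad_ips bad_terms.txt /usr/var/logs/access_log | append_new_ips /etc/apf/deny_hosts.rules, followed by the apf restart only when the printed count is greater than zero.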

We are, after all, talking about a firewall that has the potential to lock me out OR let everyone else in if I’m not careful.
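
One possible guard against the lock-out case is to filter the candidate IPs against a short list of addresses that must never be denied (my own, for instance) before anything touches the firewall rules. A minimal sketch, assuming GNU grep and a hypothetical allow file with one IP per line; the function name is made up for illustration:

```shell
#!/bin/bash
# Sketch only: the function name and the allow file are hypothetical.

# Read candidate IPs on stdin and print only those that do NOT appear
# in the allow file. -x matches whole lines and -F treats entries as
# fixed strings, so 10.0.0.1 in the allow file does not shield 10.0.0.10.
drop_allowed_ips() {
    local allow_file="$1"
    # grep exits non-zero when every candidate is filtered out; that is
    # not an error here, so the exit status is forced to success.
    grep -v -x -F -f "$allow_file" || true
}
```

This only covers the “lock me out” half of the worry; checking that the firewall actually came back up after the restart is a separate problem.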