Banning Bad Bots Using The global.asa File In Classic ASP
Bad bots can cause problems for your website: they can submit spam to your forum or blog, spam your contact form, or simply consume valuable resources such as bandwidth and CPU. If you use Classic ASP, this article will show you how to ban bad bots from your entire website using the global.asa file.

WSI was recently asked by one of our longstanding clients to investigate why their website had started using a much greater amount of bandwidth than expected. The client was already using our web analytics services, so our first port of call was their Google Analytics account, where we could investigate their web traffic.
Identify the traffic source
We soon identified that large numbers of page views were being generated from a Google AdWords campaign that was no longer active. Naturally this aroused our suspicion, because an inactive pay per click (PPC) campaign should not be generating any traffic at all! Our next course of action was to analyse the website’s server logs in order to further identify the source of the traffic. We quickly isolated the page views as being generated by a bot. A bot is a software application that runs automated tasks over the Internet. The largest use of bots is in web spidering by search engines and the like.
Banning the bad bot
Our client’s website is built in Classic ASP and hosted on a Microsoft IIS/6.0 server, which dictated the methods we could use to ban the nuisance bot.
We assumed from the start that the bot would not obey any exclusion set up via the robots.txt file, so we ignored that option completely.
ASP code at page level
It would be possible to insert ASP code into individual pages in order to stop the pages loading if they were requested by the bot in question. However, this would mean editing multiple existing pages and then monitoring the website to check whether further pages started being affected by the bot. There are more suitable ways of achieving our goal using the IIS/6.0 web server itself.
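For illustration, a page-level check might have looked something like the sketch below, included at the top of every page via a server-side include. The include path and the user-agent value are placeholders, not the actual bot’s details:

```asp
<!--#include virtual="/includes/blockbot.asp" -->
```

```asp
<%
' blockbot.asp - hypothetical page-level bot check.
' The user-agent string below is a placeholder for the real bot's user-agent.
If Request.ServerVariables("HTTP_USER_AGENT") = "insert bad bot user-agent here" Then
    Response.Status = "403 Forbidden"
    Response.End
End If
%>
```

The drawback is exactly as described above: every page that the bot targets would need the include added, and new pages would have to be monitored and updated.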
Identify the bad bot
There are two main ways in which a bot can be identified: by its IP address and its user-agent. (A user-agent is a text string that identifies the client application making the request.) Normally it’s considered best practice to identify and ban a bot based on its IP address rather than its user-agent, because a user-agent can be easily spoofed. Bad bots often originate from one or a small number of IPs or networks and identifying these is usually the preferred method of blocking. However, in this case our client’s website logs showed us that the bot came from a wide range of IP addresses on different networks, but as far as we could see it did always identify itself correctly with a legitimate user-agent string. We therefore decided that the best way to combat this particular bot would be to ban it based on its user-agent.
Code to ban the bad bot using global.asa
Banning the bad bot using the global.asa file is fairly straightforward. The global.asa file supports four event handlers: Application_OnStart, Session_OnStart, Session_OnEnd and Application_OnEnd. By adding bot-blocking code to the Session_OnStart event, it will execute whenever a visitor (including the bot) starts a new session. We used the following code:
```asp
<SCRIPT LANGUAGE="VBScript" RUNAT="Server">
Sub Session_OnStart
    ' Replace the placeholder with the bad bot's actual user-agent string
    If Request.ServerVariables("HTTP_USER_AGENT") = "insert bad bot user-agent here" Then
        Session.Abandon
        Response.End
    End If
End Sub
</SCRIPT>
```
Initial indications show that the bot ban has worked very well, with our client’s bandwidth coming back down to normal levels. The bot banning code could be further refined and extended:
- We could look for specific keywords or phrases within the user-agent string rather than checking the whole string
- We could send a “403 Forbidden” HTTP response rather than “200 OK”, which would be the technically correct thing to do when banning a visitor
- We could have a list of bad bots that should be banned, rather than just identifying one bot
- We could ban bots based on both their user-agents and IP addresses
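Putting several of those refinements together, a more robust Session_OnStart handler might look like the following sketch. The keywords in the list are invented examples, not real bot user-agents:

```asp
<SCRIPT LANGUAGE="VBScript" RUNAT="Server">
Sub Session_OnStart
    Dim badBots, ua, i
    ' Hypothetical keywords that appear in bad bots' user-agent strings
    badBots = Array("badbot", "evilcrawler", "spambot")
    ' Lower-case the user-agent so the keyword match is case-insensitive
    ua = LCase(Request.ServerVariables("HTTP_USER_AGENT"))
    For i = 0 To UBound(badBots)
        If InStr(ua, badBots(i)) > 0 Then
            Session.Abandon
            ' Send the technically correct response for a banned visitor
            Response.Status = "403 Forbidden"
            Response.End
        End If
    Next
End Sub
</SCRIPT>
```

Because Session_OnStart is one of the few global.asa events with access to the Response object, the 403 status can be set here before ending the response.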
Perhaps we’ll investigate these possibilities in another post!