Help: Incredible Bandwidth Spike

He has: 688 posts

Joined: Feb 2001

Over the past year my site's bandwidth usage averaged around 40megs per month. In the past three days it's been 580megs, 420megs, and 120megs so far today before lunchtime. My site is at 95% limit usage and is about to be shut down before the end of the day. I can increase my own domain's limits but at that pace I still wouldn't make it to the end of the week, so I must find the cause of this tremendous 4000% sudden increase.

I've looked in my cPanel stats and I can't find a cause of the problem. I can't find a particular file that may be hotlinked or just very popular. (Although I did find a recent visitor IP which seemed to request every file I have.) Any ideas?


Busy's picture

He has: 6,151 posts

Joined: May 2001

Use your raw logs; web stats (of any brand) are useless for things like this.
Your log files will be big, so start with the error logs. If it's bots, the stupid ones will be easy to spot (they try to find root files in folders). Then, in your access logs, strip out all the lines referred from your own domain name (first with www., then without it), search what's left for http and/or www, and you'll probably find it's bots and forums hotlinking to files.

Block all repeat offenders via .htaccess.

It's a horrible, time-consuming job but it needs to be done.
Good luck
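As a rough sketch of that log sift (the file path and mydomain.example are placeholders; point it at your real access log and substitute your own domain):

```shell
# Hypothetical two-line combined-format log, just to demonstrate the filtering;
# in practice you would cat your real access log instead of this heredoc.
cat <<'EOF' > /tmp/access_sample.log
64.127.124.157 - - [01/Mar/2006:00:01:00 +0000] "GET /img/pic.jpg HTTP/1.1" 200 5120 "http://someforum.example/thread" "omni-explorer"
10.0.0.1 - - [01/Mar/2006:00:01:05 +0000] "GET /index.html HTTP/1.1" 200 1024 "http://www.mydomain.example/" "Mozilla/5.0"
EOF
# Drop hits referred from your own domain, then count remaining hits per IP;
# repeat offenders (bots, hotlinkers) float to the top.
grep -v 'mydomain.example' /tmp/access_sample.log | awk '{print $1}' | sort | uniq -c | sort -rn
```

The top of that list is what you feed into your .htaccess deny rules.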

He has: 688 posts

Joined: Feb 2001

I found an IP address in my stats that gobbled up over 56,000 page views in just one night. Other than my message board, my entire site only has a few dozen pages total. I searched for that IP address and connected it to some evil search spider named "omni-explorer". Curse them!

Two bits of information enrage me even more than the fact that they've forced me to shut down most of my website in order to stay afloat until the end of the month: 1) my properly written robots.txt clearly states that no robots are allowed to index my forums (which should have stopped the bot after a few dozen page views), and 2) despite this over-indulgent attack, their bot is not listed as a spider that visited my website, so they're trying to avoid detection (like a bull in a china shop).

Sorry, but I am so F****** p***** off! I've now banned their IPs from my domain, but it's too little, too late.

He has: 37 posts

Joined: Feb 2006

Can you share those IPs?

I, for one, would be grateful.

It would be nice to never have that happen to me.

Thanks!

He has: 688 posts

Joined: Feb 2001

My attack came specifically from 64.127.124.157, but their website states that their IP address ranges are 64.127.124.* and 65.19.150.129 - 65.19.150.255.
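For anyone wanting to deny those ranges outright, a minimal .htaccess sketch (Apache 2.2-era syntax, to match the .htaccess approach discussed in this thread; note the second rule covers 65.19.150.128-255, one address wider than the stated range):

```apache
# Block the omni-explorer ranges quoted above
Order Allow,Deny
Allow from all
Deny from 64.127.124.0/24
Deny from 65.19.150.128/25
```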

Busy's picture

He has: 6,151 posts

Joined: May 2001

Robots.txt is only a suggestion aimed at the main search engines; a robots.txt file should never be relied on by itself to keep bots out of folders, files or images. Bad bots hardly ever read the file; some do fetch it so as not to look bad, but usually they take no notice of it.
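To make the distinction concrete, a rule like the one below (paths are illustrative) is only a request, honored by well-behaved crawlers and routinely ignored by bad bots:

```
# robots.txt — a polite request, not access control
User-agent: *
Disallow: /forums/
```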

The site gives the UA, but did you check it (and the IP) in your logs? If so, is it correct?

I know a few people who believe the bot problem is way out of control; every Tom, Dick and Harry is trying to make the next Google. They block all bots except the main ones and use a bot trap.
The way it works is a hidden link on all pages; the trap page adds the visitor's IP to the .htaccess file and displays an ugly message for them, and can sometimes send them into a wild loop so they have to suck up thousands of useless pages going nowhere fast. This can eat up a few resources though, so beware.

He has: 688 posts

Joined: Feb 2001

I saw a link on somebody's website a few weeks ago and curiously followed it to http://www.spampoison.com/ . It sounds like the trap you were mentioning, but does it really work (or at least help)? Of course punishing bad bots won't actually protect my website so I'm also intrigued by that hidden-link-auto-ban thing you were talking about. Creating something like that myself is beyond my abilities, but what would something like that be called if I were trying to look for that on hotscripts or someplace?

Thanks.

Busy's picture

He has: 6,151 posts

Joined: May 2001

The bot traps do help. Say a runaway bot sucks everything off your site: you won't know about it until the next day, but with a bot trap script the bot is banned straight away (depending on how it's set up), and if it's sent away (as opposed to being served 403 pages) it will only get a few pages.

I didn't give any links or scripts as there are so many ways of doing it.
The very basic version just bans any visitor that views that one page, ideally called forbotsonly, or keepout, or banlist... but people are nosey and you will catch some, so the trick is to list the file in robots.txt only; this way only bots (or nosey people reading your robots file) will ever find it. It isn't foolproof, but it does catch a lot.
There are some more advanced scripts out there that only ban for 24-48 hours, or that ban for viewing pages too quickly (tabbed browsing can trigger this).
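A sketch of that robots.txt trick (the trap file name is just an example): the trap page is linked nowhere visible on the site, so only things that read the robots file ever discover it, and a compliant crawler is told to stay out anyway:

```
# robots.txt — /keepout.php appears nowhere else on the site;
# good bots obey the Disallow, bad bots walk into the trap
User-agent: *
Disallow: /keepout.php
```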

Here are a couple of links to help you decide which way you want to go.
http://www.neilgunton.com/spambot_trap/
http://www.kloth.net/internet/bottrap.php

http://www.g-clef.net/drupal/?q=node/3 <-- what a bot trap is
http://www.jkcc.com/e-mail.html <-- what a loop page or punishment page could look like

The second link is probably the best one, but if you just want some code:

<?php
// Bot trap: appends a deny rule for the visitor's IP to .htaccess,
// emails you an alert, then shows the visitor a blocked-page message.
// Note: your .htaccess also needs a matching "Deny from env=getout"
// rule for the SetEnvIf line below to actually block anything.
$filename = $_SERVER["DOCUMENT_ROOT"] . "/.htaccess";
$content = "SetEnvIf Remote_Addr ^" .
    str_replace(".", "\.", $_SERVER["REMOTE_ADDR"]) . "$ getout\r\n";
$handle = fopen($filename, 'r');
$content .= fread($handle, filesize($filename));
fclose($handle);
$handle = fopen($filename, 'w+');
fwrite($handle, $content, strlen($content));
fclose($handle);

// change youremail@yourdomain and trap@yourdomain.com to your real
// address and real domain name; leave 'trap@' so you know it's from the spider trap
mail("youremail@yourdomain",
    "Spider Alert!",
    "The following IP just got banned because it accessed the spider trap.\r\n\r\n" .
    $_SERVER["REMOTE_ADDR"] . "\r\n" .
    $_SERVER["HTTP_USER_AGENT"] . "\r\n" .
    $_SERVER["HTTP_REFERER"],
    "FROM: trap@yourdomain.com");

$page = '';
// note: some site downloaders will also trigger the script
$page .= "<h1>You have been permanently blocked from the site</h1>";
$page .= '<p>We don\'t allow site downloads or email spiders ' .
    'of any kind, sorry. If you feel this is a mistake, ' .
    'please send us an email with your IP address and we\'ll ' .
    'remove your IP address from the blocked list.</p>';
/*
add email constructor here if desired (post 4 in thread)
*/
echo $page;
?>
Save the page as getout.php.

or do a search for things like bot trap, auto bot trap, bad bots ...

Busy's picture

He has: 6,151 posts

Joined: May 2001

Forgot to mention: these bot traps were originally created for email harvesters, then advanced to bots that download your entire site. They won't stop leeching or hotlinking (if forums link to one of your images, it can have a similar effect to a bad bot).

He has: 688 posts

Joined: Feb 2001

Thanks a ton for all your help! I'll try to get one of those options implemented.

Busy's picture

He has: 6,151 posts

Joined: May 2001

Sometimes it might need a bit of fine tuning (it may catch a good bot), so just keep an eye on your .htaccess file (if you use that method) for a search-engine crawl cycle.
Another way, which requires more work, is to use the getout file above but, instead of automatically writing to the .htaccess file, get it to just send you an email with the details so you can check them out and add them manually if they're bad.
