<?xml version="1.0" encoding="utf-8" ?><rss version="2.0" xml:base="https://www.webmaster-forums.net/crss/node/1019341" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title></title>
    <link>https://www.webmaster-forums.net/crss/node/1019341</link>
    <description></description>
    <language>en</language>
          <item>
    <title></title>
    <link>https://www.webmaster-forums.net/server-side-scripting/htaccess-ban-list#comment-1115223</link>
    <description> &lt;p&gt;Ah, I didn&#039;t realize you were only trying to block the naughty ones. One thing, though: the naughty ones probably change their agent name regularly too. I wouldn&#039;t be surprised if some of them generated a UA name dynamically. Another method to stop all bots is to allow only known legitimate user-agent strings. Unfortunately, crawlers can send any user-agent string they want, but most do identify themselves in the agent.&lt;/p&gt;
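&lt;p&gt;A minimal sketch of that allow-list idea in .htaccess, assuming Apache with mod_setenvif enabled; the agent substrings here are only illustrative placeholders, not a vetted list:&lt;/p&gt;

```apache
# Mark requests whose User-Agent matches a trusted pattern
# (SetEnvIfNoCase compares case-insensitively), then deny everything else.
SetEnvIfNoCase User-Agent "Mozilla"   allowed_ua
SetEnvIfNoCase User-Agent "Googlebot" allowed_ua
Order Deny,Allow
Deny from all
Allow from env=allowed_ua
```

&lt;p&gt;The trade-off is the one noted above: since the agent string is entirely client-supplied, a hostile crawler can simply claim to be a browser.&lt;/p&gt;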
 </description>
     <pubDate>Tue, 01 Oct 2002 09:27:08 +0000</pubDate>
 <dc:creator>ROB</dc:creator>
 <guid isPermaLink="false">comment 1115223 at https://www.webmaster-forums.net</guid>
  </item>
  <item>
    <title></title>
    <link>https://www.webmaster-forums.net/server-side-scripting/htaccess-ban-list#comment-1115220</link>
    <description> &lt;p&gt;Rob, you&#039;ve got a lot of search engines in your list; I&#039;m just trying to ban the bad ones from leeching email addresses, images and site files.&lt;/p&gt;
&lt;p&gt;But certain ones like ia_archiver I&#039;ll add; it only sucks down files, and from what I&#039;ve found on other sites it has been in legal battles over aggressive bandwidth theft.&lt;/p&gt;
&lt;p&gt;The problem with using the robots file is that most of these won&#039;t read or even look at it, let alone obey it. I&#039;ve also set up a harvester trap to catch the extra ones I&#039;ve missed, so I can then ban them by IP.&lt;/p&gt;
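&lt;p&gt;Banning trapped harvesters by IP in .htaccess looks roughly like this, assuming Apache with mod_access; the addresses are made-up examples, not real harvester IPs:&lt;/p&gt;

```apache
# Let everyone through except the specific addresses caught by the trap.
Order Allow,Deny
Allow from all
Deny from 192.0.2.45
Deny from 198.51.100.0/24
```

&lt;p&gt;Unlike a user-agent ban, an IP ban cannot be dodged by renaming the bot, though harvesters that rotate addresses will still get through until the trap catches the new one.&lt;/p&gt;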
 </description>
     <pubDate>Tue, 01 Oct 2002 08:57:09 +0000</pubDate>
 <dc:creator>Busy</dc:creator>
 <guid isPermaLink="false">comment 1115220 at https://www.webmaster-forums.net</guid>
  </item>
  <item>
    <title></title>
    <link>https://www.webmaster-forums.net/server-side-scripting/htaccess-ban-list#comment-1115218</link>
    <description> &lt;p&gt;I should combine mine with yours, Busy; it looks like you&#039;ve got some I didn&#039;t, and vice versa.&lt;/p&gt;
&lt;p&gt;I can&#039;t use the robots.txt directives in my application, but you certainly can with your webserver. Create a file called robots.txt in your web document root and include the lines (not positive on this, from memory):&lt;/p&gt;
&lt;p&gt;User-Agent: *&lt;br /&gt;
Disallow: /&lt;/p&gt;
&lt;p&gt;And voila, any crawlers (that respect the robot exclusion directives) will skip your site.&lt;/p&gt;
 </description>
     <pubDate>Tue, 01 Oct 2002 08:02:52 +0000</pubDate>
 <dc:creator>ROB</dc:creator>
 <guid isPermaLink="false">comment 1115218 at https://www.webmaster-forums.net</guid>
  </item>
  <item>
    <title></title>
    <link>https://www.webmaster-forums.net/server-side-scripting/htaccess-ban-list#comment-1115217</link>
    <description> &lt;p&gt;Heh, how strange. I spent the last 4 hours putting together a &#039;robot agent keyword&#039; list for some stat-tracking software I&#039;m working on. I seriously just finished (or gave up) and came here to find a thread on the subject, very weird (are you reading my mind?)&lt;/p&gt;
&lt;p&gt;Anyway, here&#039;s my list. Note I converted everything to lowercase (because I store user-agents in lowercase), and the presence of any of these strings (case-insensitive) in a user-agent should, in theory, indicate a crawler. This is all untested as well, as I just finished the list.&lt;/p&gt;
&lt;p&gt;aitcsrobot&lt;br /&gt;
ao/a-t.idrg&lt;br /&gt;
arachnoidea&lt;br /&gt;
architextspider&lt;br /&gt;
atomz&lt;br /&gt;
auresys&lt;br /&gt;
awapclient&lt;br /&gt;
axis&lt;br /&gt;
backrub&lt;br /&gt;
bayspider&lt;br /&gt;
big brother&lt;br /&gt;
bjaaland&lt;br /&gt;
black widow&lt;br /&gt;
blackwidow&lt;br /&gt;
borg-bot&lt;br /&gt;
bspider&lt;br /&gt;
cactvs chemistry spider&lt;br /&gt;
calif&lt;br /&gt;
cern-linemode&lt;br /&gt;
checkbot&lt;br /&gt;
christcrawler&lt;br /&gt;
cienciaficcion&lt;br /&gt;
combine&lt;br /&gt;
computingsite&lt;br /&gt;
conceptbot&lt;br /&gt;
coolbot&lt;br /&gt;
cosmos&lt;br /&gt;
crawler&lt;br /&gt;
crawlpaper&lt;br /&gt;
cusco&lt;br /&gt;
customcrawl&lt;br /&gt;
cyberpilot&lt;br /&gt;
cyberspyder&lt;br /&gt;
deweb&lt;br /&gt;
diagem&lt;br /&gt;
die blinde kuh&lt;br /&gt;
dienstspider&lt;br /&gt;
digger&lt;br /&gt;
digimarc&lt;br /&gt;
digimarc cgireader&lt;br /&gt;
diibot&lt;br /&gt;
dittospyder&lt;br /&gt;
dlw3robot&lt;br /&gt;
dnabot&lt;br /&gt;
dragonbot&lt;br /&gt;
duppies&lt;br /&gt;
ebiness&lt;br /&gt;
eit-link-verifier&lt;br /&gt;
elfinbot&lt;br /&gt;
emc spider&lt;br /&gt;
esirover&lt;br /&gt;
esismartspider&lt;br /&gt;
esther&lt;br /&gt;
evliya celebi&lt;br /&gt;
explorersearch&lt;br /&gt;
fast-webcrawler&lt;br /&gt;
fastcrawler&lt;br /&gt;
fdse&lt;br /&gt;
felix&lt;br /&gt;
fido&lt;br /&gt;
fish-search&lt;br /&gt;
flipper&lt;br /&gt;
folio&lt;br /&gt;
foobar&lt;br /&gt;
fouineur&lt;br /&gt;
freecrawl&lt;br /&gt;
funnelweb&lt;br /&gt;
g.r.a.b.&lt;br /&gt;
gammaspider&lt;br /&gt;
gazz&lt;br /&gt;
gcreep&lt;br /&gt;
gestalticonoclast&lt;br /&gt;
getterroboplus&lt;br /&gt;
geturl.rexx&lt;br /&gt;
glimpse&lt;br /&gt;
golem&lt;br /&gt;
googlebot&lt;br /&gt;
grabber&lt;br /&gt;
griffon&lt;br /&gt;
gromit&lt;br /&gt;
grub-client&lt;br /&gt;
gulliver&lt;br /&gt;
gulper&lt;br /&gt;
gulper web bot&lt;br /&gt;
harvest&lt;br /&gt;
havindex&lt;br /&gt;
hazel&#039;s ferret web hopper&lt;br /&gt;
hku www robot&lt;br /&gt;
hometown spider pro&lt;br /&gt;
htdig&lt;br /&gt;
htmlgobble&lt;br /&gt;
hämähäkki&lt;br /&gt;
i robot&lt;br /&gt;
ia_archiver&lt;br /&gt;
iagent&lt;br /&gt;
iajabot&lt;br /&gt;
ibm_planetwide&lt;br /&gt;
image.kapsi.net&lt;br /&gt;
imagelock&lt;br /&gt;
imagescape&lt;br /&gt;
incywincy&lt;br /&gt;
industry canada bot&lt;br /&gt;
informant&lt;br /&gt;
infoseek&lt;br /&gt;
infospiders&lt;br /&gt;
ingrid&lt;br /&gt;
inspectorwww&lt;br /&gt;
internet cruiser&lt;br /&gt;
iron33&lt;br /&gt;
ispi&lt;br /&gt;
israelisearch&lt;br /&gt;
javabee&lt;br /&gt;
jbot&lt;br /&gt;
jcrawler&lt;br /&gt;
jeeves&lt;br /&gt;
jobo&lt;br /&gt;
jobot&lt;br /&gt;
joebot&lt;br /&gt;
jubiirobot&lt;br /&gt;
jumpstation&lt;br /&gt;
katipo&lt;br /&gt;
kdd-explorer&lt;br /&gt;
kit-fireball&lt;br /&gt;
kit_fireball&lt;br /&gt;
ko_yappo&lt;br /&gt;
labelgrab&lt;br /&gt;
larbin&lt;br /&gt;
legs&lt;br /&gt;
libertech-rover&lt;br /&gt;
linecker&lt;br /&gt;
linkidator&lt;br /&gt;
linklint&lt;br /&gt;
linkscan&lt;br /&gt;
linkwalker&lt;br /&gt;
lmtaspider&lt;br /&gt;
lmtasspider&lt;br /&gt;
lockon&lt;br /&gt;
logo.gif crawler&lt;br /&gt;
lycos&lt;br /&gt;
lycos_spider&lt;br /&gt;
magpie&lt;br /&gt;
mediafox&lt;br /&gt;
mercator&lt;br /&gt;
merzscope&lt;br /&gt;
mindcrawler&lt;br /&gt;
moget&lt;br /&gt;
momspider&lt;br /&gt;
monster&lt;br /&gt;
motor&lt;br /&gt;
mouse.house&lt;br /&gt;
muscatferret&lt;br /&gt;
mwdsearch&lt;br /&gt;
nec-meshexplorer&lt;br /&gt;
nederland.zoek&lt;br /&gt;
netcarta&lt;br /&gt;
netcarta_webmapper&lt;br /&gt;
netmechanic&lt;br /&gt;
netscape-catalog-robot&lt;br /&gt;
netscoop&lt;br /&gt;
newscan-online&lt;br /&gt;
nhsewalker&lt;br /&gt;
nomad&lt;br /&gt;
northstar&lt;br /&gt;
occam&lt;br /&gt;
open text&lt;br /&gt;
openbot&lt;br /&gt;
openfind&lt;br /&gt;
opilio&lt;br /&gt;
orbsearch&lt;br /&gt;
packrat&lt;br /&gt;
pageboy&lt;br /&gt;
parasite&lt;br /&gt;
patric&lt;br /&gt;
pbwf&lt;br /&gt;
pegasus&lt;br /&gt;
peregrinator&lt;br /&gt;
perlcrawler&lt;br /&gt;
pgp-ka&lt;br /&gt;
phpdig&lt;br /&gt;
piltdownman&lt;br /&gt;
pimptrain&lt;br /&gt;
pioneer&lt;br /&gt;
plumtreewebaccessor&lt;br /&gt;
poppi&lt;br /&gt;
portalbspider&lt;br /&gt;
portaljuice&lt;br /&gt;
psbot&lt;br /&gt;
pybot&lt;br /&gt;
raven&lt;br /&gt;
resume robot&lt;br /&gt;
rhcs&lt;br /&gt;
road runner&lt;br /&gt;
robbie&lt;br /&gt;
robocrawl&lt;br /&gt;
robodude&lt;br /&gt;
robofox&lt;br /&gt;
robot&lt;br /&gt;
robot du crim&lt;br /&gt;
robozilla&lt;br /&gt;
roverbot&lt;br /&gt;
rules&lt;br /&gt;
sabic&lt;br /&gt;
safetynet&lt;br /&gt;
scooter&lt;br /&gt;
searchprocess&lt;br /&gt;
senrigan&lt;br /&gt;
sg-scout&lt;br /&gt;
shagseeker&lt;br /&gt;
shai&#039;hulud&lt;br /&gt;
sharp-info-agent&lt;br /&gt;
sidewinder&lt;br /&gt;
simbot&lt;br /&gt;
site valet&lt;br /&gt;
sitetech&lt;br /&gt;
sitetech-rover&lt;br /&gt;
slcrawler&lt;br /&gt;
sleek&lt;br /&gt;
slurp&lt;br /&gt;
snooper&lt;br /&gt;
solbot&lt;br /&gt;
spanner&lt;br /&gt;
speedy&lt;br /&gt;
spider&lt;br /&gt;
spiderbot&lt;br /&gt;
spiderline&lt;br /&gt;
spiderman&lt;br /&gt;
spiderview&lt;br /&gt;
spyder&lt;br /&gt;
squirrel&lt;br /&gt;
ssearcher&lt;br /&gt;
suke&lt;br /&gt;
suntek&lt;br /&gt;
superewe&lt;br /&gt;
t-rex&lt;br /&gt;
tarantula&lt;br /&gt;
tarspider&lt;br /&gt;
techbot&lt;br /&gt;
templeton&lt;br /&gt;
teoma_agent1&lt;br /&gt;
titan&lt;br /&gt;
titin&lt;br /&gt;
tlspider&lt;br /&gt;
ucsd-crawler&lt;br /&gt;
udmsearch&lt;br /&gt;
url spider pro&lt;br /&gt;
urlck&lt;br /&gt;
valkyrie&lt;br /&gt;
verticrawl&lt;br /&gt;
victoria&lt;br /&gt;
vision-search&lt;br /&gt;
voyager&lt;br /&gt;
vwbot_k&lt;br /&gt;
w3crobot&lt;br /&gt;
w3index&lt;br /&gt;
w3m2&lt;br /&gt;
w3mir&lt;br /&gt;
w@pspider&lt;br /&gt;
wallpaper&lt;br /&gt;
web21&lt;br /&gt;
webbandit&lt;br /&gt;
webcatcher&lt;br /&gt;
webcopy&lt;br /&gt;
webcrawler&lt;br /&gt;
webfetcher&lt;br /&gt;
weblayers&lt;br /&gt;
weblinker&lt;br /&gt;
webmoose&lt;br /&gt;
webquest&lt;br /&gt;
webreader&lt;br /&gt;
webreaper&lt;br /&gt;
webs&lt;br /&gt;
webvac&lt;br /&gt;
webwalk&lt;br /&gt;
webwalker&lt;br /&gt;
webwatch&lt;br /&gt;
webzone&lt;br /&gt;
whatuseek_winona&lt;br /&gt;
wired-digital-newsbot&lt;br /&gt;
wisewire&lt;br /&gt;
wlm-&lt;br /&gt;
wolp&lt;br /&gt;
wwwc&lt;br /&gt;
wwwwanderer&lt;br /&gt;
xget&lt;/p&gt;
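&lt;p&gt;The matching rule described above (a lowercase substring hit means the request is a crawler) can be sketched in Python; the short keyword list here is just a small sample of the full list, and the function name is my own:&lt;/p&gt;

```python
# Small sample of the keyword list above; the real list is far longer.
BOT_KEYWORDS = ["googlebot", "slurp", "ia_archiver", "spider", "crawler"]

def is_crawler(user_agent):
    """Return True if any bot keyword appears in the user-agent string,
    compared case-insensitively (the keyword list itself is lowercase)."""
    ua = user_agent.lower()
    return any(keyword in ua for keyword in BOT_KEYWORDS)
```

&lt;p&gt;For example, is_crawler("Googlebot/2.1") is True, while a plain browser agent with none of the keywords is not flagged. Substring matching keeps the check cheap, at the cost of occasional false positives when a keyword happens to appear inside an innocent agent string.&lt;/p&gt;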
 </description>
     <pubDate>Tue, 01 Oct 2002 07:55:35 +0000</pubDate>
 <dc:creator>ROB</dc:creator>
 <guid isPermaLink="false">comment 1115217 at https://www.webmaster-forums.net</guid>
  </item>
  </channel>
</rss>
