<?xml version="1.0" encoding="utf-8" ?><rss version="2.0" xml:base="https://www.webmaster-forums.net/crss/node/1000665" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title></title>
    <link>https://www.webmaster-forums.net/crss/node/1000665</link>
    <description></description>
    <language>en</language>
          <item>
    <title></title>
    <link>https://www.webmaster-forums.net/server-side-scripting/spidering-through-directories-me-again#comment-1002871</link>
    <description> &lt;p&gt;yes and no.&lt;br /&gt;
 you would have to change save_url to recognize ftp-urls and download files via ftp.&lt;/p&gt;
&lt;p&gt;but i don&#039;t recommend this approach since ftp is totally different from http. (what netscape does is only an emulation, it really uses ftp.)&lt;/p&gt;
&lt;p&gt;you could write ftp-links to an external file and parse it via another script.&lt;/p&gt;
&lt;p&gt;at least that&#039;s what I would do.&lt;/p&gt;
&lt;p&gt;ciao&lt;br /&gt;
Anti&lt;br /&gt;
ps:&lt;br /&gt;
how about &amp;quot;perldoc Net::FTP&amp;quot;?&lt;/p&gt;
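A rough sketch of that second script, assuming the spider wrote its ftp links one per line into a file called ftp-links.dat (the file name and both sub names are my assumptions, not from the spider):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Net::FTP;

# split an ftp url into host and path (hypothetical helper)
sub split_ftp_url {
    my ($url) = @_;
    return $url =~ m{^ftp://([^/]+)(/.+)$};
}

# read collected ftp:// urls from the assumed list file and fetch each one
sub fetch_ftp_links {
    my ($listfile) = @_;
    open(my $in, $listfile) or die "cannot open $listfile: $!";
    while (my $line = readline($in)) {
        chomp $line;
        my ($host, $path) = split_ftp_url($line) or next;
        my $ftp = Net::FTP->new($host, Timeout => 10) or next;
        $ftp->login('anonymous', 'anonymous@example.com') or next;
        $ftp->binary;
        $ftp->get($path);    # saved locally under the remote basename
        $ftp->quit;
    }
    close($in);
}

fetch_ftp_links('ftp-links.dat') if -e 'ftp-links.dat';
```

the spider itself then only needs to append any ftp url it finds to that file instead of trying to fetch it over http.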
 </description>
     <pubDate>Fri, 11 Jun 1999 00:09:00 +0000</pubDate>
 <dc:creator>anti</dc:creator>
 <guid isPermaLink="false">comment 1002871 at https://www.webmaster-forums.net</guid>
  </item>
  <item>
    <title></title>
    <link>https://www.webmaster-forums.net/server-side-scripting/spidering-through-directories-me-again#comment-1002870</link>
<description> &lt;p&gt;Will this spider also work for FTP? Because I am using some of the script to develop a program that spiders through the links, and looks for ..mainly all the links that are excluded in the spider you wrote heh.&lt;/p&gt;
&lt;p&gt;If I was to search through both http and ftp, could I do something like&lt;br /&gt;
$ua-&amp;gt;proxy(&#039;http&#039;,&#039;ftp&#039;,&#039;http://proxy:80/&#039;, &#039;ftp://proxy:21&#039;);&lt;br /&gt;
?&lt;/p&gt;
 </description>
     <pubDate>Wed, 09 Jun 1999 21:33:00 +0000</pubDate>
 <dc:creator>Dass</dc:creator>
 <guid isPermaLink="false">comment 1002870 at https://www.webmaster-forums.net</guid>
  </item>
  <item>
    <title></title>
    <link>https://www.webmaster-forums.net/server-side-scripting/spidering-through-directories-me-again#comment-1002869</link>
    <description> &lt;p&gt;hi,&lt;/p&gt;
&lt;p&gt;I must admit the script is kind of ... poorly coded, but it was only intended to show some basics. it was never meant to be used out of the box.&lt;/p&gt;
&lt;p&gt;but i&#039;ll try to help you:&lt;/p&gt;
&lt;p&gt;1.&lt;br /&gt;
if you call get_all with an empty string it reads links.dat (see load_links). links.dat contains URL;REFERER pairs.&lt;br /&gt;
(some sites/scripts don&#039;t give you the file if the referer is wrong.)&lt;/p&gt;
&lt;p&gt;2.&lt;/p&gt;
&lt;blockquote class=&quot;bb-quote-body&quot;&gt;&lt;p&gt;Quote:&lt;br /&gt;
Hi, it&#039;s me XXX (didn&#039;t know if he would like to be quoted) from the webmaster-forums &lt;img src=&quot;https://www.webmaster-forums.net/misc/smileys/smile.png&quot; title=&quot;Smiling&quot; alt=&quot;Smiling&quot; class=&quot;smiley-content&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Thanks so much for the help so far! If you have time to answer these&lt;br /&gt;
questions, that would be great!&lt;/p&gt;
&lt;p&gt;get_all(&amp;quot;http://62.144.158.186/&amp;quot; ) ;&lt;br /&gt;
#get_all(&amp;quot;&amp;quot; ) ;&lt;/p&gt;
&lt;p&gt;sub get_all&lt;br /&gt;
{&lt;br /&gt;
my($starturl)=@_;&lt;/p&gt;
&lt;p&gt;What is that used for?? Is it the starting URL to search for links?&lt;/p&gt;
&lt;p&gt;And finally, how does the spider actually find these links, and keep track&lt;br /&gt;
of which it has been to, which it hasn&#039;t, etc.?
&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;a.) yes, it&#039;s the starting url. if you specify it, it&#039;s used &lt;img src=&quot;https://www.webmaster-forums.net/misc/smileys/wink.png&quot; title=&quot;Wink&quot; alt=&quot;Wink&quot; class=&quot;smiley-content&quot; /&gt;. if not, links.dat is read. right now links.dat is redone from the start &lt;img src=&quot;https://www.webmaster-forums.net/misc/smileys/sad.png&quot; title=&quot;Sad&quot; alt=&quot;Sad&quot; class=&quot;smiley-content&quot; /&gt;. maybe we could add another file which remembers how many lines of links.dat were already done.&lt;/p&gt;
&lt;p&gt;b.) save_links parses the loaded page (page.dat) for &amp;quot;a href&amp;quot; and &amp;quot;img src&amp;quot;-tags and builds absolute URIs (it sometimes fails, but don&#039;t ask me why ?!).&lt;/p&gt;
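The "remember how many lines of links.dat were done" idea from a.) could look like this. The file name position.dat and both sub names are made up, not part of the posted script:

```perl
use strict;
use warnings;

# hypothetical position file remembering the next links.dat entry to process
sub save_position {
    my ($next) = @_;
    open(my $fh, '>', 'position.dat') or die "cannot write position.dat: $!";
    print $fh "$next\n";
    close($fh);
}

sub load_position {
    # two-arg open defaults to read mode; no file yet means start at entry 1
    open(my $fh, 'position.dat') or return 1;
    my $next = readline($fh);
    close($fh);
    chomp $next;
    return $next;
}
```

get_all would then call load_position instead of resetting $urls_next to 1, and save_position($urls_next) after every fetched page.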
&lt;p&gt;3.&lt;/p&gt;
&lt;blockquote class=&quot;bb-quote-body&quot;&gt;&lt;p&gt;Quote:&lt;br /&gt;
Hi, I am trying to get your &amp;quot;spider&amp;quot; script working on my server, I am quite&lt;br /&gt;
experienced in configuring, and debugging scripts, but I am not much of a&lt;br /&gt;
&amp;quot;writer&amp;quot; &lt;img src=&quot;https://www.webmaster-forums.net/misc/smileys/smile.png&quot; title=&quot;Smiling&quot; alt=&quot;Smiling&quot; class=&quot;smiley-content&quot; /&gt;&lt;/p&gt;
&lt;p&gt;I am having trouble in getting your script to function... How is links.dat&lt;br /&gt;
set up? And What modifications should I make to get the script working on&lt;br /&gt;
my server...&lt;/p&gt;
&lt;p&gt;I want to be able to have a set of admin defined urls spidered, but Im not&lt;br /&gt;
sure where the links should be given to the script....&lt;/p&gt;
&lt;p&gt;Any help would be greatly appreciated!&lt;br /&gt;
&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;a.)&lt;br /&gt;
if you call get_all with a URL the links.dat-file is created and &amp;quot;URL&amp;quot; is parsed for links (file: doesn&#039;t work, yet).&lt;br /&gt;
b.)&lt;br /&gt;
you should at least change the ua-&amp;gt;proxy setting and maybe the timeout.&lt;br /&gt;
depending on what you want to be ignored you should change the exclude_files-regex (btw: there should be pipes &amp;quot;|&amp;quot; between exe,zip,...).&lt;/p&gt;
&lt;p&gt;c.)&lt;br /&gt;
you can simply create the links.dat with your favourite editor or via&lt;br /&gt;
echo xxxx &amp;gt;links.dat&lt;br /&gt;
echo xxxx &amp;gt;&amp;gt;links.dat&lt;br /&gt;
and call get_all with an empty string.&lt;/p&gt;
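A quick sanity check of the exclude pattern from b.), with the pipes in place (a standalone snippet, not part of the posted script):

```perl
use strict;
use warnings;

# the exclude pattern with the pipes restored between the extensions
my $exclude_files = qr/.+\.(exe|zip|tgz|gz|pdf|tar|arj)/;

for my $url (qw(http://host/a.zip http://host/page.html http://host/x.tar)) {
    # prints: a.zip skip, page.html fetch, x.tar skip
    print "$url : ", ($url =~ $exclude_files ? 'skip' : 'fetch'), "\n";
}
```

note the pattern is unanchored, like in the original script; adding a trailing $ would make it stricter about where the extension sits.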
&lt;p&gt;so that&#039;s it for now.&lt;/p&gt;
&lt;p&gt;three things left:&lt;br /&gt;
1) please ask in the forum, the answer will be faster.&lt;br /&gt;
2) this script is a &amp;quot;HACK&amp;quot; and not more. it&#039;s in no way my usual style and i don&#039;t give any guarantees.&lt;br /&gt;
3) this small input-box sucks.&lt;/p&gt;
&lt;p&gt;ciao&lt;br /&gt;
Anti&lt;/p&gt;
&lt;p&gt;----------&lt;br /&gt;
ps:watch my work in progress at&lt;br /&gt;
&lt;a href=&quot;http://webhome.nu/&quot; class=&quot;bb-url&quot;&gt;http://webhome.nu/&lt;/a&gt;&lt;/p&gt;
 </description>
     <pubDate>Mon, 07 Jun 1999 18:34:00 +0000</pubDate>
 <dc:creator>anti</dc:creator>
 <guid isPermaLink="false">comment 1002869 at https://www.webmaster-forums.net</guid>
  </item>
  <item>
    <title></title>
    <link>https://www.webmaster-forums.net/server-side-scripting/spidering-through-directories-me-again#comment-1002868</link>
<description> &lt;p&gt;I&#039;m having trouble configuring this script...&lt;/p&gt;
&lt;p&gt;what variables have to be set up? Where should the links.dat &amp;amp; other files be found?&lt;/p&gt;
&lt;p&gt;Any help would be really appreciated!&lt;/p&gt;
&lt;p&gt;Greg.&lt;/p&gt;
&lt;p&gt;----------&lt;br /&gt;
Check out my site...&lt;br /&gt;
&lt;a href=&quot;http://mp34real.cjb.net&quot; title=&quot;http://mp34real.cjb.net&quot;&gt;http://mp34real.cjb.net&lt;/a&gt;&lt;/p&gt;
 </description>
     <pubDate>Sun, 06 Jun 1999 04:53:00 +0000</pubDate>
 <dc:creator>kasper</dc:creator>
 <guid isPermaLink="false">comment 1002868 at https://www.webmaster-forums.net</guid>
  </item>
  <item>
    <title></title>
    <link>https://www.webmaster-forums.net/server-side-scripting/spidering-through-directories-me-again#comment-1002867</link>
    <description> &lt;p&gt;Hi,&lt;/p&gt;
&lt;p&gt;the easiest way to get help for a perl-module usually is to type:&lt;br /&gt;
perldoc &amp;lt;module&amp;gt;&lt;/p&gt;
&lt;p&gt;perldoc LWP::UserAgent&lt;br /&gt;
shows you the functions/methods and some examples. what more do you need?&lt;/p&gt;
&lt;p&gt;For perl-info in general try cpan.org&lt;/p&gt;
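As a first taste of the three modules before reading the perldoc, a minimal single-request sketch in the same style as the spider (the URL is a placeholder):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

# set up the user agent exactly like the spider does
my $ua = LWP::UserAgent->new;
$ua->agent('Mozilla/4.0');
$ua->timeout(10);

# build and send one GET request (placeholder url)
my $request  = HTTP::Request->new(GET => 'http://example.com/');
my $response = $ua->request($request);

# HTTP::Response gives you status code, message and content
print $response->code, ' ', $response->message, "\n";
```

$ua-&gt;request always hands back an HTTP::Response object, even on failure, so checking $response-&gt;is_success is the usual next step.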
&lt;p&gt;ciao&lt;br /&gt;
Anti&lt;/p&gt;
 </description>
     <pubDate>Wed, 02 Jun 1999 23:16:00 +0000</pubDate>
 <dc:creator>anti</dc:creator>
 <guid isPermaLink="false">comment 1002867 at https://www.webmaster-forums.net</guid>
  </item>
  <item>
    <title></title>
    <link>https://www.webmaster-forums.net/server-side-scripting/spidering-through-directories-me-again#comment-1002865</link>
    <description> &lt;p&gt;Hi,&lt;/p&gt;
&lt;p&gt;1. sorry for the &lt;img src=&quot;https://www.webmaster-forums.net/misc/smileys/wink.png&quot; title=&quot;Wink&quot; alt=&quot;Wink&quot; class=&quot;smiley-content&quot; /&gt; &lt;img src=&quot;https://www.webmaster-forums.net/misc/smileys/smile.png&quot; title=&quot;Smiling&quot; alt=&quot;Smiling&quot; class=&quot;smiley-content&quot; /&gt; &lt;img src=&quot;https://www.webmaster-forums.net/misc/smileys/sad.png&quot; title=&quot;Sad&quot; alt=&quot;Sad&quot; class=&quot;smiley-content&quot; /&gt; I should have double-checked ... : )&lt;br /&gt;
2. this script will in fact spider all that is linked (except for the $exclude_files matches)&lt;br /&gt;
between&lt;br /&gt;
save_url($url,$referer,&amp;quot;page.dat&amp;quot; ) ;&lt;br /&gt;
and   save_links(&amp;quot;page.dat&amp;quot;,$url,&amp;quot;links.dat&amp;quot; ) ;&lt;/p&gt;
&lt;p&gt;you could add a routine that checks page.dat for your search-string.&lt;/p&gt;
&lt;p&gt;after save_links you can add some script that copies the page.dat to save the file (if it was your target-file).&lt;/p&gt;
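Such a routine could look like this. check_page, its arguments, and the hits directory are assumed names, not from the posted script:

```perl
use strict;
use warnings;

# hypothetical: return 1 and keep a copy of $file if it contains $pattern
sub check_page {
    my ($file, $pattern, $savedir) = @_;
    local $/;                           # slurp the whole file at once
    open(my $fh, $file) or return 0;    # two-arg open, read mode
    my $page = readline($fh);
    close($fh);
    return 0 unless defined $page and $page =~ /$pattern/;
    # it matched: save a timestamped copy before page.dat gets overwritten
    my $copy = "$savedir/page-" . time() . ".dat";
    open(my $out, '>', $copy) or return 0;
    print $out $page;
    close($out);
    return 1;
}
```

called between save_url and save_links, e.g. check_page('page.dat', 'my search string', 'hits');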
&lt;p&gt;I would like to finish the script as you need it, but I don&#039;t have much time right now.&lt;/p&gt;
&lt;p&gt;Feel free to ask.&lt;/p&gt;
&lt;p&gt;ciao&lt;br /&gt;
Anti&lt;/p&gt;
&lt;p&gt;----------&lt;br /&gt;
ps:watch my work in progress at&lt;br /&gt;
&lt;a href=&quot;http://webhome.nu/&quot; class=&quot;bb-url&quot;&gt;http://webhome.nu/&lt;/a&gt;&lt;/p&gt;
 </description>
     <pubDate>Tue, 01 Jun 1999 23:45:00 +0000</pubDate>
 <dc:creator>anti</dc:creator>
 <guid isPermaLink="false">comment 1002865 at https://www.webmaster-forums.net</guid>
  </item>
  <item>
    <title></title>
    <link>https://www.webmaster-forums.net/server-side-scripting/spidering-through-directories-me-again#comment-1002866</link>
<description> &lt;p&gt;Thanks a lot for the help!&lt;/p&gt;
&lt;p&gt;I am wondering if anyone has any links or can introduce me to&lt;br /&gt;
LWP::UserAgent&lt;br /&gt;
HTTP::Request&lt;br /&gt;
HTTP::Response&lt;/p&gt;
 </description>
     <pubDate>Tue, 01 Jun 1999 19:06:00 +0000</pubDate>
 <dc:creator>Dass</dc:creator>
 <guid isPermaLink="false">comment 1002866 at https://www.webmaster-forums.net</guid>
  </item>
  <item>
    <title></title>
    <link>https://www.webmaster-forums.net/server-side-scripting/spidering-through-directories-me-again#comment-1002864</link>
<description> &lt;p&gt;So when completed, this will go through all the directories of the site looking for all the .exe, .zip, etc.?&lt;/p&gt;
&lt;p&gt;Will it also go through links to other sites, and do the same there?&lt;/p&gt;
 </description>
     <pubDate>Tue, 01 Jun 1999 00:01:00 +0000</pubDate>
 <dc:creator>Dass</dc:creator>
 <guid isPermaLink="false">comment 1002864 at https://www.webmaster-forums.net</guid>
  </item>
  <item>
    <title></title>
    <link>https://www.webmaster-forums.net/server-side-scripting/spidering-through-directories-me-again#comment-1002863</link>
    <description> &lt;p&gt;did i ever mention, i hate these smileys &lt;img src=&quot;https://www.webmaster-forums.net/misc/smileys/smile.png&quot; title=&quot;Smiling&quot; alt=&quot;Smiling&quot; class=&quot;smiley-content&quot; /&gt;&lt;/p&gt;
 </description>
     <pubDate>Mon, 31 May 1999 23:11:00 +0000</pubDate>
 <dc:creator />
 <guid isPermaLink="false">comment 1002863 at https://www.webmaster-forums.net</guid>
  </item>
  <item>
    <title></title>
    <link>https://www.webmaster-forums.net/server-side-scripting/spidering-through-directories-me-again#comment-1002862</link>
    <description> &lt;p&gt;Hi,&lt;/p&gt;
&lt;p&gt;using the modules:&lt;br /&gt;
LWP::UserAgent&lt;br /&gt;
HTTP::Request&lt;br /&gt;
HTTP::Response&lt;/p&gt;
&lt;p&gt;you can easily retrieve a document from a remote web-site.&lt;br /&gt;
parsing this document for A-tags in perl is very easy.&lt;br /&gt;
The recursing can get a little tricky, but this way you can get at least all documents that are linked from a starting point.&lt;/p&gt;
&lt;p&gt;Maybe this will help you:&lt;br /&gt;
----------&lt;br /&gt;
#!/bin/perl&lt;/p&gt;
&lt;p&gt;require LWP::UserAgent;&lt;br /&gt;
require HTTP::Request;&lt;br /&gt;
require HTTP::Response;&lt;/p&gt;
&lt;p&gt;use URI::URL ();&lt;br /&gt;
#use strict;&lt;/p&gt;
&lt;p&gt;$ua = new LWP::UserAgent;&lt;br /&gt;
$ua-&amp;gt;agent(&amp;quot;Mozilla/4.0&amp;quot; ) ;&lt;br /&gt;
$ua-&amp;gt;timeout(10);&lt;br /&gt;
$ua-&amp;gt;proxy(&#039;http&#039;,&#039;http://proxy:80/&#039;);&lt;/p&gt;
&lt;p&gt;$exclude_files = &amp;quot;.+\.(exe|zip|tgz|gz|pdf|tar|arj)&amp;quot;;# all urls matching this regex will be ignored&lt;/p&gt;
&lt;p&gt;get_all(&amp;quot;http://62.144.158.186/&amp;quot; ) ;&lt;br /&gt;
#get_all(&amp;quot;&amp;quot; ) ;&lt;/p&gt;
&lt;p&gt;sub get_all&lt;br /&gt;
{&lt;br /&gt;
my($starturl)=@_;&lt;/p&gt;
&lt;p&gt;$urls_next=1;# next one to do&lt;br /&gt;
$urls_last=0;# last used&lt;/p&gt;
&lt;p&gt;if ($starturl eq &amp;quot;&amp;quot; )# we want to continue&lt;br /&gt;
{&lt;br /&gt;
load_links(&amp;quot;links.dat&amp;quot; );&lt;br /&gt;
# FIXME: how do we know where we stopped ???&lt;br /&gt;
#$urls_next = ???&lt;br /&gt;
}&lt;br /&gt;
else# we want to start a new run&lt;br /&gt;
{&lt;br /&gt;
$referer=&amp;quot;&amp;quot;;&lt;/p&gt;
&lt;p&gt;open (LINKS,&amp;quot;&amp;gt;links.dat&amp;quot; );&lt;br /&gt;
print LINKS &amp;quot;$starturl;;\n&amp;quot;;&lt;br /&gt;
close (LINKS);&lt;br /&gt;
}&lt;/p&gt;
&lt;p&gt;# FIXME:load_links should return if we are done.&lt;br /&gt;
#while (eof&amp;lt;LINKSL&amp;gt; )&lt;br /&gt;
while (load_links(&amp;quot;links.dat&amp;quot; ))&lt;br /&gt;
{&lt;br /&gt;
#load_links(&amp;quot;links.dat&amp;quot; );&lt;br /&gt;
do&lt;br /&gt;
{&lt;br /&gt;
$url = $urls[$urls_next];&lt;br /&gt;
$referer = $refs[$urls_next];&lt;br /&gt;
$urls_next ++;&lt;br /&gt;
print &amp;quot;---&amp;gt; $url\n&amp;quot;;&lt;br /&gt;
save_url($url,$referer,&amp;quot;page.dat&amp;quot; );&lt;br /&gt;
save_links(&amp;quot;page.dat&amp;quot;,$url,&amp;quot;links.dat&amp;quot; );&lt;br /&gt;
sleep 2;&lt;br /&gt;
}&lt;br /&gt;
while ($urls_last &amp;gt;= $urls_next);&lt;br /&gt;
}&lt;br /&gt;
}&lt;/p&gt;
&lt;p&gt;sub load_links&lt;br /&gt;
{&lt;br /&gt;
my($file)=@_;&lt;/p&gt;
&lt;p&gt;my($old_last);&lt;/p&gt;
&lt;p&gt;$old_last = $urls_last;&lt;/p&gt;
&lt;p&gt;open (LINKSL,&amp;quot;&amp;lt;$file&amp;quot; ) || die(&amp;quot;hmm\n&amp;quot; );&lt;br /&gt;
while (&amp;lt;LINKSL&amp;gt; )&lt;br /&gt;
{&lt;br /&gt;
chomp;&lt;br /&gt;
@parms = split /;/,$_;&lt;br /&gt;
$isnew=1;&lt;br /&gt;
foreach $url (@urls)&lt;br /&gt;
{&lt;br /&gt;
if ($url eq $parms[0])&lt;br /&gt;
{&lt;br /&gt;
$isnew=0;&lt;br /&gt;
}&lt;br /&gt;
}&lt;br /&gt;
if ($isnew)&lt;br /&gt;
{&lt;br /&gt;
$urls_last ++;&lt;br /&gt;
$urls[$urls_last]=$parms[0];&lt;br /&gt;
$refs[$urls_last]=$parms[1];&lt;br /&gt;
}&lt;br /&gt;
}&lt;br /&gt;
close (LINKSL);&lt;br /&gt;
if ($old_last == $urls_last){ return(0) }&lt;br /&gt;
else{ return(1) }&lt;br /&gt;
}&lt;/p&gt;
&lt;p&gt;sub save_links&lt;br /&gt;
{&lt;br /&gt;
my($page,$base,$file)=@_;&lt;/p&gt;
&lt;p&gt;$base =~ /(.*\/\/.*?)\/.*/;&lt;br /&gt;
$baseserver = $1;&lt;/p&gt;
&lt;p&gt;$base =~ /(.*\/).*/;&lt;br /&gt;
$basedir = $1;&lt;/p&gt;
&lt;p&gt;# let&#039;s make one long string containing all the tags ... just for fun &lt;img src=&quot;https://www.webmaster-forums.net/misc/smileys/wink.png&quot; title=&quot;Wink&quot; alt=&quot;Wink&quot; class=&quot;smiley-content&quot; /&gt;&lt;br /&gt;
$tagstring=&amp;quot;&amp;quot;;&lt;br /&gt;
open(PAGE,&amp;quot;&amp;lt;$page&amp;quot; );&lt;br /&gt;
while (&amp;lt;PAGE&amp;gt; )&lt;br /&gt;
{&lt;br /&gt;
while($_ =~ /(.*?)&amp;lt;(.*?)&amp;gt;(.*)/)&lt;br /&gt;
{&lt;br /&gt;
$_ = $3;&lt;br /&gt;
$tagstring =$tagstring.$2.&amp;quot;\n&amp;quot;;&lt;br /&gt;
}&lt;br /&gt;
}&lt;br /&gt;
close(PAGE);&lt;br /&gt;
# let&#039;s make another string containing only the a and img tags ... sure ... we could have done that in the last step.&lt;br /&gt;
$linkstring=&amp;quot;&amp;quot;;&lt;br /&gt;
@tags=split /\n/,$tagstring;&lt;br /&gt;
foreach $tag (@tags)# images first (to fool banner-programs)&lt;br /&gt;
{&lt;br /&gt;
if ($tag =~ /.*img.*src.*/i)# do case insensitive matching&lt;br /&gt;
{&lt;br /&gt;
$linkstring = $linkstring.$tag.&amp;quot;\n&amp;quot;;&lt;br /&gt;
}&lt;br /&gt;
}&lt;br /&gt;
foreach $tag (@tags)&lt;br /&gt;
{&lt;br /&gt;
if ($tag =~ /.*a.*href.*/i)&lt;br /&gt;
{&lt;br /&gt;
$linkstring = $linkstring.$tag.&amp;quot;\n&amp;quot;;&lt;br /&gt;
}&lt;br /&gt;
}&lt;br /&gt;
# let&#039;s extract all urls ... and ... make absolute urls from them ... we could have made another loop, but ...&lt;br /&gt;
$urlstring=&amp;quot;&amp;quot;;&lt;br /&gt;
@links=split /\n/,$linkstring;&lt;br /&gt;
foreach $link (@links)&lt;br /&gt;
{&lt;br /&gt;
if ($link =~ /.*src\s?=\s?&amp;quot;(.*?)&amp;quot;.*/i)&lt;br /&gt;
{&lt;br /&gt;
}&lt;br /&gt;
elsif ($link =~ /.*href\s?=\s?&amp;quot;(.*?)&amp;quot;.*/i)&lt;br /&gt;
{&lt;br /&gt;
}&lt;br /&gt;
my $url = $1;&lt;br /&gt;
if ($url =~ /.*mailto:.*/){ $url =&amp;quot;&amp;quot;; }&lt;br /&gt;
elsif ($url =~ /$exclude_files/){ $url =&amp;quot;&amp;quot;; }&lt;br /&gt;
elsif ($url =~ /.*http:.*/){ $url = $url; }&lt;br /&gt;
elsif ($url =~ /^\/(.*)/)&lt;br /&gt;
{&lt;br /&gt;
$url = $baseserver.$url;&lt;br /&gt;
}&lt;br /&gt;
elsif ($url =~ /(.*)/)&lt;br /&gt;
{&lt;br /&gt;
$url = $basedir.$url;&lt;br /&gt;
}&lt;br /&gt;
else{ $url =&amp;quot;&amp;quot;; }&lt;br /&gt;
if ($url eq &amp;quot;&amp;quot; ){}&lt;br /&gt;
else&lt;br /&gt;
{&lt;br /&gt;
$urlstring = $urlstring.$url.&amp;quot;;$base;\n&amp;quot;;&lt;br /&gt;
}&lt;br /&gt;
}&lt;br /&gt;
open (LINKS,&amp;quot;&amp;gt;&amp;gt;$file&amp;quot; );&lt;br /&gt;
print LINKS $urlstring;&lt;br /&gt;
close (LINKS);&lt;br /&gt;
}&lt;/p&gt;
&lt;p&gt;sub save_url&lt;br /&gt;
{&lt;br /&gt;
my($url,$referer,$file)=@_;&lt;br /&gt;
my $request = new HTTP::Request &#039;GET&#039;,$url;&lt;br /&gt;
$request-&amp;gt;referer($referer);&lt;br /&gt;
 my $response = $ua-&amp;gt;request($request,$file);&lt;/p&gt;
&lt;p&gt;if ($response-&amp;gt;is_success)&lt;br /&gt;
{&lt;br /&gt;
print now().&amp;quot; &amp;quot;.$response-&amp;gt;code().&amp;quot; GET $url ($referer)\n&amp;quot;;&lt;br /&gt;
}&lt;br /&gt;
else&lt;br /&gt;
{&lt;br /&gt;
print now().&amp;quot; &amp;quot;.$response-&amp;gt;code().&amp;quot; GET $url ($referer)\n&amp;quot;;&lt;br /&gt;
}&lt;br /&gt;
}&lt;/p&gt;
&lt;p&gt;sub now&lt;br /&gt;
{&lt;br /&gt;
 ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) =&lt;br /&gt;
       gmtime(time);&lt;br /&gt;
$year += 1900;&lt;br /&gt;
$mon += 1;# gmtime months are 0-11&lt;/p&gt;
&lt;p&gt;$now=sprintf(&amp;quot;%4u-%02u-%02u %02u:%02u:%02u&amp;quot;,$year,$mon,$mday,$hour,$min,$sec);&lt;/p&gt;
&lt;p&gt;return $now;&lt;br /&gt;
}&lt;br /&gt;
--------&lt;/p&gt;
&lt;p&gt;The code is very crappy and unfinished, but it works and did the job it was used for very well.&lt;/p&gt;
&lt;p&gt;ATTENTION: The &amp;quot;crawler&amp;quot; is not site-bound and it doesn&#039;t check robots.txt (BAD STYLE).&lt;br /&gt;
DON&#039;T start it friday evening and go away till monday. you may suck the whole internet &lt;img src=&quot;https://www.webmaster-forums.net/misc/smileys/wink.png&quot; title=&quot;Wink&quot; alt=&quot;Wink&quot; class=&quot;smiley-content&quot; /&gt;&lt;br /&gt;
(actually i got about 8gig the first weekend i started this ...)&lt;/p&gt;
&lt;p&gt;hope it helps.&lt;/p&gt;
&lt;p&gt;AGAIN: DON&#039;T USE THIS SCRIPT AS IS !!!&lt;br /&gt;
(sorry for shouting, but it&#039;s important.)&lt;/p&gt;
&lt;p&gt;ciao&lt;br /&gt;
Anti&lt;/p&gt;
&lt;p&gt;----------&lt;br /&gt;
ps:watch my work in progress at&lt;br /&gt;
&lt;a href=&quot;http://webhome.nu/&quot; class=&quot;bb-url&quot;&gt;http://webhome.nu/&lt;/a&gt;&lt;/p&gt;
 </description>
     <pubDate>Mon, 31 May 1999 20:46:00 +0000</pubDate>
 <dc:creator>anti</dc:creator>
 <guid isPermaLink="false">comment 1002862 at https://www.webmaster-forums.net</guid>
  </item>
  </channel>
</rss>
