<?xml version="1.0" encoding="utf-8" ?><rss version="2.0" xml:base="https://www.webmaster-forums.net/crss/node/1027400" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title></title>
    <link>https://www.webmaster-forums.net/crss/node/1027400</link>
    <description></description>
    <language>en</language>
          <item>
    <title></title>
    <link>https://www.webmaster-forums.net/webmasters-corner/google-searchengine-programming#comment-1162959</link>
    <description> &lt;p&gt;Ok, let me clarify a few things:&lt;/p&gt;
&lt;p&gt;The spider goes out and pulls the pages. If all the information is already on your server, you don&#039;t need a spider.&lt;/p&gt;
&lt;p&gt;The index is a store of all the raw data. If all the information is already on your server, this is less important.&lt;/p&gt;
&lt;p&gt;The keyword system does the brunt of the work. To search the web, a lot of information has to be processed, and the pages are ranked in a large database.&lt;/p&gt;
&lt;p&gt;When the front end puts the query to the database, the database goes &quot;Oh, here we go&quot; and sends back the matching link(s).&lt;/p&gt;
&lt;p&gt;On a much smaller scale, you can simply keep all your information in a text file, separated in a logical fashion, and process it each time the end user queries it.&lt;/p&gt;
&lt;p&gt;If you know UNIX/Linux, it&#039;s like grep. On php.net, look up preg_grep and ereg. If you do anything with searching, you&#039;ll want to know preg and ereg.&lt;/p&gt;
&lt;p&gt;Regular expression resources:&lt;br /&gt;
&lt;a href=&quot;http://regexlib.com/&quot; class=&quot;bb-url&quot;&gt;http://regexlib.com/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Note that preg uses Perl-compatible regular expressions, which are similar to (but have key differences from) the POSIX regular expressions that ereg uses.&lt;/p&gt;
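&lt;p&gt;To make the text-file approach concrete, here is a rough sketch of a grep-style search. It is written in Python rather than PHP purely for illustration; re.search stands in for what preg_grep would do over an array of lines. The records, the pipe-separated layout and the grep_lines name are all made up for the example.&lt;/p&gt;

```python
import re

# Hypothetical records, one per line, as they might sit in a text file.
RECORDS = """\
apple juice | /drinks/apple-juice.html
orange juice | /drinks/orange-juice.html
carrot cake | /desserts/carrot-cake.html
"""

def grep_lines(pattern, text):
    """Return every line of text that matches the regular expression."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [line for line in text.splitlines() if regex.search(line)]

# Find every record that mentions the whole word "juice".
matches = grep_lines(r"\bjuice\b", RECORDS)
for line in matches:
    print(line)
```

&lt;p&gt;Every query rescans the whole file, which is fine for a small site but is exactly what a keyword index is meant to avoid.&lt;/p&gt;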
&lt;p&gt;The second way a search engine can work is solely from keywords. This is probably a faster, but not necessarily better, solution.&lt;/p&gt;
&lt;p&gt;After the spider has gone and done its thing, you have, let&#039;s say, 1000 pages from the web. Each page can be broken down into its key phrases: say, any word longer than 4 characters. Then there is a database holding a huge list of keywords, with each keyword mapped to the address of a page that contains it. So when the end user, through the front-end script, requests &quot;juice&quot;, every address stored under the keyword &quot;juice&quot; can be put into an array, which you then format and send back to the end user.&lt;/p&gt;
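&lt;p&gt;The keyword-to-address idea can be sketched roughly like this. It is Python for illustration only, with a dictionary standing in for the database; the page addresses, the page text and the build_index and search names are assumptions, not part of any real engine.&lt;/p&gt;

```python
import re
from collections import defaultdict

# Hypothetical crawled pages: address mapped to page text.
PAGES = {
    "http://example.com/a": "Fresh orange juice and apple juice recipes",
    "http://example.com/b": "How to bake bread at home",
    "http://example.com/c": "Juice bars worth visiting",
}

def build_index(pages):
    """Map each keyword (a word longer than 4 characters) to the addresses using it."""
    index = defaultdict(set)
    for address, text in pages.items():
        # \w{5,} matches runs of 5 or more word characters.
        for word in re.findall(r"\w{5,}", text.lower()):
            index[word].add(address)
    return index

def search(index, keyword):
    """Return the sorted list of addresses stored under the keyword."""
    return sorted(index.get(keyword.lower(), set()))

index = build_index(PAGES)
print(search(index, "juice"))
```

&lt;p&gt;The index is built once after the crawl, so a query becomes a single lookup instead of a scan over every page.&lt;/p&gt;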
&lt;p&gt;Google&#039;s PageRank system just raises the priority of certain addresses, based on its own criteria.&lt;/p&gt;
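&lt;p&gt;As a very rough illustration of raising the priority of certain addresses, the sketch below sorts matching addresses by a score. This is not PageRank itself: the scores are invented numbers and the rank function is a hypothetical helper, shown in Python only to make the ordering step visible.&lt;/p&gt;

```python
# Hypothetical priority scores; a real engine would compute these itself.
PRIORITY = {
    "http://example.com/a": 0.2,
    "http://example.com/b": 0.9,
    "http://example.com/c": 0.5,
}

def rank(addresses, priority):
    """Order matching addresses from highest to lowest priority score."""
    # Addresses with no known score are treated as priority 0.0.
    return sorted(addresses, key=lambda a: priority.get(a, 0.0), reverse=True)

results = rank(
    ["http://example.com/a", "http://example.com/c", "http://example.com/b"],
    PRIORITY,
)
print(results)
```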
&lt;p&gt;So in that case it is all index, but the index is a lot more complex than you might first have imagined.&lt;/p&gt;
 </description>
     <pubDate>Sat, 18 Dec 2004 09:20:13 +0000</pubDate>
 <dc:creator>CptAwesome</dc:creator>
 <guid isPermaLink="false">comment 1162959 at https://www.webmaster-forums.net</guid>
  </item>
  </channel>
</rss>
