<?xml version="1.0" encoding="utf-8" ?><rss version="2.0" xml:base="https://www.webmaster-forums.net/crss/node/1020896" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title></title>
    <link>https://www.webmaster-forums.net/crss/node/1020896</link>
    <description></description>
    <language>en</language>
          <item>
    <title></title>
    <link>https://www.webmaster-forums.net/serverside-scripting/best-way-search#comment-1128570</link>
    <description> &lt;p&gt;I really think you need an efficient index table.  I&#039;m not very familiar with the vBulletin code, but I have a basic understanding of its search mechanism.&lt;/p&gt;
&lt;p&gt;Several tables are used to compose the index and perform searches.  There is a `word` table which has a `word_id` and `word`.  There is a `searchindex` table with `word_id`, `post_id`, and `intitle`.  Then there is the `search` table with `search_id`, `query`, `post_ids`, and `dateline`.&lt;/p&gt;
&lt;p&gt;When a new post (or essay) is submitted, you will add any new words in the essay to the `word` table.  Then add the post_id (or essay_id) to the `searchindex` table for any words that are in the post (or essay).&lt;/p&gt;
&lt;p&gt;When a person searches for &quot;war iraq&quot;, you will query the `word` table for &quot;war&quot; AND &quot;iraq&quot; and get the `word_id` for each.  Then you will query the `searchindex` table for any posts that contain the word_ids for &quot;war&quot; OR &quot;iraq&quot;.&lt;/p&gt;
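&lt;p&gt;The indexing and lookup steps above can be sketched roughly as follows.  This is a minimal illustration and not vBulletin code: the table and column names come from the description above, while SQLite, the tokenizer, and the function names `tokenize`, `index_post`, and `search` are assumptions made for the example.  The two lookups (`word`, then `searchindex`) are combined into one JOIN here.&lt;/p&gt;

```python
import re
import sqlite3

# In-memory database; table and column names follow the post, the rest is assumed.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE word (word_id INTEGER PRIMARY KEY, word TEXT UNIQUE);
    CREATE TABLE searchindex (word_id INTEGER, post_id INTEGER, intitle INTEGER);
    CREATE INDEX searchindex_word ON searchindex (word_id);
""")

def tokenize(text):
    """Lower-case the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def index_post(post_id, title, body):
    """Add any new words to `word`, then link every word to the post."""
    title_words = tokenize(title)
    for w in title_words | tokenize(body):
        db.execute("INSERT OR IGNORE INTO word (word) VALUES (?)", (w,))
        word_id = db.execute(
            "SELECT word_id FROM word WHERE word = ?", (w,)).fetchone()[0]
        db.execute("INSERT INTO searchindex VALUES (?, ?, ?)",
                   (word_id, post_id, int(w in title_words)))

def search(query):
    """Return the ids of posts containing ANY of the query words (OR search)."""
    words = sorted(tokenize(query))
    if not words:
        return []
    marks = ", ".join("?" * len(words))  # one placeholder per query word
    rows = db.execute(
        "SELECT DISTINCT s.post_id FROM searchindex s "
        "JOIN word w ON w.word_id = s.word_id "
        f"WHERE w.word IN ({marks}) ORDER BY s.post_id", words)
    return [r[0] for r in rows]
```

&lt;p&gt;After indexing a few posts with `index_post(post_id, title, body)`, calling `search(&quot;war iraq&quot;)` returns the ids of every post that contains either word.&lt;/p&gt;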
&lt;p&gt;Whenever a search is performed, the results are temporarily stored in the `search` table.  This helps keep the server load down.  Let&#039;s say that you show 10 results per page, but there are 50 posts (essays) that match &quot;war&quot; OR &quot;iraq&quot;.  Rather than performing the search again for every page, we&#039;ll just pull the list of posts out of the `search` table and show the next 10.  Also, if two people search for the same thing, the system will see that the search was performed only 5 minutes ago and use the cached results in the `search` table.  The system may be configurable to reuse cached results for a set length of time (setting it to one day would mean any given query is performed only once per day).&lt;/p&gt;
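&lt;p&gt;A rough sketch of that caching idea, under stated assumptions: a dictionary stands in for the `search` table, the 300-second lifetime is an arbitrary example value, and `cached_search`, `page`, and the `run_search` callback are hypothetical names.  Sorting the query words so that &quot;war iraq&quot; and &quot;iraq war&quot; share one cache entry is also an assumption, and it only makes sense for an OR search.&lt;/p&gt;

```python
import time

CACHE_SECONDS = 300  # assumed lifetime; the post only says it is configurable

# Stand-in for the `search` table: normalized query -> (dateline, post_ids)
search_cache = {}

def cached_search(query, run_search, now=None):
    """Return post ids for `query`, reusing a stored result while it is fresh."""
    now = time.time() if now is None else now
    key = " ".join(sorted(query.lower().split()))
    hit = search_cache.get(key)
    if hit is not None and now - hit[0] >= CACHE_SECONDS:
        hit = None  # stored result is too old, so search again
    if hit is None:
        hit = (now, run_search(query))
        search_cache[key] = hit
    return hit[1]

def page(post_ids, page_number, per_page=10):
    """Slice one page out of the stored list; page_number starts at 1."""
    start = (page_number - 1) * per_page
    return post_ids[start:start + per_page]
```

&lt;p&gt;With 50 stored ids, `page(ids, 2)` returns results 11 through 20 without running the search a second time.&lt;/p&gt;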
&lt;p&gt;To enhance this and keep the database smaller, you could eliminate &#039;noise&#039; words from the `word` table.  vBulletin restricts its table by requiring a minimum number of characters per word (configurable).&lt;/p&gt;
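&lt;p&gt;A small sketch of that filtering step.  The minimum length of 4, the noise-word list, and the name `indexable_words` are all assumptions chosen for illustration, not vBulletin defaults.&lt;/p&gt;

```python
import re

MIN_WORD_LENGTH = 4  # assumed value; the post only says it is configurable
NOISE_WORDS = {"that", "this", "with", "from", "have"}  # illustrative list only

def indexable_words(text):
    """Return the words worth indexing: long enough and not in the noise list."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return sorted({w for w in words
                   if len(w) >= MIN_WORD_LENGTH and w not in NOISE_WORDS})
```

&lt;p&gt;Note the trade-off: with a minimum length of 4, a short but meaningful word such as &quot;war&quot; is dropped along with &quot;the&quot; and &quot;and&quot;, so the threshold needs to be chosen with the content in mind.&lt;/p&gt;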
 </description>
     <pubDate>Sun, 06 Apr 2003 21:20:22 +0000</pubDate>
 <dc:creator>Mark Hensler</dc:creator>
 <guid isPermaLink="false">comment 1128570 at https://www.webmaster-forums.net</guid>
  </item>
  <item>
    <title></title>
    <link>https://www.webmaster-forums.net/serverside-scripting/best-way-search#comment-1128568</link>
    <description> &lt;p&gt;The problem with pulling out keywords with a script is deciding what counts as a common word.&lt;/p&gt;
&lt;p&gt;If a person references their work on the Inca population with internet searches, for instance, then you&#039;ll get a lot of false hits for &quot;internet&quot;. If you say, well, then &quot;internet&quot; is a common word, what do you do with essays on internet technologies?&lt;/p&gt;
&lt;p&gt;I realize it&#039;s more work to put together a solid search feature, but if you want the people who are looking for the essays to find them, you HAVE to put in that kind of effort. No one will use a service that doesn&#039;t make results reasonably easy to find.&lt;/p&gt;
&lt;p&gt;One of the largest complaints from users is not being able to accurately find information on websites. They just give up and go elsewhere.&lt;/p&gt;
 </description>
     <pubDate>Sun, 06 Apr 2003 16:17:46 +0000</pubDate>
 <dc:creator>Suzanne</dc:creator>
 <guid isPermaLink="false">comment 1128568 at https://www.webmaster-forums.net</guid>
  </item>
  <item>
    <title></title>
    <link>https://www.webmaster-forums.net/serverside-scripting/best-way-search#comment-1128565</link>
    <description> &lt;p&gt;The person entering the essay would enter the keywords.  But I&#039;m still thinking that the search would be sped up slightly (if not more) if I created a script that stripped out common words and stored what remains in the keyword field in the database.  I&#039;ll just be appreciative that folks are submitting essays to the database.  I don&#039;t want to hassle them any more. &lt;img src=&quot;https://www.webmaster-forums.net/misc/smileys/smile.png&quot; title=&quot;Smiling&quot; alt=&quot;Smiling&quot; class=&quot;smiley-content&quot; /&gt;&lt;/p&gt;
&lt;p&gt;As for spell checking, hopefully we&#039;ll be getting around 50 essays/day in the near future, so I don&#039;t think that&#039;s feasible.  But perhaps I can find a poor college student who wouldn&#039;t mind &lt;img src=&quot;https://www.webmaster-forums.net/misc/smileys/smile.png&quot; title=&quot;Smiling&quot; alt=&quot;Smiling&quot; class=&quot;smiley-content&quot; /&gt;&lt;/p&gt;
 </description>
     <pubDate>Sun, 06 Apr 2003 11:38:13 +0000</pubDate>
 <dc:creator>shanda</dc:creator>
 <guid isPermaLink="false">comment 1128565 at https://www.webmaster-forums.net</guid>
  </item>
  <item>
    <title></title>
    <link>https://www.webmaster-forums.net/serverside-scripting/best-way-search#comment-1128490</link>
    <description> &lt;p&gt;1. If you&#039;re going to use a script, you will get very poor results for keywords. You really need to run it through a human, and pick out the main concepts, even if the words aren&#039;t in the essay. Additionally, you need to include spelling errors in the keyword list.&lt;/p&gt;
&lt;p&gt;2. I can&#039;t imagine a time when using my own keywords wouldn&#039;t be ideal -- how are you going to know what I&#039;m searching for? That said, you need to have both the ability to search (as Mark said, first keywords, then all text), and some way to narrow the search -- categories. So, ideally, you would be able to search within a particular category. Again, human sorting would be necessary.&lt;/p&gt;
&lt;p&gt;3. Do you mean require the visitor SEEKING an essay to enter their own keywords, or require the person SUBMITTING an essay to enter their own keywords? If the latter, then yes, yes!&lt;/p&gt;
 </description>
     <pubDate>Fri, 04 Apr 2003 17:25:38 +0000</pubDate>
 <dc:creator>Suzanne</dc:creator>
 <guid isPermaLink="false">comment 1128490 at https://www.webmaster-forums.net</guid>
  </item>
  <item>
    <title></title>
    <link>https://www.webmaster-forums.net/serverside-scripting/best-way-search#comment-1128485</link>
    <description> &lt;p&gt;Suggestion needed: Should I create a script that extracts common words (a, the, and, or, for, etc) from the essays and then store the remaining words as keywords, or require visitors to enter keywords?&lt;/p&gt;
&lt;p&gt;If you were a site visitor, would entering your own keywords be a bother?&lt;/p&gt;
 </description>
     <pubDate>Fri, 04 Apr 2003 15:37:01 +0000</pubDate>
 <dc:creator>shanda</dc:creator>
 <guid isPermaLink="false">comment 1128485 at https://www.webmaster-forums.net</guid>
  </item>
  <item>
    <title></title>
    <link>https://www.webmaster-forums.net/serverside-scripting/best-way-search#comment-1128483</link>
    <description> &lt;p&gt;A keyword field would be the way to go. What you could do is search the keywords first and, if no results are found, then search the text. That way your most relevant results would come first.&lt;/p&gt;
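&lt;p&gt;The keyword-first fallback could look something like this.  It is only a sketch: the essays are assumed to be plain dictionaries with hypothetical `id`, `keywords` (a set of lower-case words), and `text` fields, where a real site would query its database instead.&lt;/p&gt;

```python
def search_essays(essays, term):
    """Match the keyword field first; scan the full text only if that finds nothing."""
    term = term.lower()
    hits = [e for e in essays if term in e["keywords"]]
    if not hits:
        # Fallback: slower substring scan over the full essay text.
        hits = [e for e in essays if term in e["text"].lower()]
    return [e["id"] for e in hits]
```

&lt;p&gt;Because the text scan runs only when the keyword lookup comes back empty, the expensive pass over long essays is skipped for most searches.&lt;/p&gt;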
 </description>
     <pubDate>Fri, 04 Apr 2003 15:29:45 +0000</pubDate>
 <dc:creator>mairving</dc:creator>
 <guid isPermaLink="false">comment 1128483 at https://www.webmaster-forums.net</guid>
  </item>
  <item>
    <title></title>
    <link>https://www.webmaster-forums.net/serverside-scripting/best-way-search#comment-1128479</link>
    <description> &lt;p&gt;I would set up a keyword field, with a list of all the relevant keywords, and search only that field during the search. It will take you a bit more time when entering the essay, but save you a whole lot of time in the long run.&lt;/p&gt;
 </description>
     <pubDate>Fri, 04 Apr 2003 15:10:02 +0000</pubDate>
 <dc:creator>Suzanne</dc:creator>
 <guid isPermaLink="false">comment 1128479 at https://www.webmaster-forums.net</guid>
  </item>
  <item>
    <title></title>
    <link>https://www.webmaster-forums.net/serverside-scripting/best-way-search#comment-1128471</link>
    <description> &lt;p&gt;No, it&#039;s only text that&#039;s stored in a database.  Right now the 50,000 essays aren&#039;t categorized (on my &#039;to do&#039; list for the next 20 years) so the search has to go through each of the records.  I&#039;ve looked into things like htdig, but I really don&#039;t understand it.&lt;/p&gt;
&lt;p&gt;The essays can be lengthy at times.  Uh...not the ones I write, but the ones site visitors send in, and so I&#039;m trying to minimize the time it takes to search the records.&lt;/p&gt;
 </description>
     <pubDate>Fri, 04 Apr 2003 09:41:00 +0000</pubDate>
 <dc:creator>shanda</dc:creator>
 <guid isPermaLink="false">comment 1128471 at https://www.webmaster-forums.net</guid>
  </item>
  <item>
    <title></title>
    <link>https://www.webmaster-forums.net/serverside-scripting/best-way-search#comment-1128424</link>
    <description> &lt;p&gt;I&#039;m guessing you&#039;re not searching HTML files, as 50,000 files would take an hour to search.  So, my question is: How is your data stored/structured?&lt;/p&gt;
 </description>
     <pubDate>Thu, 03 Apr 2003 17:40:21 +0000</pubDate>
 <dc:creator>Mark Hensler</dc:creator>
 <guid isPermaLink="false">comment 1128424 at https://www.webmaster-forums.net</guid>
  </item>
  <item>
    <title></title>
    <link>https://www.webmaster-forums.net/serverside-scripting/best-way-search#comment-1128418</link>
    <description> &lt;p&gt;i tried it...doesn&#039;t seem that slow to me...and i&#039;m using the computer on a school LAN right now...lol&lt;/p&gt;
&lt;p&gt;you are using a good combo, the only thing i could say would be make the server faster or maybe figure out a way to reorganize the DB so it can be searched faster?  i don&#039;t know how to do this, as i am not anywhere near being a MySQL practitioner, but i believe that configuration does have something to do with it.  when in doubt, upgrade the server&lt;/p&gt;
 </description>
     <pubDate>Thu, 03 Apr 2003 16:24:15 +0000</pubDate>
 <dc:creator>brady.k</dc:creator>
 <guid isPermaLink="false">comment 1128418 at https://www.webmaster-forums.net</guid>
  </item>
  </channel>
</rss>
