<?xml version="1.0" encoding="utf-8" ?><rss version="2.0" xml:base="https://www.webmaster-forums.net/crss/node/1022165" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title></title>
    <link>https://www.webmaster-forums.net/crss/node/1022165</link>
    <description></description>
    <language>en</language>
          <item>
    <title></title>
    <link>https://www.webmaster-forums.net/serverside-scripting/regular-expression-help#comment-1136977</link>
    <description> &lt;p&gt;Or try it free online at Safari for 30 days (O&#039;Reilly&#039;s online bookshop).&lt;/p&gt;
 </description>
     <pubDate>Fri, 29 Aug 2003 07:42:24 +0000</pubDate>
 <dc:creator>Wil</dc:creator>
 <guid isPermaLink="false">comment 1136977 at https://www.webmaster-forums.net</guid>
  </item>
  <item>
    <title></title>
    <link>https://www.webmaster-forums.net/serverside-scripting/regular-expression-help#comment-1136952</link>
    <description> &lt;p&gt;Alright, thanks for the help!&lt;/p&gt;
 </description>
     <pubDate>Thu, 28 Aug 2003 19:06:00 +0000</pubDate>
 <dc:creator>Mark Hensler</dc:creator>
 <guid isPermaLink="false">comment 1136952 at https://www.webmaster-forums.net</guid>
  </item>
  <item>
    <title></title>
    <link>https://www.webmaster-forums.net/serverside-scripting/regular-expression-help#comment-1136927</link>
    <description> &lt;blockquote class=&quot;bb-quote-body&quot;&gt;&lt;p&gt;Quote: &lt;em&gt;Originally posted by Mark Hensler &lt;/em&gt;&lt;br /&gt;
&lt;strong&gt;The mechanics happens to be exactly what I&#039;m interested in.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I&#039;ve taken several programming courses (in college), but they don&#039;t teach much further than the language&#039;s syntax and grammar. Assembly (i386+) came the closest to &quot;How&#039;s that work?&quot;, but still left me wondering about higher-level languages. Regular expressions are a whole other animal, and I&#039;m quite interested to know what makes them tick.&lt;/p&gt;
&lt;p&gt;Do you have any online resources you could refer me to? &lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;I don&#039;t, offhand, know of any online resources that give a good introduction to the mechanics. The information is certainly out there, but it&#039;s usually fairly raw. If I run across something, I&#039;ll let you know.&lt;/p&gt;
&lt;p&gt;On the other hand, I can easily refer you to a printed resource:  chapter 4 of &#039;Mastering Regular Expressions&#039;, from O&#039;Reilly &amp;amp; Associates. Great introduction.&lt;/p&gt;
 </description>
     <pubDate>Thu, 28 Aug 2003 08:39:37 +0000</pubDate>
 <dc:creator>Prospero</dc:creator>
 <guid isPermaLink="false">comment 1136927 at https://www.webmaster-forums.net</guid>
  </item>
  <item>
    <title></title>
    <link>https://www.webmaster-forums.net/serverside-scripting/regular-expression-help#comment-1136924</link>
    <description> &lt;p&gt;The mechanics happens to be exactly what I&#039;m interested in.&lt;/p&gt;
&lt;p&gt;I&#039;ve taken several programming courses (in college), but they don&#039;t teach much further than the language&#039;s syntax and grammar. Assembly (i386+) came the closest to &quot;How&#039;s that work?&quot;, but still left me wondering about higher-level languages. Regular expressions are a whole other animal, and I&#039;m quite interested to know what makes them tick.&lt;/p&gt;
&lt;p&gt;Do you have any online resources you could refer me to?&lt;/p&gt;
 </description>
     <pubDate>Thu, 28 Aug 2003 08:11:22 +0000</pubDate>
 <dc:creator>Mark Hensler</dc:creator>
 <guid isPermaLink="false">comment 1136924 at https://www.webmaster-forums.net</guid>
  </item>
  <item>
    <title></title>
    <link>https://www.webmaster-forums.net/serverside-scripting/regular-expression-help#comment-1136923</link>
    <description> &lt;p&gt;You know, I&#039;m feeling more and more squeamish about this explanation, so I want to say again: don&#039;t take the details here too seriously. That goes especially for the mechanical details, which I hope will at least get across a sense of how this sort of thing appears to work.&lt;/p&gt;
&lt;p&gt;The model we&#039;re using to understand things actually describes what are called DFA (deterministic finite automaton) regular expressions. It&#039;s a simpler model, which makes it nice for explanations like these. The problem is, Perl doesn&#039;t actually use a DFA, but the other kind: an NFA (nondeterministic finite automaton) engine that works by backtracking. And technically it&#039;s a modified NFA, extended with features such as backreferences, so Perl&#039;s patterns aren&#039;t really &#039;regular&#039; expressions at all.&lt;/p&gt;
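&lt;p&gt;To make the &quot;not really regular&quot; point concrete, here is a minimal sketch in Python rather than Perl (an assumption: Python&#039;s re module is also a backtracking engine, and this pattern is expected to behave the same way in both). The backreference \1 lets a pattern recognize any word written twice, which no DFA can do. The names DOUBLED and is_doubled are made up for illustration.&lt;/p&gt;

```python
import re

# The backreference \1 demands that the second half repeat whatever
# the group captured. Strings of the form "ww" are not a regular
# language, so no DFA can recognize them; a backtracking engine can.
DOUBLED = re.compile(r"(\w+)\1")


def is_doubled(s):
    """Return True if s is some word-character string written twice."""
    return DOUBLED.fullmatch(s) is not None


print(is_doubled("abab"))  # True
print(is_doubled("abba"))  # False
```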
&lt;p&gt;You can probably see why I was loath to go into this. &lt;img src=&quot;https://www.webmaster-forums.net/misc/smileys/smile.png&quot; title=&quot;Smiling&quot; alt=&quot;Smiling&quot; class=&quot;smiley-content&quot; /&gt; Oversimplification is sometimes useful. However, I&#039;d hate to feel responsible for spreading misinformation, so I&#039;m adding a caveat: this isn&#039;t a story you want to rely on too closely.&lt;/p&gt;
&lt;p&gt;And don&#039;t worry too much about this stuff, either. It&#039;s not that important for 99% of what you need in Perl.&lt;/p&gt;
 </description>
     <pubDate>Thu, 28 Aug 2003 07:56:04 +0000</pubDate>
 <dc:creator>Prospero</dc:creator>
 <guid isPermaLink="false">comment 1136923 at https://www.webmaster-forums.net</guid>
  </item>
  <item>
    <title></title>
    <link>https://www.webmaster-forums.net/serverside-scripting/regular-expression-help#comment-1136918</link>
    <description> &lt;blockquote class=&quot;bb-quote-body&quot;&gt;&lt;p&gt;Quote: &lt;em&gt;Originally posted by Mark Hensler &lt;/em&gt;&lt;br /&gt;
&lt;strong&gt;So would this be correct...&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Because the pattern matched, the search continued at the end of the last successful match within the string.&lt;/p&gt;
&lt;p&gt;Had the match failed, the search would have resumed at the letter after the beginning of the last failed match within the string.&lt;br /&gt;
&lt;div class=&quot;codeblock&quot;&gt;&lt;code&gt;&amp;quot;3rd place&amp;quot;&lt;br /&gt; ^&lt;br /&gt;first pass starts here and fails, as &amp;quot;3&amp;quot; is not \w&lt;br /&gt;------------------------------------------------------------------&lt;br /&gt;&amp;quot;3rd place&amp;quot;&lt;br /&gt;&amp;nbsp; ^&lt;br /&gt;second pass starts here and succeeds, as &amp;quot;rd&amp;quot; matches \w{1,3}&lt;br /&gt;------------------------------------------------------------------&lt;br /&gt;&amp;quot;3rd place&amp;quot;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; ^ &lt;br /&gt;third pass starts here and fails, as &amp;quot; &amp;quot; is not \w&lt;br /&gt;------------------------------------------------------------------&lt;br /&gt;&amp;quot;3rd place&amp;quot;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ^&lt;br /&gt;fourth pass starts here and succeeds, as &amp;quot;place&amp;quot; matches \w{1,3}\w*&lt;/code&gt;&lt;/div&gt;&lt;/p&gt;&lt;/blockquote&gt;
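&lt;p&gt;That&#039;s pretty close, with one slip: \w matches digits as well as letters, so &quot;3&quot; is a word character, and the first match is actually &quot;3rd&quot;. Also, there&#039;s really only one pass. The engine simply goes along one character at a time, moving to different states as it goes. When it gets to the end, it&#039;s done.&lt;/p&gt;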
&lt;p&gt;Let&#039;s say our pattern was /ri\w/g, and our string is &quot;3rd prize&quot;.&lt;br /&gt;
&lt;div class=&quot;codeblock&quot;&gt;&lt;code&gt;&amp;quot;3rd prize&amp;quot;&lt;br /&gt; ^&lt;br /&gt;Not a possible match. Discard.&lt;br /&gt;&lt;br /&gt;&amp;quot;3rd prize&amp;quot;&lt;br /&gt;&amp;nbsp; ^&lt;br /&gt;A possible match. Keep the &amp;#039;r&amp;#039;, move on.&lt;br /&gt;&lt;br /&gt;&amp;quot;3rd prize&amp;quot;&lt;br /&gt;&amp;nbsp;&amp;nbsp; ^&lt;br /&gt;No longer a possible match. Discard the &amp;#039;r&amp;#039;, move on.&lt;br /&gt;&lt;br /&gt;&amp;quot;3rd prize&amp;quot;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; ^&lt;br /&gt;Not a possible match. Discard.&lt;br /&gt;&lt;br /&gt;&amp;quot;3rd prize&amp;quot;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ^&lt;br /&gt;Not a possible match. Discard.&lt;br /&gt;&lt;br /&gt;&amp;quot;3rd prize&amp;quot;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ^&lt;br /&gt;A possible match. Keep the &amp;#039;r&amp;#039;, move on.&lt;br /&gt;&lt;br /&gt;&amp;quot;3rd prize&amp;quot;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ^&lt;br /&gt;A possible match. Keep the &amp;#039;ri&amp;#039;, move on.&lt;br /&gt;&lt;br /&gt;&amp;quot;3rd prize&amp;quot;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ^&lt;br /&gt;A match. Keep the &amp;#039;riz&amp;#039;, move on, since /g is in effect.&lt;br /&gt;&lt;br /&gt;&amp;quot;3rd prize&amp;quot;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ^&lt;br /&gt;Not a possible match. Discard.&lt;/code&gt;&lt;/div&gt;&lt;/p&gt;
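&lt;p&gt;The one-pass walk-through can be sketched as a small hand-written state machine. This is a hypothetical illustration of the simplified model in Python, not how Perl&#039;s engine is implemented; scan_ri_word is an invented name, and str.isalnum() plus underscore only approximates \w.&lt;/p&gt;

```python
import re


def scan_ri_word(text):
    # Hypothetical hand-written state machine for the pattern ri\w
    # applied globally. States: 0 = nothing kept, 1 = kept "r",
    # 2 = kept "ri". Each character is examined exactly once.
    matches = []
    state = 0
    start = 0
    for i, ch in enumerate(text):
        is_word = ch.isalnum() or ch == "_"  # rough stand-in for \w
        if state == 2 and is_word:
            matches.append(text[start:i + 1])  # "ri" plus one word char
            state = 0  # matched characters are not reused
        elif state == 1 and ch == "i":
            state = 2  # keep "ri", move on
        elif ch == "r":
            state, start = 1, i  # a possible match begins here
        else:
            state = 0  # not a possible match; discard
    return matches


print(scan_ri_word("3rd prize"))         # ['riz']
print(re.findall(r"ri\w", "3rd prize"))  # ['riz']
```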
&lt;p&gt;But you&#039;ve essentially got the idea, I think.&lt;/p&gt;
&lt;p&gt;Another, less mechanical way to think about it is this: with /g, matches never overlap, so each character in the string can be part of at most one match. Hence, if the &#039;r&#039; in &#039;George&#039; was part of the character sequence matched by (\w{1,3})\w* (which it was, since the \w* is greedy), it won&#039;t be used in any later match. That&#039;s why I put \w* in there: to keep the rest of the word from matching the part of the regex I actually care about, (\w{1,3}).&lt;/p&gt;
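&lt;p&gt;A quick way to see what the trailing \w* buys you is to compare both patterns on &#039;George&#039;. This sketch uses Python&#039;s re module, assumed to match Perl&#039;s //g behaviour for these simple patterns; note that re.findall returns the captured group when the pattern has one.&lt;/p&gt;

```python
import re

word = "George"

# Without \w*, the leftover letters are free to form a second match.
print(re.findall(r"\w{1,3}", word))       # ['Geo', 'rge']

# With \w*, the same match swallows the rest of the word, so only the
# captured first one to three characters come back.
print(re.findall(r"(\w{1,3})\w*", word))  # ['Geo']
```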
 </description>
     <pubDate>Thu, 28 Aug 2003 04:47:00 +0000</pubDate>
 <dc:creator>Prospero</dc:creator>
 <guid isPermaLink="false">comment 1136918 at https://www.webmaster-forums.net</guid>
  </item>
  <item>
    <title></title>
    <link>https://www.webmaster-forums.net/serverside-scripting/regular-expression-help#comment-1136915</link>
    <description> &lt;p&gt;So would this be correct...&lt;/p&gt;
&lt;p&gt;Because the pattern matched, the search continued at the end of the last successful match within the string.&lt;/p&gt;
&lt;p&gt;Had the match failed, the search would have resumed at the letter after the beginning of the last failed match within the string.&lt;br /&gt;
&lt;div class=&quot;codeblock&quot;&gt;&lt;code&gt;&amp;quot;3rd place&amp;quot;&lt;br /&gt; ^&lt;br /&gt;first pass starts here and fails, as &amp;quot;3&amp;quot; is not \w&lt;br /&gt;------------------------------------------------------------------&lt;br /&gt;&amp;quot;3rd place&amp;quot;&lt;br /&gt;&amp;nbsp; ^&lt;br /&gt;second pass starts here and succeeds, as &amp;quot;rd&amp;quot; matches \w{1,3}&lt;br /&gt;------------------------------------------------------------------&lt;br /&gt;&amp;quot;3rd place&amp;quot;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; ^ &lt;br /&gt;third pass starts here and fails, as &amp;quot; &amp;quot; is not \w&lt;br /&gt;------------------------------------------------------------------&lt;br /&gt;&amp;quot;3rd place&amp;quot;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ^&lt;br /&gt;fourth pass starts here and succeeds, as &amp;quot;place&amp;quot; matches \w{1,3}\w*&lt;/code&gt;&lt;/div&gt;&lt;/p&gt;
 </description>
     <pubDate>Thu, 28 Aug 2003 02:46:54 +0000</pubDate>
 <dc:creator>Mark Hensler</dc:creator>
 <guid isPermaLink="false">comment 1136915 at https://www.webmaster-forums.net</guid>
  </item>
  <item>
    <title></title>
    <link>https://www.webmaster-forums.net/serverside-scripting/regular-expression-help#comment-1136914</link>
    <description> &lt;blockquote class=&quot;bb-quote-body&quot;&gt;&lt;p&gt;Quote: &lt;em&gt;Originally posted by Mark Hensler &lt;/em&gt;&lt;br /&gt;
&lt;strong&gt;Yes, welcome!&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Alright, I tested Prospero&#039;s code, and I see that it works.&lt;/p&gt;
&lt;p&gt;I understand why \w* matches up to the end of the word string. But why does \w only match the beginning of the word string? It seems like you&#039;d need \W\w to match the beginning of a word string.&lt;/p&gt;
&lt;p&gt;I guess what&#039;s confusing me is that I&#039;ve recently read about &quot;Once-only subpatterns&quot; -- or &quot;(?&amp;gt;&quot; -- and that&#039;s lingering in the back of my mind. &lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Nothing quite so fancy as that. &lt;img src=&quot;https://www.webmaster-forums.net/misc/smileys/smile.png&quot; title=&quot;Smiling&quot; alt=&quot;Smiling&quot; class=&quot;smiley-content&quot; /&gt;  I&#039;ll try to explain it in a few different ways, and hope one will make sense.&lt;/p&gt;
&lt;p&gt;Remember, for a match to succeed, the string has to satisfy the whole pattern. The pattern says, &quot;Find instances where between one and three &#039;word characters&#039; are immediately followed by zero or more word characters, and capture the one-to-three part.&quot;&lt;/p&gt;
&lt;p&gt;So the pattern actually matches the whole word each time. If it finds between one and three consecutive word characters, it will always match, because it can&#039;t fail to match the zero-or-more part. That part will match nothing at all if it has to, though it will try to match as many characters as it can; quantifiers are greedy that way. Hence it&#039;s a lot like saying \w+. That&#039;s exactly what we&#039;d say, in fact, if we didn&#039;t need to capture the first few letters.&lt;/p&gt;
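&lt;p&gt;The claim that (\w{1,3})\w* matches the same stretches of text as \w+ can be checked with a short Python sketch (Python&#039;s re standing in for Perl; the sample text is made up). In a match object, group(0) is the whole match and group(1) is the captured part.&lt;/p&gt;

```python
import re

PATTERN = r"(\w{1,3})\w*"
text = "a priori reasoning"

whole = [m.group(0) for m in re.finditer(PATTERN, text)]
first = [m.group(1) for m in re.finditer(PATTERN, text)]

print(whole)                     # ['a', 'priori', 'reasoning']
print(re.findall(r"\w+", text))  # ['a', 'priori', 'reasoning']
print(first)                     # ['a', 'pri', 'rea']
```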
&lt;p&gt;So, say you have a sequence like &#039; a &#039;. The pattern first looks at the leading space. It realizes that no matter what comes next, a space cannot be the beginning of what it&#039;s looking for. So it ignores the space. Next step:  look at the &#039;a&#039;.  Is &#039;a&#039; a possible match?  Yep.  It matches \w, and also \w{1,3}.&lt;/p&gt;
&lt;p&gt;But it&#039;s greedy. It doesn&#039;t want 1 when it might be able to get 3. So it looks at the next character: another space. Drat. It&#039;s not getting any more than one... but that&#039;s enough. It satisfies \w{1,3}.&lt;/p&gt;
&lt;p&gt;As it turns out, it also satisfies \w{1,3}\w*.  It found one word character, followed immediately by zero word characters. So we have a valid match. Now, because the pattern is actually (\w{1,3})\w*, it &#039;captures&#039; the \w{1,3} part, and puts it in a buffer.&lt;/p&gt;
&lt;p&gt;(Incidentally, this is not a perfectly accurate description of what happens, but it should give you a pretty good model with which to work.)&lt;/p&gt;
&lt;p&gt;Now, what if our string had been &#039; a priori &#039;?  Well, the matching for &#039; a &#039; went pretty much as described.  The regex engine tosses out the second space, of course, because it has no use for it. Then it keeps going, because we used the /g switch, so it&#039;s not done yet.  It comes upon a &#039;p&#039;. Is that a possible match? Yep. And if the regex weren&#039;t greedy, it&#039;d be satisfied already; &#039;p&#039; matches (\w{1,3})\w* just as &#039;a&#039; did.&lt;/p&gt;
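&lt;p&gt;The difference greediness makes, &#039;p&#039; versus &#039;pri&#039;, shows up if the quantifier is made lazy with a trailing ?, as in this Python sketch (Perl accepts the same \w{1,3}? syntax). The variable names are only for illustration.&lt;/p&gt;

```python
import re

# Greedy (the default): \w{1,3} takes as many characters as it may.
greedy = re.match(r"(\w{1,3})\w*", "priori").group(1)

# Lazy (\w{1,3}?): settle for the minimum and let \w* take the rest.
lazy = re.match(r"(\w{1,3}?)\w*", "priori").group(1)

print(greedy)  # pri
print(lazy)    # p
```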
&lt;p&gt;But its job is to make the match as long as possible. And a good thing, too; otherwise a pattern like .* would settle for matching nothing every time, and we&#039;d never get anywhere. It&#039;s zero or more of anything, after all, and if it keeps stopping at zero, it&#039;s no fun.&lt;/p&gt;
&lt;p&gt;So, &#039;p&#039; is a satisfactory beginning. Next it looks and sees &#039;r&#039;, which matches \w. &#039;pr&#039;, too, matches \w{1,3} but isn&#039;t the maximum (3), so onward! Now it sees &#039;i&#039;. &#039;pri&#039; is the longest possible match for \w{1,3}, so there&#039;s no sense trying to extend that part any further. That part of the pattern has matched.&lt;/p&gt;
&lt;p&gt;The pattern, however, calls for \w{1,3} followed immediately by \w*. Well, the nice thing about * is, it&#039;s always a match. Whatever it is, you have at least zero of it. But again, * is greedy. So we look at the next character. Turns out it&#039;s &#039;o&#039;, and &#039;o&#039; matches \w. But &#039;o&#039; isn&#039;t the longest possible match... with *, there is no longest possible match. So we keep going. Next comes &#039;r&#039;, which matches \w, but &#039;or&#039; again isn&#039;t the best \w* could hope for, so onward; &#039;i&#039; matches \w, and we have &#039;ori&#039;. Next comes a space.  Oops.  That doesn&#039;t match \w, so now our search for that pattern is finished. Stuff \w{1,3}, which is &#039;pri&#039;, in a buffer, and keep going.&lt;/p&gt;
&lt;p&gt;What comes after the space? The end of the string. So, no more matches, and we&#039;re done. Return the buffers, which contain &#039;a&#039; and &#039;pri&#039;, and go back to our regularly scheduled programming.&lt;/p&gt;
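&lt;p&gt;The same walk-through can be checked with Python&#039;s re.finditer, standing in here for Perl&#039;s //g loop: each match object reports where the whole match starts and ends, and what went into the capture buffer.&lt;/p&gt;

```python
import re

text = " a priori "
matches = list(re.finditer(r"(\w{1,3})\w*", text))

for m in matches:
    # span() covers the whole match; group(1) is the capture buffer
    print(m.span(), repr(m.group(0)), repr(m.group(1)))
# (1, 2) 'a' 'a'
# (3, 9) 'priori' 'pri'

print(re.findall(r"(\w{1,3})\w*", text))  # ['a', 'pri']
```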
&lt;p&gt;Again, this description is subtly wrong, particularly the order in which things happen. But don&#039;t worry about that. The order doesn&#039;t interest you, just the results. If you ever get into writing parsers, it&#039;ll be more interesting to sweat the details.&lt;/p&gt;
&lt;p&gt;Now, you were wondering why that pattern didn&#039;t match &quot;Geo&quot;, &quot;eor&quot;, &quot;org&quot;, and so on. Think about this: what if our pattern had been .* and our string &#039;foobar&#039;? Would you expect that to match &#039;foobar&#039;, &#039;oobar&#039;, &#039;obar&#039;, &#039;bar&#039;, &#039;ar&#039;, and &#039;r&#039;? I wouldn&#039;t. The matching engine doesn&#039;t find &#039;f&#039; &#039;o&#039; &#039;o&#039; &#039;b&#039; &#039;a&#039; &#039;r&#039;, see that it matches .*, and then head back to the first &#039;o&#039; and try again.&lt;/p&gt;
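&lt;p&gt;If you really did want the overlapping pieces, you would have to ask for them explicitly. One common trick, sketched here in Python (Perl supports the same (?=...) lookahead), is to capture inside a zero-width lookahead, so that no characters are consumed and the scan moves forward one position at a time.&lt;/p&gt;

```python
import re

word = "George"

# Ordinary global matching never reuses a character it has consumed.
print(re.findall(r"\w{3}", word))        # ['Geo', 'rge']

# A lookahead (?=...) is zero-width: it consumes nothing, so the scan
# moves forward one character at a time and every window is captured.
print(re.findall(r"(?=(\w{3}))", word))  # ['Geo', 'eor', 'org', 'rge']
```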
&lt;p&gt;Well, there. Hopefully you&#039;re not more confused than when I started. &lt;img src=&quot;https://www.webmaster-forums.net/misc/smileys/smile.png&quot; title=&quot;Smiling&quot; alt=&quot;Smiling&quot; class=&quot;smiley-content&quot; /&gt;&lt;/p&gt;
 </description>
     <pubDate>Thu, 28 Aug 2003 01:40:20 +0000</pubDate>
 <dc:creator>Prospero</dc:creator>
 <guid isPermaLink="false">comment 1136914 at https://www.webmaster-forums.net</guid>
  </item>
  <item>
    <title></title>
    <link>https://www.webmaster-forums.net/serverside-scripting/regular-expression-help#comment-1136912</link>
    <description> &lt;p&gt;Yes, welcome!&lt;/p&gt;
&lt;p&gt;Alright, I tested Prospero&#039;s code, and I see that it works.&lt;/p&gt;
&lt;p&gt;I understand why \w* matches up to the end of the word string. But why does \w only match the beginning of the word string? It seems like you&#039;d need \W\w to match the beginning of a word string.&lt;/p&gt;
&lt;p&gt;I guess what&#039;s confusing me is that I&#039;ve recently read about &quot;Once-only subpatterns&quot; -- or &quot;(?&amp;gt;&quot; -- and that&#039;s lingering in the back of my mind.&lt;/p&gt;
 </description>
     <pubDate>Thu, 28 Aug 2003 00:07:59 +0000</pubDate>
 <dc:creator>Mark Hensler</dc:creator>
 <guid isPermaLink="false">comment 1136912 at https://www.webmaster-forums.net</guid>
  </item>
  <item>
    <title></title>
    <link>https://www.webmaster-forums.net/serverside-scripting/regular-expression-help#comment-1136908</link>
    <description> &lt;p&gt;Welcome, Prospero! And thanks for a super explanation. &lt;img src=&quot;https://www.webmaster-forums.net/misc/smileys/smile.png&quot; title=&quot;Smiling&quot; alt=&quot;Smiling&quot; class=&quot;smiley-content&quot; /&gt;&lt;/p&gt;
 </description>
     <pubDate>Wed, 27 Aug 2003 21:18:36 +0000</pubDate>
 <dc:creator>Suzanne</dc:creator>
 <guid isPermaLink="false">comment 1136908 at https://www.webmaster-forums.net</guid>
  </item>
  </channel>
</rss>
