Posted by Curtis on 02/12/07 10:04
deko wrote:
 > http://www.liarsscourge.com/ <<== this is better
 >
 > known bug: if an email address appears in the test string before a valid
 > URL, the script will not find the URL
 
 Hard-coding TLDs is generally not useful, since you never know when
 new ones will be put into use. You also make no allowance for schemes
 other than http(s).
 
 You do not support valid URLs whose authority includes user
 information (user:password@host), such as:
 
 <http://bob1234z_.sss:libobob@example.com/foo/bar
 baz/index.bak.php?q=my&q2=query#frag2>
 
 This is indeed a valid URL, but your algorithm fails on it. It is far
 more useful to use a single regex that anticipates any scheme and any
 TLD (look at Wikipedia or search the RFCs, e.g. RFC 3986, for the valid
 URI format).
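
 To illustrate the "any scheme, any TLD" point, here is a minimal sketch
 built on the generic URI-splitting regex given in RFC 3986, Appendix B.
 The function name split_uri() and the sample URLs are my own, purely for
 illustration. It only splits a string that is already a single URI; it
 does not validate it or find URIs inside free text.

```php
<?php
// Split a URI into its five generic components using the regex from
// RFC 3986, Appendix B. No scheme list and no TLD list are hard-coded.
// split_uri() is a hypothetical helper name, not a built-in function.
function split_uri(string $uri): array
{
    $re = '%^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?%s';
    preg_match($re, $uri, $m);

    // preg_match() drops trailing groups that did not take part in the
    // match, so pad the array to make every index safe to read.
    $m += array_fill(0, 10, '');

    return [
        'scheme'    => $m[2],
        'authority' => $m[4], // may include user:password@ and :port
        'path'      => $m[5],
        'query'     => $m[7],
        'fragment'  => $m[9],
    ];
}

// A URL with user information in the authority, modeled on the one above.
print_r(split_uri('http://bob1234z_.sss:libobob@example.com/foo/index.bak.php?q=my&q2=query#frag2'));

// A different scheme and a less common TLD go through the same code path.
print_r(split_uri('ftp://files.example.museum/pub/readme.txt'));
```

 Because the authority is captured as "everything between // and the
 next /, ? or #", the user:password@ part needs no special handling.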
 
 parse_url is not meant to be used for validation, as stated in the PHP
 docs themselves.
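
 A quick way to see this for yourself (a small sketch; the input string
 is just something I made up that is obviously not a URL):

```php
<?php
// parse_url() only breaks a string into components. It does not check
// that the result is a usable URL, so plain text is accepted as a "path".
$parts = parse_url('not a url');

// No false return and no error: the whole string comes back as the path.
var_dump($parts);
```

 A non-false return value from parse_url() therefore says nothing about
 whether the input was a valid URL.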
 
 Here is an example that uses a regex I wrote recently:
 
 <?php
 $re = '%
 ( [\w.+-]+ : (?://)? ) # scheme name
 
 ( [^/]+ ) # authority, domain
 ( / [^?]+ )? # path, if exists
 
 # query and fragment, which may or may not exist
 (?:
 \\? # query initializer
 ( [^#]+ ) # grab query
 (?: \\# ([\w-]+) )? # fragment, if exists
 )?
 %x';
 
 $s = 'Welcome.to{"http://user:pass@example.com/foo
 bar".-/index.bak.php?q=query&r=arrr#frag2_2> borky borked!';
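 if (preg_match($re, $s, $m)) {
     // $m[0] is the whole match, $m[1] the scheme, $m[2] the authority.
     // Escape everything that is echoed into the HTML output.
     echo '<p>Original: <code>' . htmlentities($s, ENT_QUOTES) . '</code></p>';
     echo '<p>Extrapolation: <a href="'
         . htmlentities($m[0], ENT_QUOTES) . '">'
         . htmlentities($m[1] . $m[2], ENT_QUOTES)
         . '</a> (full URI in link, see status bar).</p>';
 }
 else {
     echo 'Not a valid URI.';
 }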
 ?>
 
 Curtis