Posted by Curtis on 02/12/07 10:04
deko wrote:
> http://www.liarsscourge.com/ <<== this is better
>
> known bug: if an email address appears in the test string before a valid
> URL, the script will not find the URL
Hard-coding TLDs is generally not useful, since you never know when new
ones will be put into use. You also don't allow for any scheme other
than http(s).
You do not support valid URLs whose authority includes user information:
<http://bob1234z_.sss:libobob@example.com/foo/bar
baz/index.bak.php?q=my&q2=query#frag2>
This is a valid URL, but your algorithm rejects it. It is far more
useful to use a single regex that anticipates any scheme and any TLD
(look at Wikipedia or the URI RFCs, e.g. RFC 3986, for the valid format).
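To illustrate the any-scheme, any-TLD idea, here is a minimal sketch built on the generic URI-splitting expression given in RFC 3986, Appendix B. The helper name split_uri() and the sample URIs are made up for illustration. Note that this expression only breaks a string into parts; it does not prove the string is a valid URI.

```php
<?php
// split_uri() is a hypothetical helper name, not a PHP built-in.
function split_uri($uri)
{
    // Generic pattern from RFC 3986, Appendix B. It assumes nothing
    // about the scheme or the TLD.
    $re = '%^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?%';

    // Every group is optional, so the pattern matches any string;
    // trailing groups that did not take part are simply absent from $m.
    preg_match($re, $uri, $m);

    return array(
        'scheme'    => isset($m[2]) ? $m[2] : '',
        'authority' => isset($m[4]) ? $m[4] : '',
        'path'      => isset($m[5]) ? $m[5] : '',
        'query'     => isset($m[7]) ? $m[7] : '',
        'fragment'  => isset($m[9]) ? $m[9] : '',
    );
}

// Sample inputs (assumed for the example): different schemes, userinfo
// in the authority, and a TLD that a hard-coded list could easily miss.
$samples = array(
    'http://bob:secret@example.com/foo/index.bak.php?q=my&q2=query#frag2',
    'ftp://files.example.museum/pub/readme.txt',
    'mailto:someone@example.org',
);

foreach ($samples as $uri) {
    $p = split_uri($uri);
    echo $p['scheme'], ' | ', $p['authority'], ' | ', $p['path'], "\n";
}
```

Any stricter checks (allowed schemes, host syntax, and so on) would then be applied to the individual parts rather than baked into one giant pattern.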
parse_url is not meant to be used for validation, as stated in the PHP
docs themselves.
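A small sketch of why, assuming nothing beyond the built-in parse_url(): it happily splits a well-formed URL, but it also returns a result for text that is obviously not a URL, so a non-false return value tells you nothing about validity. The sample strings are made up for the example.

```php
<?php
// A well-formed URL is split into its components.
$parts = parse_url('http://bob:secret@example.com/foo/index.php?q=my#frag2');

// Plain text that is clearly not a URL is not rejected; it comes back
// as an array too (treated as a bare path).
$loose = parse_url('not a url at all');

var_dump($parts['user'], $parts['host'], is_array($loose));
```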
Here is an example using a regex I wrote recently:
<?php
$re = '%
    ( [\w.+-]+ : (?://)? )        # scheme name
    ( [^/]+ )                     # authority, domain
    ( / [^?]+ )?                  # path, if exists
    # query and fragment, which may or may not exist
    (?:
        \\?                       # query initializer
        ( [^#]+ )                 # grab query
        (?: \\# ([\w-]+) )?       # fragment, if exists
    )?
%x';
$s = 'Welcome.to{"http://user:pass@example.com/foo
bar".-/index.bak.php?q=query&r=arrr#frag2_2> borky borked!';
if (preg_match($re, $s, $m)) {
    // Escape everything echoed into the page, not just the href value.
    echo '<p>Original: <code>' . htmlentities($s, ENT_QUOTES) . '</code></p>';
    echo '<p>Extracted: <a href="'
        . htmlentities($m[0], ENT_QUOTES) . '">'
        . htmlentities($m[1] . $m[2], ENT_QUOTES)
        . '</a> (full URI in link, see status bar).</p>';
}
else {
    echo 'Not a valid URI.';
}
?>
Curtis