Re: Extract any URL from any string?

Posted by Curtis on 02/12/07 10:04

deko wrote:
> http://www.liarsscourge.com/ <<== this is better
>
> known bug: if an email address appears in the test string before a valid
> URL, the script will not find the URL

Hard-coding TLDs is generally not useful, since you never know when
new ones will be put into use. Also, you don't allow for any scheme
other than http(s).
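To make that point concrete, here is a minimal sketch comparing two patterns. Both patterns and the sample hostnames are hypothetical and written only for this comparison; neither is taken from deko's script.

```php
<?php
// Hypothetical patterns, for illustration only (not from this thread).
// Pattern 1 hard-codes the scheme (http/https) and a short TLD list.
$hardcoded = '%https?://[\w.-]+\.(?:com|net|org)\b%i';
// Pattern 2 accepts any scheme name and any dot-separated host.
$generic = '%[a-z][a-z0-9+.-]*://[\w.-]+%i';

// Sample hostnames are made up for this sketch.
$samples = array(
    'http://example.com/',
    'ftp://files.example.museum/',
    'https://example.travel/',
);

$results = array();
foreach ($samples as $s) {
    $results[$s] = array(
        'hardcoded' => preg_match($hardcoded, $s) === 1,
        'generic'   => preg_match($generic, $s) === 1,
    );
    echo $s, ': hardcoded=', var_export($results[$s]['hardcoded'], true),
         ', generic=', var_export($results[$s]['generic'], true), "\n";
}
```

The hard-coded pattern only accepts the first sample: the second uses a different scheme and the third a TLD that is not in its list. The generic pattern accepts all three, at the cost of being far more permissive.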

You also don't support valid URLs that carry userinfo (user:password@)
in the authority:

<http://bob1234z_.sss:libobob@example.com/foo/bar
baz/index.bak.php?q=my&q2=query#frag2>

This is indeed a valid URL, but your algorithm fails on it. It's far
more useful to use a single regex that anticipates any scheme and any
TLD (look at Wikipedia or search the RFCs, e.g. RFC 3986, for the valid
URI format).

parse_url() is not meant to be used for validation, as the PHP docs
themselves state.
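A quick way to see this for yourself is the sketch below. The two input strings are made up for illustration; the point is only that parse_url() splits whatever it is given into components instead of rejecting it.

```php
<?php
// parse_url() splits a string into components; it does not check that the
// input is a well-formed URL. Both input strings are invented examples.
$good = parse_url('http://user:pass@example.com/foo/index.php?q=my#frag2');
$junk = parse_url('this is not a URL');

print_r($good); // scheme, host, user, pass, path, query, fragment
print_r($junk); // no error is raised; the text comes back as a "path"
```

Since the second call hands the junk string back as a path instead of failing, a non-false return value says nothing about validity.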

Here is an example using a regex I wrote recently:

<?php
$re = '%
( [\w.+-]+ : (?://)? ) # 1: scheme name, with optional "//"

( [^/]+ ) # 2: authority (userinfo and domain)
( / [^?\\#]+ )? # 3: path, if exists

# query and fragment, each of which may or may not exist
(?: \\? ( [^\\#]+ ) )? # 4: query, after the "?" initializer
(?: \\# ( [\w-]+ ) )? # 5: fragment, allowed even without a query
%x';

$s = 'Welcome.to{"http://user:pass@example.com/foo
bar".-/index.bak.php?q=query&r=arrr#frag2_2> borky borked!';
if (preg_match($re, $s, $m)) {
    // Escape everything that gets echoed into the HTML page.
    echo '<p>Original: <code>' . htmlentities($s, ENT_QUOTES) . '</code></p>';
    echo '<p>Extrapolation: <a href="'
        . htmlentities($m[0], ENT_QUOTES) . '">'
        . htmlentities($m[1] . $m[2], ENT_QUOTES)
        . '</a> (full URI in link, see status bar).</p>';
}
else {
    echo 'Not a valid URI.';
}
?>
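
If you need to pull several URLs out of one string, a stricter variant can help. The sketch below is an assumption on my part and not the regex above: it refuses whitespace, double quotes, and angle brackets inside any part, so each match stops at the end of its URL and preg_match_all() can collect them all. The sample text and hostnames are invented.

```php
<?php
// Stricter variant (an assumption, not the regex from the post): no part may
// contain whitespace, double quotes, or angle brackets.
$re = '%
    ( [a-z][a-z0-9+.-]* ) ://       # 1: scheme name
    ( [^\s/?\\#"<>]+ )              # 2: authority (userinfo, host, port)
    ( / [^\s?\\#"<>]* )?            # 3: path, if present
    (?: \\? ( [^\s\\#"<>]* ) )?     # 4: query, if present
    (?: \\# ( [^\s"<>]* ) )?        # 5: fragment, if present
%xi';

// Invented sample text containing two URLs.
$text = 'See <http://user:pass@example.com/foo/index.bak.php?q=my&q2=query#frag2>'
      . ' and ftp://ftp.example.org/pub/ for details.';

preg_match_all($re, $text, $matches, PREG_SET_ORDER);

$urls = array();
foreach ($matches as $m) {
    // $m[0] is the full URL; $m[1] is the scheme and $m[2] the authority.
    // Trailing optional groups that did not match are absent from $m.
    $urls[] = $m[0];
}
print_r($urls);
```

The trade-off is that this variant cuts off a URL containing an unescaped space, which the looser regex above would tolerate.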

Curtis
