Re: Extract any URL from any string?

Posted by deko on 02/12/07 19:53

> Hard coding TLDs is generally not useful, as you never know when unexpected
> ones may be put in use. Plus, you do not allow variance for different schemes
> other than http(s).
>
> You do not support valid URLs that have the authority:
>
> <http://bob1234z_.sss:libobob@example.com/foo/bar
> baz/index.bak.php?q=my&q2=query#frag2>
>
> This is indeed a valid URL, but your algorithm fails. It's far more useful to
> use a single regex, anticipate any scheme (look at wikipedia or search some
> RFCs for valid URI format), and any TLD.
>
> parse_url is not meant to be used for validation, as stated in the PHP docs
> themselves.
>
> This is an example implementing the regex I made, recently:
>
> <?php
> $re = '%
> ( [\w.+-]+ : (?://)? ) # scheme name
>
> ( [^/]+ ) # authority, domain
> ( / [^?]+ )? # path, if exists
>
> # query and fragment, which may or may not exist
> (?:
> \\? # query initializer
> ( [^#]+ ) # grab query
> (?: \\# ([\w-]+) )? # fragment, if exists
> )?
> %x';
>
> $s = 'Welcome.to{"http://user:pass@example.com/foo
> bar".-/index.bak.php?q=query&r=arrr#frag2_2> borky borked!';
> if (preg_match($re, $s, $m)) {
>     // Escape everything echoed into HTML, not only the href value.
>     echo '<p>Original: <code>' . htmlentities($s, ENT_QUOTES) . '</code></p>';
>     echo '<p>Extrapolation: <a href="'
>         . htmlentities($m[0], ENT_QUOTES) . '">'
>         . htmlentities($m[1] . $m[2], ENT_QUOTES)
>         . '</a> (full URI in link, see status bar).</p>';
> }
> else {
>     echo 'Not a valid URI.';
> }
> ?>

Thanks, I'll try to use this. As for Internet address validation, I've come to
this conclusion: I can only validate known quantities - that is, the scheme
(http, https, ftp) and the TLD. Granted, TLDs come and go, but not so often that
validating against a list becomes impractical. As for domain names, I can only
validate format - that is, 255 characters or fewer, and nothing other than
alphanumeric characters and hyphens. geturl.php is my first attempt at
implementing this. I'll have another rev posted at http://liarsscourge.com
shortly. Thanks for your help!
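
For illustration, the checks described above might be sketched roughly like this.
This is not the code in geturl.php; the function name looks_like_known_url(),
the pattern used to split off the host, and the deliberately tiny TLD list are
all assumptions made for the example. A real list would need to be much longer
and kept up to date.

```php
<?php
// Hypothetical helper: validates only the "known quantities" (scheme, TLD)
// and the format of the domain name. Names and lists are assumptions.
function looks_like_known_url(string $url): bool
{
    $schemes = ['http', 'https', 'ftp'];
    $tlds    = ['com', 'net', 'org', 'edu', 'gov', 'uk']; // sample list only

    // Capture the scheme and the host; optional user:pass@ is skipped,
    // and the port, path, query and fragment are ignored.
    $re = '%^([a-z][a-z0-9+.-]*)://(?:[^/@\s]*@)?([^/:?#\s]+)%i';
    if (!preg_match($re, $url, $m)) {
        return false;
    }
    $scheme = strtolower($m[1]);
    $host   = strtolower($m[2]);

    // Known quantity 1: the scheme must be on the list.
    if (!in_array($scheme, $schemes, true)) {
        return false;
    }

    // Format check: overall length limit as stated in the post.
    if (strlen($host) > 255) {
        return false;
    }

    // Format check: at least "name.tld", and each dot-separated label
    // may contain only letters, digits and hyphens.
    $labels = explode('.', $host);
    if (count($labels) < 2) {
        return false;
    }
    foreach ($labels as $label) {
        if (!preg_match('/^[a-z0-9-]+$/', $label)) {
            return false;
        }
    }

    // Known quantity 2: the last label must be a listed TLD.
    return in_array(end($labels), $tlds, true);
}
```

With the sample lists above, 'http://user:pass@example.com/foo' passes, while
'gopher://example.com/' (scheme not listed) and 'http://exa_mple.com/'
(underscore in the domain) are rejected. The trade-off raised in the quoted
message still applies: any TLD missing from the list is rejected as well.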

 
