Posted by deciacco on 02/09/07 20:46
How about:
if (preg_match('/\\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/i', $subject, $result)) {
    $url = $result[0];
} else {
    $url = "";
}
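In case it helps, here is the same idea wrapped up as a rough sketch. The function names extract_first_url() and extract_first_host() are made up for illustration, and the type declarations assume PHP 7 or later. The pattern is the one above, with only the group made non-capturing. The second helper passes the match to parse_url() to get the host, which may be a simpler starting point for the sub-domain question than splitting the string by hand.

```php
<?php
// Hypothetical helper: return the first URL found in $subject, or "" if none.
function extract_first_url(string $subject): string
{
    // Same character classes as the pattern above; the final class drops
    // trailing punctuation such as "." or "," from the end of the match.
    $pattern = '/\b(?:https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/i';
    if (preg_match($pattern, $subject, $result)) {
        return $result[0];
    }
    return "";
}

// Hypothetical helper: return the host part of the first URL, or "" if none.
function extract_first_host(string $subject): string
{
    $url = extract_first_url($subject);
    if ($url === "") {
        return "";
    }
    // parse_url() returns null when the component is missing.
    $host = parse_url($url, PHP_URL_HOST);
    return is_string($host) ? $host : "";
}

$agent = 'registered NYSE 943 <a href="http://netforex.net"> Forex Trading Network Organization </a> info@netforex.org';
echo extract_first_url($agent), "\n";   // http://netforex.net
echo extract_first_host($agent), "\n";  // netforex.net
```

Note that this only returns the first match; preg_match_all() would be the call to look at if an agent string can carry more than one URL.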
-----Original Message-----
From: deko [mailto:deko@nospam.com]
Posted At: Friday, February 09, 2007 2:15 PM
Posted To: comp.lang.php
Conversation: Best way to extract URL from random string?
Subject: Best way to extract URL from random string?
If I have random and unpredictable user agent strings containing URLs,
what is the best way to extract the URL?
For example, let's say the string looks like this:
registered NYSE 943 <a href="http://netforex.net"> Forex Trading Network
Organization </a> info@netforex.org
What's the best way to extract http://netforex.net ?
I have code that checks for identifiable browsers and bots, but when the
agent string has no identifiable information other than a URL, I want to
grab the URL.
Here's a first crack at it:
..
..
..
[code omitted]
..
..
..
elseif (eregi("http://", $agent))
{
    $agent = stristr($agent, "http://");
    $agent = parse_url($agent);
    $agent = $agent['host'];
    //check for subdomains
    $agent_a = explode(".", $agent);
    $agent_r = array_reverse($agent_a);
    $sub = count($agent_r) - 1;
    $tld3 = substr($agent_r[0], 0, 3);
    if (eregi("^(com|net|org|edu|biz|gov)$", $tld3)) //common tld's
    {
        while ($sub > 0)
        {
            $domain = $domain.$agent_r[$sub].".";
            $sub--;
        }
        $refurl = $domain.$tld3;
    }
    $referrer = "<a href='".$refurl."'>".$refurl."</a>";
}
else
{
    $referrer = "unknown";
}
Are there any PHP functions that will help here? How to handle
subdomains? International domains?
Thanks in advance.