Posted by deko on 02/09/07 20:15
If I have random and unpredictable user agent strings containing URLs, what is
the best way to extract the URL?
For example, let's say the string looks like this:
registered NYSE 943 <a href="http://netforex.net"> Forex Trading Network
Organization </a> info@netforex.org
What's the best way to extract http://netforex.net ?
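A minimal sketch of the extraction step, assuming a URL ends at the first whitespace, quote or angle bracket (true for the `<a href="...">` example above) and that trailing sentence punctuation is not part of it. The function name `extract_url` is hypothetical:

```php
<?php
// Return the first http:// or https:// URL found in $agent, or null.
// Assumption: the URL stops at whitespace, a quote or an angle bracket,
// and trailing sentence punctuation (.,;:) is not part of the URL.
function extract_url(string $agent): ?string
{
    if (preg_match('#https?://[^\s"\'<>]+#i', $agent, $match)) {
        return rtrim($match[0], '.,;:');
    }
    return null;
}

$agent = 'registered NYSE 943 <a href="http://netforex.net"> Forex Trading Network '
       . 'Organization </a> info@netforex.org';
echo extract_url($agent), "\n"; // prints http://netforex.net
```

The character class does the real work here: `parse_url()` alone would also swallow the `"> Forex Trading ...` markup that follows the URL.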
I have code that checks for identifiable browsers and bots, but when the agent
string has no identifiable information other than a URL, I want to grab the URL.
Here's a first crack at it:
[code omitted]
elseif (preg_match('#http://[^\s"\'<>]+#i', $agent, $match))
{
    // Take only the URL itself, stopping at whitespace, quotes or angle
    // brackets, so trailing markup like "> Forex Trading ... is dropped
    $parts = parse_url($match[0]);
    $host = isset($parts['host']) ? strtolower($parts['host']) : "";
    // $host may still include subdomains, e.g. www.netforex.net
    $labels = explode(".", $host);
    $tld = end($labels);
    if (preg_match('/^(com|net|org|edu|biz|gov)$/', $tld)) // common TLDs
    {
        $refurl = htmlspecialchars("http://" . $host, ENT_QUOTES);
        $referrer = "<a href='" . $refurl . "'>" . $refurl . "</a>";
    }
    else
    {
        $referrer = "unknown";
    }
}
else
{
    $referrer = "unknown";
}
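For the subdomain check, one simple option is to keep only the last two labels of the host. This is a naive sketch that assumes the TLD is a single label (com, net, org, ...); the function name `base_domain` is hypothetical:

```php
<?php
// Naive subdomain stripping: keep only the last two labels of a host,
// e.g. "www.netforex.net" -> "netforex.net".
// Assumption: the TLD is a single label. This is wrong for suffixes
// such as "co.uk", which need a list of known suffixes.
function base_domain(string $host): string
{
    $labels = explode('.', strtolower(rtrim($host, '.')));
    // array_slice with a negative offset takes labels from the end;
    // a host with fewer than two labels is returned unchanged
    return implode('.', array_slice($labels, -2));
}

$parts = parse_url('http://www.netforex.net/about.html');
echo base_domain($parts['host']), "\n"; // prints netforex.net
```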
Are there any PHP functions that will help here? How should I handle subdomains?
What about international domains?
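For country-code domains, the two-label rule breaks on hosts like news.bbc.co.uk. A sketch, assuming a small hand-picked list of two-level suffixes (the constant `TWO_LEVEL_SUFFIXES` and function `registrable_domain` are hypothetical; a real implementation would load the full Public Suffix List). Internationalized names are converted to their ASCII "xn--" form with `idn_to_ascii()`, but only if the intl extension is installed:

```php
<?php
// Hypothetical sample of two-level public suffixes; not a complete list.
const TWO_LEVEL_SUFFIXES = ['co.uk', 'org.uk', 'ac.uk', 'com.au', 'co.jp'];

function registrable_domain(string $host): string
{
    $host = rtrim($host, '.');
    // Convert non-ASCII (IDN) hosts to ASCII when the intl extension exists;
    // otherwise the host is used as-is.
    if (function_exists('idn_to_ascii') && preg_match('/[^\x00-\x7F]/', $host)) {
        $ascii = idn_to_ascii($host, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
        if ($ascii !== false) {
            $host = $ascii;
        }
    }
    $labels = explode('.', strtolower($host));
    // Keep three labels if the last two form a known two-level suffix
    $suffix = implode('.', array_slice($labels, -2));
    $keep = in_array($suffix, TWO_LEVEL_SUFFIXES, true) ? 3 : 2;
    return implode('.', array_slice($labels, -$keep));
}

echo registrable_domain('news.bbc.co.uk'), "\n"; // prints bbc.co.uk
```

Hard-coding suffixes only covers the cases you list; the list approach scales because the suffix data, not the code, changes.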
Thanks in advance.