Re: Extracting body from HTML document?

Posted by Andre-John Mas on 11/14/07 21:47

On Nov 14, 4:23 pm, Andre-John Mas <andrejohn....@gmail.com> wrote:
> Hi,
>
> I want to be able to get a section of an HTML document by
> specifying an XPath. For example:
>
> $title = GetSection('/html/head/title');
> $body = GetSection('/html/body');
>
> I wrote a simple parser myself some time back, but it fails on
> certain types of documents. Instead of maintaining the code, I would
> rather find an existing solution, so that I can concentrate my
> development efforts elsewhere. Does anyone have anything they can
> recommend?
>
> Andre

My current implementation is very basic. The main issue I am having is
that nothing is returned if the start element has any attributes,
because the start tag is matched as a literal string such as "<body>".
While I could eventually solve this, I would rather use a robust API,
since there are certainly other issues I might run into.

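// Returns the text between the first $start and the next $end, or false
// if either marker is missing. Matching is literal, so a start tag that
// carries attributes (e.g. <body class="x">) is never found.
function GetElementByName($xml, $start, $end) {
    $startpos = strpos($xml, $start);
    if ($startpos === false) {
        return false;
    }
    $contentpos = $startpos + strlen($start);
    // Look for the end marker only after the start marker.
    $endpos = strpos($xml, $end, $contentpos);
    if ($endpos === false) {
        return false;
    }

    return substr($xml, $contentpos, $endpos - $contentpos);
}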

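// Walks a simple slash-separated path, narrowing $node one element at a time.
function XPathValue($XPath, $XML) {
    $node = $XML;
    foreach (explode("/", $XPath) as $name) {
        // A leading "/" yields an empty first segment; skip it.
        if ($name === "") {
            continue;
        }
        $node = GetElementByName($node, "<$name>", "</$name>");
        if ($node === false) {
            return false;
        }
    }

    return $node;
}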
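
For comparison, here is a minimal sketch of the same lookup built on PHP's bundled DOM extension (DOMDocument and DOMXPath) instead of string matching. It is only one possible approach, not a recommendation taken from this thread. The function name getSection is hypothetical, chosen to mirror the GetSection calls in the question, and the sample markup is made up for illustration. It assumes the dom extension is enabled and that returning the inner markup of the first matching node is what is wanted.

```php
<?php
// Hypothetical helper: returns the inner markup of the first node that
// matches $xpath, or false if nothing matches.
function getSection(string $html, string $xpath)
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely well-formed; collect parser warnings
    // internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = (new DOMXPath($doc))->query($xpath);
    if ($nodes === false || $nodes->length === 0) {
        return false;
    }

    // Serialize each child so that nested markup is preserved.
    $inner = '';
    foreach ($nodes->item(0)->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    return $inner;
}

// Made-up sample document; note the attribute on <body>.
$html = '<html><head><title>Example</title></head>'
      . '<body class="main"><p>Hello</p></body></html>';

$title = getSection($html, '/html/head/title');
$body = getSection($html, '/html/body');
```

Because the document is parsed into a tree first, attributes on the start tag no longer affect the match, which is the case the string-based version above cannot handle.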

 
