Posted by Malcolm Dew-Jones on 06/09/06 20:04
Radium (uh5d@rz.uni-karlsruhe.de) wrote:
: Hi,
: what I want is something similar to the simple-xml extension of php, but for
: html.
: I have to analyze and read in certain tags from a html file in a comfortable
: manner.
: Is there a php extension/library which makes this possible?
In php, not that I know of, though I would like to be wrong.
If you know any perl then use the excellent HTML::Parser. It handles just
about anything that a web site might throw at it. You could use the perl
script to build a PHP script.
Assume text input something like
<html><head><title>example page</title> (etc)
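So write a perl script with handlers something like this (an untested
sketch; the output file name tmp_script.php is just an example)
use strict;
use warnings;
use HTML::Parser;
# '<?php' at the top lets the main php script include the file
open(TMP_PHP_SCRIPT, '>', 'tmp_script.php') or die "tmp_script.php: $!";
print TMP_PHP_SCRIPT "<?php\n";
sub do_start_tag
{
    my ($tag_name) = @_;   # supplied by the 'tagname' argspec below
    print TMP_PHP_SCRIPT "handle_tag('$tag_name');\n";
}
sub do_text
{
    my ($raw_text) = @_;   # supplied by the 'dtext' argspec (entities decoded)
    # a php single-quoted string only needs \ and ' escaped, so not quotemeta
    (my $safe_text = $raw_text) =~ s/([\\'])/\\$1/g;
    print TMP_PHP_SCRIPT "handle_text('$safe_text');\n";
}
sub do_end_tag
{
    my ($tag_name) = @_;
    print TMP_PHP_SCRIPT "handle_end_tag('$tag_name');\n";
}
my $parser = HTML::Parser->new(
    api_version => 3,
    start_h => [\&do_start_tag, 'tagname'],
    text_h  => [\&do_text,      'dtext'],
    end_h   => [\&do_end_tag,   'tagname'],
);
$parser->parse_file($ARGV[0]) or die "cannot parse $ARGV[0]: $!";
close(TMP_PHP_SCRIPT);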
From that you would get a temporary file with lines like
handle_tag('html');
handle_tag('head');
handle_tag('title');
handle_text('example page');
handle_end_tag('title');
handle_end_tag('head');
Your main php script would run the perl script, and then run the temporary
php script (example shown just above), and your php functions like
handle_tag etc would be called just as if you had been able to parse the
data directly from within php.
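The php side of that could be sketched roughly as below. This is only an illustration, not tested against a real site: the names html2php.pl (the perl script) and tmp_script.php (the file it writes) are made up for the example, and the handlers here just record what they see in an array so you can inspect it afterwards.

```php
<?php
// Handlers that the generated script calls. In this sketch they only
// record each event; replace the bodies with whatever you actually need.
$events = [];

function handle_tag($name)     { global $events; $events[] = ['start', $name]; }
function handle_text($text)    { global $events; $events[] = ['text', $text]; }
function handle_end_tag($name) { global $events; $events[] = ['end', $name]; }

// Run the perl script on an HTML file. Assumption: the perl script takes
// the HTML file name as its only argument and writes the php calls to a
// temporary file whose name you already know.
function generate_php_script($html_file, $perl_script = 'html2php.pl')
{
    $cmd = 'perl ' . escapeshellarg($perl_script) . ' ' . escapeshellarg($html_file);
    exec($cmd, $output, $status);
    if ($status !== 0) {
        throw new RuntimeException("perl script exited with status $status");
    }
}

// Run the generated file; each line in it calls one of the handlers above.
// The file must start with an opening php tag for include to execute it.
function run_generated_script($tmp_script = 'tmp_script.php')
{
    include $tmp_script;
}
```

Usage would be two calls, generate_php_script('page.html') followed by run_generated_script(), after which $events holds the tags and text in document order.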
$0.10