PHP and XML parsing
Date: 02/11/06
(PHP Development) Keywords: php
This is something I've already posted on my own LJ, but I wanted to post it here too, to see if anyone knows why PHP calls the character data handler twice for every node.
Here is the PHP script:
<?php
$file = "./test.xml";
// Feed the file to the parser in 4 KB chunks; returns false on a parse error.
function xml_parse_from_file($parser, $file){
    if(!file_exists($file)){
        die("Can't find file \"$file\"");
    }
    if(!($fp = @fopen($file, "r"))){
        die("Can't open file \"$file\"");
    }
    while($data = fread($fp, 4096)){
        if(!xml_parse($parser, $data, feof($fp))){
            fclose($fp);
            return false;
        }
    }
    fclose($fp);
    return true;
}
function start_element($parser, $name, $attrs){
    echo "name: " . $name . "\n";
    foreach($attrs as $key => $value){
        echo "key: " . $key . " attribute: " . $value . "\n";
    }
}
function cdata($parser, $data){
    echo "data: " . $data . "\n";
}
function stop_element($parser, $name){
    echo "end element " . $name . "\n";
}
$parser = xml_parser_create();
xml_set_element_handler($parser, "start_element", "stop_element");
xml_set_character_data_handler($parser, "cdata");
xml_parse_from_file($parser, $file);
?>
Now, for the result:
name: BOOK
data:
data:
name: AUTHOR
key: FNAME attribute: Guy
key: SURNAME attribute: Kawasaki
end element AUTHOR
data:
data:
name: TITLE
key: TITLENAME attribute: Rules for Revolutionaries
end element TITLE
data:
data:
name: AUTHOR_BACKGROUND
data: former chief evangelist at Apple Computer and an iconoclastic corporate tactician who now works with high-tech startups in Silicon Valley
end element AUTHOR_BACKGROUND
data:
data:
name: DESCRIPTION
data: Guy Kawasaki, former chief evangelist of Apple Computer Inc., and renegade business strategist is back with a 'but-kicking' manifesto, Rules for Revolutionaries
end element DESCRIPTION
data:
end element BOOK
As you can see, the odd thing to me is that for every element, the parser goes into the cdata function twice. The only thing I can think of is that for each handler, start_element and stop_element, it goes into the cdata function to check for possible character data. (I can see it happening once, but how could there be character data with stop_element?) The way around it is simply to change the cdata function to:
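function cdata($parser, $data){
    // The "empty" calls still carry whitespace, so a strlen($data) > 0 check would let them through.
    if(trim($data) !== ""){
        echo "data: " . $data . "\n";
    }
}
This makes sure that $data actually contains something other than whitespace before printing it, but why does the parser call the handler twice for every node?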
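One possible explanation worth checking: the "empty" data may really be the newline and indentation between tags, which XML counts as character data. The parser is also free to deliver a single run of text in more than one call, which could account for two calls per gap. Below is a minimal sketch for making the chunks visible. It assumes a small inline document in place of test.xml (the $xml string is made up, not the original file), and the names record_cdata, ignore_start and ignore_end are only for illustration.

```php
<?php
// Hypothetical stand-in for test.xml; the real file is not shown in the post.
$xml = "<book>\n    <author fname=\"Guy\" surname=\"Kawasaki\"/>\n"
     . "    <title>Rules for Revolutionaries</title>\n</book>";

$chunks = array(); // every string passed to the character data handler

function record_cdata($parser, $data){
    global $chunks;
    $chunks[] = $data;
}
// Element handlers do nothing here; only the character data is of interest.
function ignore_start($parser, $name, $attrs){}
function ignore_end($parser, $name){}

$parser = xml_parser_create();
xml_set_element_handler($parser, "ignore_start", "ignore_end");
xml_set_character_data_handler($parser, "record_cdata");
xml_parse($parser, $xml, true);

foreach($chunks as $chunk){
    // json_encode shows "\n" and spaces explicitly instead of printing them raw.
    $note = (trim($chunk) === "") ? "  (whitespace only)" : "";
    echo "data: " . json_encode($chunk) . $note . "\n";
}
```

If the chunks turn out to be only newlines and spaces, then testing trim($data) !== "" filters them out, whereas a length check does not, since whitespace still has a length greater than zero. How the whitespace is split across calls depends on the underlying parser library, so the number of calls per gap may differ between PHP builds.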
Source: http://community.livejournal.com/php_dev/65402.html