PHP and XML parsing

    Date: 02/11/06 (PHP Development)    Keywords: php

    This is something I've already posted on my own LJ, but I wanted to post it here to see if anyone knows why PHP always calls the character data handler twice for every node.

    Here is the PHP script:

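    <?php
    $file = "./test.xml";

    // Feed $file to the parser in 4 KB chunks.
    function xml_parse_from_file($parser, $file){
        if(!file_exists($file)){
            die("Can't find file \"$file\"");
        }
        if(!($fp = @fopen($file, "r"))){
            die("Can't open file \"$file\"");
        }
        while(!feof($fp)){
            $data = fread($fp, 4096);
            // The third argument tells the parser whether this is the last chunk.
            if(!xml_parse($parser, $data, feof($fp))){
                fclose($fp);
                return false;
            }
        }
        fclose($fp);
        return true;
    }

    // Called for every opening tag; $attrs maps attribute name => value.
    function start_element($parser, $name, $attrs){
        echo "name: " . $name . "\n";
        foreach($attrs as $key => $value){
            echo "key: " . $key . " attribute: " . $value . "\n";
        }
    }

    // Called for every run of character data the parser reports.
    function cdata($parser, $data){
        echo "data: " . $data . "\n";
    }

    // Called for every closing tag.
    function stop_element($parser, $name){
        echo "end element\n" . $name . "\n\n";
    }

    $parser = xml_parser_create();
    xml_set_element_handler($parser, "start_element", "stop_element");
    xml_set_character_data_handler($parser, "cdata");
    if(!xml_parse_from_file($parser, $file)){
        echo "XML error: " . xml_error_string(xml_get_error_code($parser))
            . " at line " . xml_get_current_line_number($parser) . "\n";
    }
    ?>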

    Now, for the result:

    name: BOOK
    data:

    data:
    name: AUTHOR
    key: FNAME attribute: Guy
    key: SURNAME attribute: Kawasaki
    end element
    AUTHOR

    data:

    data:
    name: TITLE
    key: TITLENAME attribute: Rules for Revolutionaries
    end element
    TITLE

    data:

    data:
    name: AUTHOR_BACKGROUND
    data: former chief evangelist at Apple Computer and an iconoclastic corporate tactician who now works with high-tech startups in Silicon Valley
    end element
    AUTHOR_BACKGROUND

    data:

    data:
    name: DESCRIPTION
    data: Guy Kawasaki, former chief evangelist of Apple Computer Inc., and renegade business strategist is back with a 'but-kicking' manifesto, Rules for Revolutionaries
    end element
    DESCRIPTION

    data:

    end element
    BOOK

    As you can see, the odd thing to me is that for every element the parser calls the cdata function twice. The only explanation I can think of is that for each handler, start_element and stop_element, it calls cdata to check for possible character data. (I can understand that once, but how could there be character data at an end element?) The workaround is simply to change the cdata function to:

    function cdata($parser, $data){
        // Skip chunks that contain only whitespace (line breaks, indentation).
        if(trim($data) !== ""){
            echo "data: " . $data . "\n";
        }
    }

    This simply makes sure that $data contains some non-whitespace character data before printing it, but why does the parser call cdata twice for every node?
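    A likely explanation is that the character data handler also receives the whitespace (line breaks and indentation) between tags, and the underlying expat parser may deliver one run of whitespace in more than one call. The sketch below is one way to check this. It assumes a small hypothetical inline document in place of test.xml, whose contents aren't shown. It records every $data value passed to the handler and prints each one JSON-encoded, so newlines and spaces become visible.

```php
<?php
// Hypothetical inline document standing in for test.xml.
$xml = "<book>\n    <author fname=\"Guy\" surname=\"Kawasaki\"/>\n</book>";

$chunks = array(); // every $data value passed to the character data handler

$parser = xml_parser_create();
xml_set_character_data_handler($parser, function ($parser, $data) use (&$chunks) {
    $chunks[] = $data; // record the raw chunk exactly as the parser delivers it
});
xml_parse($parser, $xml, true);

// json_encode escapes newlines, so whitespace-only chunks are visible.
foreach ($chunks as $chunk) {
    echo "raw: " . json_encode($chunk)
        . (trim($chunk) === "" ? "  (whitespace only)" : "") . "\n";
}
```

    If this explanation holds, every chunk reported for this document is whitespace-only. Such chunks have a non-zero length, so a strlen() check alone does not filter them, while a trim() check does.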

    Source: http://community.livejournal.com/php_dev/65402.html
