Posted by Roger Thomas on 03/31/05 02:35
Hi Marek,
Thank you for the solution.
--
Roger
Quoting Marek Kilimajer <lists@kilimajer.net>:
> That's because the character data is split on the borders of the
> entities, so for
>
> http://feeds.example.com/?rid=318045f7e13e0b66&amp;cat=48cba686fe041718&amp;f=1
>
> characterData() will be called 5 times:
>
> http://feeds.example.com/?rid=318045f7e13e0b66
> &
> cat=48cba686fe041718
> &
> f=1
>
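To see the splitting for yourself, here is a minimal sketch that records every chunk passed to the character data handler. The `$chunks` array and the inline sample string are illustrative assumptions, not part of the original script, and the exact number of chunks may vary with the parser build.

```php
<?php
// Record every chunk the parser hands to the character data handler.
$chunks = [];

$parser = xml_parser_create();
xml_set_character_data_handler($parser, function ($parser, $data) use (&$chunks) {
    $chunks[] = $data;
});

// Sample element: note the '&' is written as '&amp;' in the XML source.
$xml = '<link>http://feeds.example.com/?rid=318045f7e13e0b66&amp;cat=48cba686fe041718&amp;f=1</link>';
xml_parse($parser, $xml, true);

// Print each chunk on its own line to show where the text was split.
foreach ($chunks as $i => $chunk) {
    echo $i, ': ', $chunk, "\n";
}

// Joining the chunks gives back the complete, decoded URL.
$joined = implode('', $chunks);
echo $joined, "\n";
```

However many calls the parser makes, concatenating the chunks always yields the full text, which is why appending in the handler is the safe approach.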
> Solution is inlined below
>
> Roger Thomas wrote:
> > I have a short script to parse my XML file. The parsing produces no errors
> > and all output looks good, EXCEPT that URL links are truncated if they
> > contain '&' characters.
> >
> > My XML file looks like this:
> > --- start of XML ---
> > <?xml version="1.0" encoding="iso-8859-1"?>
> > <rss version="2.0">
> > <channel>
> > <title>Test News .Net - Newspapers on the Net</title>
> > <copyright>Small News Network.com</copyright>
> > <link>http://www.example.com/</link>
> > <description>Continuously updating Example News.</description>
> > <language>en-us</language>
> > <pubDate>Tue, 29 Mar 2005 18:01:01 -0600</pubDate>
> > <lastBuildDate>Tue, 29 Mar 2005 18:01:01 -0600</lastBuildDate>
> > <ttl>30</ttl>
> > <item>
> > <title>Group buys SunGard for US$10.4bil</title>
> > <link>http://feeds.example.com/?rid=318045f7e13e0b66&amp;cat=48cba686fe041718&amp;f=1</link>
> > <description>NEW YORK: A group of seven private equity investment firms
> > agreed yesterday to buy financial technology company SunGard Data Systems Inc
> > in a deal worth US$10.4bil plus debt, making it the biggest
> > lev...</description>
> > <source url="http://biz.theexample.com/">The Paper</source>
> > </item>
> > <item>
> > <title>Strong quake hits Indonesia coast</title>
> > <link>http://feeds.example.com/news/world/quake.html</link>
> > <description>a "widely destructive tsunami" and the quake was
> > felt as far away as Malaysia.</description>
> > <source url="http://biz.theexample.com.net/">The Paper</source>
> > </item>
> > <item>
> > <title>Final News</title>
> > <link>http://feeds.example.com/?id=abcdef&amp;cat=somecat</link>
> > <description>We are going to expect something new this weekend
> > ...</description>
> > <source url="http://biz.theexample.com/">The Paper</source>
> > </item>
> > </channel>
> > </rss>
> > --- end of XML ---
> >
> > For the sake of testing, my script only prints the URL of each news item
> > above. I got these:
> > f=1
> > http://feeds.example.com/news/world/quake.html
> > cat=somecat
> >
> > The output for line 1 is truncated to 'f=1' and the output for line 3 is
> > truncated to 'cat=somecat', i.e. the script only kept the last parameter of
> > the URL. The output for line 2 is correct since it has no parameters.
> >
> > I am not sure what I have done wrong in my script. Is it because the RSS spec
> > says that you cannot have parameters in a URL? Please advise.
> >
> > -- start of script --
> > <?
> > $file = "test.xml";
> > $currentTag = "";
> >
> > function startElement($parser, $name, $attrs) {
> > global $currentTag;
> > $currentTag = $name;
> > }
> >
> > function endElement($parser, $name) {
> > global $currentTag, $TITLE, $URL, $start;
> >
> > switch ($currentTag) {
> > case "ITEM":
> > $start = 0;
> > case "LINK":
> > if ($start == 1)
> > #print "<A HREF = \"".$URL."\">$TITLE</A><BR>";
> > print "$URL"."<BR>";
> > break;
> > }
> > $currentTag = "";
>
> // Reset also other variables:
> $URL = '';
> $TITLE = '';
>
> > }
> >
> > function characterData($parser, $data) {
> > global $currentTag, $TITLE, $URL, $start;
> >
> > switch ($currentTag) {
> > case "ITEM":
> > $start = 1;
> > case "TITLE":
> > $TITLE = $data;
>
> // append instead:
> $TITLE .= $data;
>
> > break;
> > case "LINK":
> > $URL = $data;
>
> // append instead:
> $URL .= $data;
>
> // Warning: entities are decoded at this point, you will receive &, not
> &amp;
>
> > break;
> > }
> > }
> >
> > $xml_parser = xml_parser_create();
> > xml_set_element_handler($xml_parser, "startElement", "endElement");
> > xml_set_character_data_handler($xml_parser, "characterData");
> >
> > if (!($fp = fopen($file, "r"))) {
> > die("Cannot locate XML data file: $file");
> > }
> >
> > while ($data = fread($fp, 4096)) {
> > if (!xml_parse($xml_parser, $data, feof($fp))) {
> > die(sprintf("XML error: %s at line %d",
> > xml_error_string(xml_get_error_code($xml_parser)),
> > xml_get_current_line_number($xml_parser)));
> > }
> > }
> >
> > xml_parser_free($xml_parser);
> >
> > ?>
> > -- end of script --
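Because the handler receives the decoded `&`, the URL needs to be escaped again if it is printed into HTML, as the commented-out `<A HREF>` line in the script would do. A small sketch, assuming the values below are what the handlers collected (`$url` and `$title` here are hard-coded stand-ins):

```php
<?php
// Stand-in values as the character data handler delivers them:
// "&amp;" from the XML has already been decoded to "&".
$url   = 'http://feeds.example.com/?id=abcdef&cat=somecat';
$title = 'Final News';

// htmlspecialchars() turns "&" back into "&amp;" so the link is valid HTML.
$html = '<a href="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '">'
      . htmlspecialchars($title, ENT_QUOTES, 'UTF-8') . '</a><br>';

echo $html, "\n";
// prints: <a href="http://feeds.example.com/?id=abcdef&amp;cat=somecat">Final News</a><br>
```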
> >
> > TIA.
> > Roger
> >
> >
> > ---------------------------------------------------
> > Sign Up for free Email at http://ureg.home.net.my/
> > ---------------------------------------------------
> >
>
>
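For reference, here is one way the append-based fix could look as a self-contained script. This is only a sketch under some assumptions: the function name `extractItemLinks()`, the closure-based handlers, and the shortened sample feed are mine, not taken from the original script, which uses globals and reads `test.xml` from disk.

```php
<?php
// Collect the <link> text of every <item>, appending chunks until the end tag.
function extractItemLinks(string $xml): array
{
    $links  = [];     // finished URLs, one per <item>
    $inItem = false;  // true while inside an <item> element
    $tag    = '';     // name of the element most recently opened
    $url    = '';     // buffer for the current <link> text

    $parser = xml_parser_create();
    // Element names arrive upper-cased because case folding is on by default.
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$inItem, &$tag, &$url) {
            $tag = $name;
            if ($name === 'ITEM') {
                $inItem = true;
            } elseif ($name === 'LINK') {
                $url = '';  // start a fresh buffer for this link
            }
        },
        function ($p, $name) use (&$links, &$inItem, &$tag, &$url) {
            if ($name === 'LINK' && $inItem) {
                $links[] = trim($url);  // the text is complete only now
            } elseif ($name === 'ITEM') {
                $inItem = false;
            }
            $tag = '';
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use (&$tag, &$url) {
            if ($tag === 'LINK') {
                $url .= $data;  // append: one text node may arrive in several calls
            }
        }
    );

    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(sprintf(
            'XML error: %s at line %d',
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)
        ));
    }
    return $links;
}

// Shortened sample feed; the channel-level <link> is skipped on purpose.
$xml = <<<XML
<rss version="2.0"><channel>
<link>http://www.example.com/</link>
<item><title>Final News</title>
<link>http://feeds.example.com/?id=abcdef&amp;cat=somecat</link></item>
</channel></rss>
XML;

$links = extractItemLinks($xml);
print_r($links);
```

The buffer is cleared when `<link>` opens and read only when `</link>` closes, so it does not matter how many times the character data handler fires in between.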