Posted by Roger Thomas on 03/30/05 05:53
I have a short script to parse my XML file. The parsing produces no errors and all the output looks good, except that URL links are truncated if they contain the '&' character.
My XML file looks like this:
--- start of XML ---
<?xml version="1.0" encoding="iso-8859-1"?>
<rss version="2.0">
<channel>
<title>Test News .Net - Newspapers on the Net</title>
<copyright>Small News Network.com</copyright>
<link>http://www.example.com/</link>
<description>Continuously updating Example News.</description>
<language>en-us</language>
<pubDate>Tue, 29 Mar 2005 18:01:01 -0600</pubDate>
<lastBuildDate>Tue, 29 Mar 2005 18:01:01 -0600</lastBuildDate>
<ttl>30</ttl>
<item>
<title>Group buys SunGard for US$10.4bil</title>
<link>http://feeds.example.com/?rid=318045f7e13e0b66&amp;cat=48cba686fe041718&amp;f=1</link>
<description>NEW YORK: A group of seven private equity investment firms agreed yesterday to buy financial technology company SunGard Data Systems Inc in a deal worth US$10.4bil plus debt, making it the biggest lev...</description>
<source url="http://biz.theexample.com/">The Paper</source>
</item>
<item>
<title>Strong quake hits Indonesia coast</title>
<link>http://feeds.example.com/news/world/quake.html</link>
<description>a "widely destructive tsunami" and the quake was felt as far away as Malaysia.</description>
<source url="http://biz.theexample.com.net/">The Paper</source>
</item>
<item>
<title>Final News</title>
<link>http://feeds.example.com/?id=abcdef&amp;cat=somecat</link>
<description>We are going to expect something new this weekend ...</description>
<source url="http://biz.theexample.com/">The Paper</source>
</item>
</channel>
</rss>
--- end of XML ---
For testing, my script prints only the URL link of each news item above. This is the output I got:
f=1
http://feeds.example.com/news/world/quake.html
cat=somecat
Line 1 of the output is truncated to 'f=1' and line 3 is truncated to 'cat=somecat', i.e. the script kept only the last parameter of each URL. Line 2 is correct because its URL has no parameters.
I am not sure what I have done wrong in my script. Is it because the RSS spec says that a URL cannot have parameters? Please advise.
-- start of script --
<?php
$file = "test.xml";
$currentTag = "";

function startElement($parser, $name, $attrs) {
    global $currentTag;
    $currentTag = $name;
}

function endElement($parser, $name) {
    global $currentTag, $TITLE, $URL, $start;
    switch ($currentTag) {
        case "ITEM":
            $start = 0;
        case "LINK":
            if ($start == 1)
                #print "<A HREF = \"".$URL."\">$TITLE</A><BR>";
                print "$URL"."<BR>";
            break;
    }
    $currentTag = "";
}

function characterData($parser, $data) {
    global $currentTag, $TITLE, $URL, $start;
    switch ($currentTag) {
        case "ITEM":
            $start = 1;
        case "TITLE":
            $TITLE = $data;
            break;
        case "LINK":
            $URL = $data;
            break;
    }
}

$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");

if (!($fp = fopen($file, "r"))) {
    die("Cannot locate XML data file: $file");
}

while ($data = fread($fp, 4096)) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d",
            xml_error_string(xml_get_error_code($xml_parser)),
            xml_get_current_line_number($xml_parser)));
    }
}
xml_parser_free($xml_parser);
?>
-- end of script --
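A possible explanation, offered as a sketch rather than a confirmed diagnosis: PHP's event-based XML parser may call the character data handler more than once for a single element, and it typically splits the text around entity references such as &amp; (which is how a well-formed feed has to write an ampersand). Since characterData() above assigns with =, each call would overwrite the previous chunk and only the last one ('f=1', 'cat=somecat') would survive. The version below appends the chunks instead. The function name extractLinks() and the inline sample feed are made up for illustration and are not part of the original script.

```php
<?php
// Sketch only: extractLinks() and $sample are hypothetical, for illustration.
// Collects the <link> text of every <item>, appending character data chunks
// instead of overwriting them.
function extractLinks(string $xml): array
{
    $links = [];
    $inItem = false;
    $currentTag = '';
    $buffer = '';

    $parser = xml_parser_create();

    xml_set_element_handler(
        $parser,
        // Start tag: remember where we are and begin with an empty buffer.
        // Tag names arrive in upper case because case folding is on by default.
        function ($p, $name, $attrs) use (&$inItem, &$currentTag, &$buffer) {
            if ($name === 'ITEM') {
                $inItem = true;
            }
            $currentTag = $name;
            $buffer = '';
        },
        // End tag: the buffer now holds the complete text of the element.
        function ($p, $name) use (&$links, &$inItem, &$currentTag, &$buffer) {
            if ($name === 'LINK' && $inItem) {
                $links[] = trim($buffer);
            }
            if ($name === 'ITEM') {
                $inItem = false;
            }
            $currentTag = '';
        }
    );

    xml_set_character_data_handler(
        $parser,
        // May be called several times per element, so append with .=
        function ($p, $data) use (&$currentTag, &$buffer) {
            if ($currentTag === 'LINK') {
                $buffer .= $data;
            }
        }
    );

    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(sprintf(
            'XML error: %s at line %d',
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)
        ));
    }

    return $links;
}

$sample = <<<XML
<?xml version="1.0" encoding="iso-8859-1"?>
<rss version="2.0"><channel>
<link>http://www.example.com/</link>
<item><title>Final News</title>
<link>http://feeds.example.com/?id=abcdef&amp;cat=somecat</link></item>
</channel></rss>
XML;

foreach (extractLinks($sample) as $link) {
    print $link . "\n";
}
```

If this is indeed the cause, the smallest change to the original script would be to use `$URL .= $data;` in characterData() and reset `$URL` to an empty string when a LINK start tag is seen. When the URL is printed inside an HTML page, passing it through htmlspecialchars() keeps the '&' valid in the markup.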
TIA.
Roger