Posted by Steven Mocking on 01/18/07 13:08
The following code produces XML which is not valid UTF-8 according to
xmllint. For ease of testing and for confidentiality reasons I've
simplified it considerably, but normally it should encode several
megabytes of text from a MySQL table.
<?php
$xml = new XmlWriter();
$xml->openMemory();
$xml->setIndent(true);
// $xml->startDocument('1.0','ISO-8859-1');
$xml->startDocument('1.0','UTF-8');
$xml->startElement('dataroot');
$xml->writeAttribute('xmlns:od', 'urn:schemas-microsoft-com:officedata');
$xml->startElement('VACATURE');
$xml->startElement("bla");
$xml->text("commerciële");
$xml->endElement(); // </bla>
$xml->endElement(); // </VACATURE>
$xml->endElement(); // </dataroot>
$xml->endDocument();
header("Content-type: application/xml");
print $xml->outputMemory(true);
?>
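That's because the text is actually ISO-8859-1. If I manually change the
encoding attribute in the XML declaration of the output to ISO-8859-1,
the XML validates.
Changing the encoding argument to startDocument in the script to
'ISO-8859-1' (the commented-out line), however, results in a conversion
error: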
Warning: XMLWriter::outputMemory() function.XMLWriter-outputMemory:
output conversion failed due to conv error, bytes 0xEB 0x6C 0x65 0x3C in
broken-xml-output.txt on line 18
Doesn't make any sense to me, because it doesn't even need to convert
the string. Same trouble on two machines with PHP 5.1.2 and 5.2.0. I
could always replace it in the output with a regexp on the first line,
but that's just plain Bad and Wrong.
Steven
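A minimal sketch of one likely explanation, assuming the script file itself is saved as ISO-8859-1 (the post does not say). The bytes 0xEB 0x6C 0x65 0x3C in the warning read as "ële<" in ISO-8859-1, and a lone 0xEB followed by "l" is not a valid UTF-8 sequence. XMLWriter is built on libxml2, which treats every string passed to it as UTF-8; the encoding argument to startDocument only selects the output encoding. So a conversion does take place, from UTF-8 to ISO-8859-1, and it fails on the non-UTF-8 input. Converting the text to UTF-8 before calling text() should avoid both problems. The variable names and the use of the mbstring extension below are assumptions for illustration.

```php
<?php
// Assumption: the original literal "commerciële" was stored as ISO-8859-1,
// i.e. with the single byte 0xEB for "ë". It is written with an escape here
// so the example behaves the same whatever encoding this file is saved in.
$latin1 = "commerci\xEBle";

// XMLWriter (libxml2) expects UTF-8 input. Check first, convert if needed.
// Requires the mbstring extension; iconv('ISO-8859-1', 'UTF-8', $latin1)
// would be an alternative.
$isUtf8 = mb_check_encoding($latin1, 'UTF-8'); // false for a lone 0xEB
$utf8   = $isUtf8
    ? $latin1
    : mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

$xml = new XMLWriter();
$xml->openMemory();
$xml->setIndent(true);
$xml->startDocument('1.0', 'UTF-8');
$xml->startElement('bla');
$xml->text($utf8);
$xml->endElement(); // </bla>
$xml->endDocument();

$out = $xml->outputMemory(true);
print $out;
```

With UTF-8 input, passing 'ISO-8859-1' to startDocument instead should also work, since libxml2 then has valid UTF-8 to convert from. The same conversion would be needed for each value read from the MySQL table if the connection delivers ISO-8859-1 text, which is again an assumption.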