Posted by OmegaJunior on 01/21/07 11:02
On Thu, 18 Jan 2007 14:08:01 +0100, Steven Mocking
<ufo@quicknet.youmightwanttogetridofthis.nl> wrote:
> The following code produces XML which is not valid UTF-8 according to
> xmllint. For convenient testing and confidentiality issues I've
> simplified it considerably, but normally it should encode several megs
> of text from a MySQL table.
>
> <?php
> $xml = new XmlWriter();
> $xml->openMemory();
> $xml->setIndent(true);
> // $xml->startDocument('1.0','ISO-8859-1');
> $xml->startDocument('1.0','UTF-8');
> $xml->startElement('dataroot');
> $xml->writeAttribute('xmlns:od',
> 'urn:schemas-microsoft-com:officedata');
>
> $xml->startElement('VACATURE');
> $xml->startElement("bla");
> $xml->text("commerciële");
> $xml->endElement(); // </bla>
> $xml->endElement(); // </VACATURE>
> $xml->endElement();
> $xml->endDocument();
> header("Content-type: application/xml");
> print $xml->outputMemory(true);
> ?>
>
> That's because it's ISO-8859-1. If I manually change this attribute in
> the output, the XML validates.
>
> Changing the encoding argument to startDocument in the script results in
> a conv error:
>
> Warning: XMLWriter::outputMemory() function.XMLWriter-outputMemory:
> output conversion failed due to conv error, bytes 0xEB 0x6C 0x65 0x3C in
> broken-xml-output.txt on line 18
>
> Doesn't make any sense to me, because it doesn't even need to convert
> the string. Same trouble on two machines with PHP 5.1.2 and 5.2.0. I
> could always replace it in the output with a regexp on the first line,
> but that's just plain Bad and Wrong.
>
> Steven
I have no experience with XMLWriter, but a few questions come to mind:
- Maybe the writer tries to convert the input text to whatever charset
you supply?
- Maybe it tries to save the input text as whatever charset you supply?
- The e-umlaut is the single byte 0xEB in ISO-8859-1 (the first byte in
your warning), but UTF-8 encodes it as two bytes, 0xC3 0xAB. Is your
script file perhaps saved as ISO-8859-1, so the string is never valid
UTF-8 to begin with?
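
If the input really is ISO-8859-1, one thing to try is converting it to UTF-8 before handing it to the writer. The following is only a rough sketch, not a tested fix for your setup: it assumes the text is ISO-8859-1 and that the iconv extension is available. The variable names ($latin1, $utf8, $out) are made up for illustration, and the e-umlaut is written as the escape "\xEB" so the example does not depend on how the script file itself is saved.

```php
<?php
// Assumption: the source text is ISO-8859-1, as the quoted post suggests.
// "\xEB" is the single ISO-8859-1 byte for e-umlaut.
$latin1 = "commerci\xEBle";

// Convert to UTF-8 before passing the text to XMLWriter.
// In UTF-8 the e-umlaut becomes the two bytes 0xC3 0xAB.
$utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);

$xml = new XMLWriter();
$xml->openMemory();
$xml->startDocument('1.0', 'UTF-8');
$xml->startElement('bla');
$xml->text($utf8);
$xml->endElement(); // </bla>
$xml->endDocument();

$out = $xml->outputMemory(true);
print $out;
```

For text coming out of MySQL the same idea would apply per field, or the connection charset could be set to UTF-8 so no conversion is needed at all. I have not checked which of those suits your data.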
Hope this helps!