Posted by cc on 10/13/05 14:53
Maybe I should have said: `&egrave;' is not an _XML_ entity (XML predefines only &lt;, &gt;, &amp;, &apos; and &quot;).
I'm not completely sure, though.
Sorry.
`&egrave;' is an HTML entity. It represents the letter `è', which in the
ISO-8859-1 charset is the single byte 0xE8 (Unicode code point U+00E8);
it is not an ASCII character.
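To make the byte values concrete, here is a minimal PHP sketch that decodes the same entity under two charsets with the built-in html_entity_decode(), passing the charset explicitly (the variable names are only illustrative). It shows that the "value" of è depends on the encoding: one byte in ISO-8859-1, two bytes in UTF-8.

```php
<?php
// Sketch: what the HTML entity &egrave; decodes to under two charsets.
// Only built-in functions are used: html_entity_decode() and bin2hex().

$latin1 = html_entity_decode('&egrave;', ENT_COMPAT, 'ISO-8859-1');
$utf8   = html_entity_decode('&egrave;', ENT_COMPAT, 'UTF-8');

// ISO-8859-1: the letter is the single byte 0xE8.
printf("ISO-8859-1: %s\n", bin2hex($latin1)); // e8
// UTF-8: the same code point U+00E8 is encoded as two bytes.
printf("UTF-8:      %s\n", bin2hex($utf8));   // c3a8
```

This is why the bytes you feed libxml have to agree with the encoding the XML declares.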
There are three ways to get libxml to accept it:
1. <?xml version="1.0"?><item_name>&#xe8;</item_name>
2. <?xml version="1.0" encoding="iso-8859-1"?><item_name>è</item_name>
3. <?xml version="1.0"?><item_name>è</item_name>
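1 can be saved in either UTF-8 or ISO-8859-1, because the character reference is plain ASCII;
2 must be saved in ISO-8859-1, to match its encoding declaration;
3 must be saved in UTF-8, because a document without an encoding declaration is read as UTF-8 (this is what makes `è' come through correctly).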
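The three variants can be checked with a small PHP sketch that feeds each one to libxml through the DOM extension. The raw bytes are written as escapes ("\xE8" for ISO-8859-1, "\xC3\xA8" for UTF-8) so the script's own file encoding does not matter; the array layout and variable names are just for illustration.

```php
<?php
// Sketch: parse the three variants and compare the resulting text.
$variants = [
    // 1. numeric character reference: pure ASCII, any file encoding works
    1 => '<?xml version="1.0"?><item_name>&#xe8;</item_name>',
    // 2. raw ISO-8859-1 byte, announced in the XML declaration
    2 => "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?><item_name>\xE8</item_name>",
    // 3. raw UTF-8 bytes, no declaration (UTF-8 is XML's default)
    3 => "<?xml version=\"1.0\"?><item_name>\xC3\xA8</item_name>",
];

$results = [];
foreach ($variants as $n => $xml) {
    $dom = new DOMDocument();
    $ok = $dom->loadXML($xml);
    // The DOM always returns text as UTF-8, whatever the input encoding was.
    $results[$n] = $ok ? $dom->documentElement->textContent : null;
    printf("%d: %s\n", $n, $ok ? bin2hex($results[$n]) : 'parse failed');
}
```

All three should print c3a8, i.e. the same letter once libxml has converted it to its internal UTF-8.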
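In PHP, we can do this (decoding explicitly to ISO-8859-1 so the bytes match the declaration):
$html = html_entity_decode(
    '<item_name>farm lettuces with reed avocado, cr&egrave;me '
    . 'fra&icirc;che, radish and cilantro</item_name>',
    ENT_COMPAT, 'ISO-8859-1');
$dom = new DOMDocument();
$dom->loadXML("<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>$html");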
On 10/13/05, Marcus Bointon <marcus@synchromedia.co.uk> wrote:
> On 13 Oct 2005, at 07:24, cc wrote:
>
> > both `&egrave;' and `&icirc;' are not entities in charset utf-8, use
> > `&amp;egrave;' and `&amp;icirc;' instead.
>
> I would expect that to result in unconverted entities in the output.
> If you're intending to send that content as HTML, then I guess that
> would be OK. However, if you're using UTF-8 anyway, why not just use
> the real characters?
>
> Marcus
> --
> Marcus Bointon
> Synchromedia Limited: Putting you in the picture
> marcus@synchromedia.co.uk | http://www.synchromedia.co.uk
>
> --
> PHP General Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
>
>
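Marcus's point about unconverted entities can be illustrated with a minimal PHP sketch: escaping the entity as &amp;egrave; keeps libxml happy, but the parsed text then contains the literal characters "&egrave;" rather than the letter. The helper parse_text() is hypothetical, for illustration only.

```php
<?php
// Hypothetical helper: parse a document and return the root's text (UTF-8).
function parse_text(string $xml): string
{
    $dom = new DOMDocument();
    if (!$dom->loadXML($xml)) {
        throw new RuntimeException('not well-formed');
    }
    return $dom->documentElement->textContent;
}

// Escaped entity: parses, but the entity name survives as plain text.
$escaped = parse_text('<?xml version="1.0"?><item_name>cr&amp;egrave;me</item_name>');
// Real UTF-8 character: the text contains the actual letter.
$literal = parse_text("<?xml version=\"1.0\"?><item_name>cr\u{e8}me</item_name>");

echo $escaped, "\n"; // cr&egrave;me
echo $literal, "\n"; // crème
```

If the escaped text is later echoed into an HTML page, a browser will render `&egrave;' as è, which is the case Marcus describes as acceptable; anywhere else it stays unconverted.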