Posted by amygdala on 06/14/07 01:39
"Andy Hassall" <andy@andyh.co.uk> wrote in message
news:nbn073pqso001bsrhpkmpcqvnjoulkjjlb@4ax.com...
> On Wed, 13 Jun 2007 22:25:44 +0200, "amygdala" <noreply@noreply.com>
> wrote:
>
>>I'm trying to let PHP write a 'sitemap.xml' sitemap for Google and other
>>search engines. It's working, except that the content in the XML file
>>doesn't seem to be UTF-8. (Which it should be, judging by the information
>>given on Google's webmaster help center).
>>
>>The way I test to see if the content is UTF-8 is by opening the XML file
>>in Notepad and choosing 'Save as...'. Normally the encoding option should
>>be set to UTF-8, but now it just shows ANSI.
>
> Well, that's not a foolproof method...
I was afraid of that.
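
A more direct check than Notepad's 'Save as...' dialog is to test the bytes themselves. One thing to keep in mind: a file containing only ASCII characters is already valid UTF-8, so an editor has nothing to tell it apart from "ANSI" unless a BOM is present. A minimal sketch follows; looks_like_utf8() is a made-up helper name, not something from the code discussed here, and it relies on PCRE rejecting invalid UTF-8 when the /u modifier is used.

```php
<?php
// Hypothetical helper for illustration only.
function looks_like_utf8(string $bytes): bool
{
    // With the /u modifier, PCRE validates the subject as UTF-8 before
    // matching; preg_match() returns false (not 0) if validation fails.
    // The empty pattern matches any valid string, so a valid input yields 1.
    return preg_match('//u', $bytes) === 1;
}

var_dump(looks_like_utf8("plain ASCII"));  // ASCII only: also valid UTF-8
var_dump(looks_like_utf8("caf\xC3\xA9"));  // "café" encoded as UTF-8
var_dump(looks_like_utf8("caf\xE9"));      // "café" as ISO-8859-1 / CP1252 bytes
```

Running this against the contents of the generated sitemap (for example via file_get_contents()) should say whether the file is well-formed UTF-8, independent of what any editor guesses.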
>>This is what I have tried to write UTF8 content with:
>>
>>file_put_contents( '.' . SITEMAP_FILE, utf8_encode(
>>$this->sitemapForCrawlers ) );
>>...and...
>>file_put_contents( '.' . SITEMAP_FILE, iconv( "ISO-8859-1", "UTF8",
>>$this->sitemapForCrawlers ) );
>>
>>...where...
>>SITEMAP_FILE is the filename constant
>>...and...
>>$this->sitemapForCrawlers is the string with XML data
>>
>>With the last attempt I even got an error saying:
>>
>>Wrong charset, conversion from `ISO-8859-1' to `UTF8' is not allowed in...
>>
>>Any ideas of how I can make this work?
>
> Start from the beginning; what character set encoding is the original data
> in?
> The error implies that it's not ISO-8859-1 (which does have some gaps
> where characters aren't valid...)
Well... I discovered the 'Set Code Page...' option in UltraEdit, the main
editor I use to code PHP, and it tells me my PHP code files are encoded in
'1252 (ANSI - Latin I)'. So my next question is: what would be the correct
first parameter for the iconv function to tell it that the original data is
'1252 (ANSI - Latin I)'? I've tried numerous strings, including:
'1252 (ANSI - Latin I)'
'1252'
'1252 ANSI'
'1252-ANSI'
'ANSI-1252'
'ANSI 1252'
....and variations.
Is there any iconv encoding table with acceptable encodings I can consult?
Also, isn't '1252 (ANSI - Latin I)' just a pimped version of ISO-8859-1?
I'm still curious about this, though. Please also read my reply to C.
Thanks.
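
For reference, the names iconv implementations commonly accept for this code page are 'CP1252' and 'WINDOWS-1252', and the target is normally spelled 'UTF-8' with a hyphen; whether the unhyphenated 'UTF8' is recognised depends on the iconv library PHP was built against. On systems with the command-line tool installed, `iconv -l` lists the names that library knows. Windows-1252 matches ISO-8859-1 except in the range 0x80-0x9F, where it puts printable characters (the euro sign, curly quotes and so on) in place of control codes. The sketch below assumes the input really is Windows-1252; cp1252_to_utf8() is a hypothetical helper name.

```php
<?php
// Hypothetical helper for illustration; assumes $bytes is Windows-1252.
function cp1252_to_utf8(string $bytes): string
{
    // 'CP1252' is the source encoding, 'UTF-8' the target.
    $utf8 = iconv('CP1252', 'UTF-8', $bytes);
    if ($utf8 === false) {
        // iconv() returns false when the conversion cannot be performed.
        throw new RuntimeException('iconv could not convert from CP1252 to UTF-8');
    }
    return $utf8;
}

// 0xE9 is "é" in both encodings; 0x80 is the euro sign only in CP1252.
$sample = "caf\xE9 \x80";
echo bin2hex(cp1252_to_utf8($sample)), "\n";
```

Applied to the code in this thread, the call would presumably become file_put_contents( '.' . SITEMAP_FILE, cp1252_to_utf8( $this->sitemapForCrawlers ) ), provided the string was built from data in that encoding.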