Posted by Andy Hassall on 06/14/07 20:37
On Thu, 14 Jun 2007 03:39:34 +0200, "amygdala" <noreply@noreply.com> wrote:
>> Start from the beginning; what character set encoding is the original
>> data in?
>> The error implies that it's not ISO-8859-1 (which does have some gaps
>> where characters aren't valid...)
>
>Well... I discovered the 'Set Code Page...' option in UltraEdit, the main
>editor I use to code PHP. And it tells me my PHP code files are encoded in
>'1252 (ANSI - Latin I)'.
Well... again, that's not foolproof. It's generally not possible to
definitively detect the encoding of a file. You can work out that it's
impossible for it to be in a particular encoding (invalid characters or byte
sequences), and you can make some guesses from character distribution or
spellings of words. But unless the encoding is tagged in some way (as HTML and
XML can be, or through another channel like HTTP headers), you can't be certain.
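To make the "impossible vs. merely possible" distinction concrete, here is a minimal sketch. It is in Python purely for illustration, and the helper name `plausible_encodings` is made up. It tries to decode a byte string with a few candidate encodings and keeps the ones that don't raise an error. A surviving encoding is only *possible*. A failing one is ruled out.

```python
def plausible_encodings(data: bytes, candidates=("utf-8", "cp1252", "iso-8859-1")):
    """Return the candidate codecs that can decode `data` without error.

    This can only rule encodings OUT (invalid bytes or byte sequences);
    every codec left in the list is merely possible, not confirmed.
    """
    survivors = []
    for name in candidates:
        try:
            data.decode(name)  # strict error handling is the default
        except UnicodeDecodeError:
            continue  # contains bytes that are impossible in this encoding
        survivors.append(name)
    return survivors


# 0xE9 on its own is not a valid UTF-8 sequence, but is 'é' in CP1252/ISO-8859-1.
print(plausible_encodings(b"caf\xe9"))  # ['cp1252', 'iso-8859-1']
# 0x81 is undefined in Python's cp1252 codec, so only ISO-8859-1 survives.
print(plausible_encodings(b"\x81"))     # ['iso-8859-1']
```

Python's ISO-8859-1 codec accepts every byte value, decoding 0x80-0x9F as control codes, so it survives almost any input. That's why the distribution and spelling guesses mentioned above are needed to get any further.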
"Windows Codepage 1252" is a Windows character set encoding that is similar
to ISO-8859-1, but not exactly the same. It (1252) includes the Euro character
(at byte 0x80), which ISO-8859-1 lacks entirely. It also assigns printable
characters to most of the 0x80-0x9F range, which ISO-8859-1 reserves for
control codes.
Do you have any Euro currency symbols in the file?
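One rough way to answer that: CP1252 and ISO-8859-1 only disagree on bytes 0x80-0x9F. Scanning the data for bytes in that range therefore shows whether the choice between them matters at all. Below is a Python sketch. The function name `cp1252_specific_chars` is hypothetical, and reading the actual file is left out.

```python
def cp1252_specific_chars(data: bytes):
    """Map each byte in 0x80-0x9F found in `data` to its CP1252 meaning.

    ISO-8859-1 treats these bytes as C1 control codes, while CP1252 uses
    most of them for printable characters such as the Euro sign (0x80).
    If the result is empty, CP1252 and ISO-8859-1 decode `data` identically.
    """
    found = {}
    for b in sorted(set(data)):
        if 0x80 <= b <= 0x9F:
            # 0x81, 0x8D, 0x8F, 0x90 and 0x9D are undefined in Python's cp1252,
            # so they come back as U+FFFD (the replacement character).
            found[b] = bytes([b]).decode("cp1252", errors="replace")
    return found


sample = "Price: 10 €".encode("cp1252")
print(cp1252_specific_chars(sample))                 # {128: '€'}
print(bytes([0x80]).decode("iso-8859-1") == "\x80")  # True: a control code, not '€'
```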
>So, now my next question is... what would be the
>correct first parameter for the iconv function to tell it that the original
>data is '1252 (ANSI - Latin I)'. I've tried numerous strings, which include:
>
>'1252 (ANSI - Latin I)'
>'1252'
>'1252 ANSI'
>'1252-ANSI'
>'ANSI-1252'
>'ANSI 1252'
>
>...and variations.
>
>Is there any iconv encoding table with acceptable encodings I can consult?
http://www.gnu.org/software/libiconv/
You possibly want:
CP1252
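For comparison, here is the same conversion sketched in Python, where the codec registry plays the role of iconv's name table. This is only an analogy. Python keeps its own alias list, which is not the same as iconv's, so a name Python accepts is not guaranteed to work with iconv. On a GNU system, `iconv -l` on the command line lists the names iconv itself accepts. In PHP, the corresponding call would be along the lines of `iconv('CP1252', 'UTF-8', $text)`.

```python
import codecs


def cp1252_to_utf8(raw: bytes) -> bytes:
    """Re-encode CP1252 bytes as UTF-8 (strict: raises on undefined bytes)."""
    return raw.decode("cp1252").encode("utf-8")


# 'CP1252' and 'windows-1252' resolve to the same codec ...
print(codecs.lookup("CP1252").name)        # cp1252
print(codecs.lookup("windows-1252").name)  # cp1252

# ... but an editor's display label is not a codec name.
try:
    codecs.lookup("1252 (ANSI - Latin I)")
except LookupError as err:
    print("not a codec name:", err)

print(cp1252_to_utf8(b"10 \x80"))  # b'10 \xe2\x82\xac' (the Euro sign in UTF-8)
```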
>Also, isn't '1252 (ANSI - Latin I)' just a pimped version of ISO-8859-1?
I should read the entire message before typing ;-)
--
Andy Hassall :: andy@andyh.co.uk :: http://www.andyh.co.uk
http://www.andyhsoftware.co.uk/space :: disk and FTP usage analysis tool