Re: [PHP] Is it possible to save a file with UTF-8 encoding and no BOM using PHP?

Posted by Jon M. on 05/03/05 09:56

(Rasmus wrote:) "If you fwrite UTF-8 data to the file, then it is a UTF-8
file."

Thanks Rasmus! Honestly, that is REALLY helpful!

I was just coming back here to post that I had found that very same answer.
But I am glad to hear it confirmed by the experts.

Bottom line: I was just being silly/ignorant.

I went and downloaded a simple hex editor and compared the actual binary
output of several files that I had created using both PHP and my favorite
text editor (EmEditor from emeditor.com). I then realized what probably
everyone else here already knew: as long as a file contains only plain
ASCII characters, the actual binary output from "Windows-1252",
"ISO-8859-1" and "UTF-8 without the byte order mark" is completely
identical!

I had the false impression that when a file was saved in UTF-8, there
was an actual binary "marker" that specified this (e.g. binary marker =
"This file is saved in UTF-8!"). There simply is no such thing. In a file
like mine, the only thing that would set "UTF-8" apart at the byte level is
the BOM, and I had stripped that out, making the file exactly the same as
plain old "ANSI" (since I didn't have any characters that required "UTF-8",
such as characters from other languages).
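
For anyone who wants to see this without a hex editor, here is a minimal sketch in PHP. The sample strings and the UTF8_BOM constant name are made up for illustration; the byte values themselves (EF BB BF for the BOM, E9 for "é" in ISO-8859-1 / Windows-1252, C3 A9 for "é" in UTF-8) are standard.

```php
<?php
// The UTF-8 byte order mark is the three bytes EF BB BF.
// (UTF8_BOM is an illustrative name, not a PHP built-in.)
const UTF8_BOM = "\xEF\xBB\xBF";

// Only characters below 0x80: the bytes are the same in ASCII,
// ISO-8859-1, Windows-1252 and BOM-less UTF-8.
$ascii = "Hello";

// "café" written out byte by byte in two encodings.
$latin1 = "caf\xE9";      // ISO-8859-1 / Windows-1252: é is one byte
$utf8   = "caf\xC3\xA9";  // UTF-8: é is two bytes

echo bin2hex($ascii), "\n";            // 48656c6c6f
echo bin2hex(UTF8_BOM . $ascii), "\n"; // efbbbf48656c6c6f
echo bin2hex($latin1), "\n";           // 636166e9
echo bin2hex($utf8), "\n";             // 636166c3a9
```

The first line is what a pure-ASCII file looks like no matter which of the three encodings you "saved" it in. The files only start to differ once a BOM or a non-ASCII character shows up.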

My text editor displays the current character encoding in the status bar,
but since there was no way for it to tell whether it was saved with Windows
1252 or UTF-8, it just displayed that the file was encoded "windows
default - ISO-8859-1". This is where I got confused.
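
A rough sketch of the kind of guess an editor has to make, written in PHP. The function names is_pure_ascii() and looks_like_utf8() are hypothetical, and I am not claiming this is what EmEditor does internally. It relies on the common idiom that preg_match() with the /u modifier fails on input that is not valid UTF-8.

```php
<?php
// Hypothetical helper: true if no byte is 0x80 or above.
function is_pure_ascii(string $bytes): bool
{
    return preg_match('/[\x80-\xFF]/', $bytes) === 0;
}

// Hypothetical helper: true if the bytes form valid UTF-8.
// With the /u modifier, preg_match() returns false on invalid UTF-8.
function looks_like_utf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

$samples = [
    'ascii only'    => "Hello",
    'latin-1 cafe'  => "caf\xE9",
    'utf-8 cafe'    => "caf\xC3\xA9",
];

foreach ($samples as $label => $bytes) {
    echo $label, ': ascii=', var_export(is_pure_ascii($bytes), true),
         ' utf8=', var_export(looks_like_utf8($bytes), true), "\n";
}
```

For the ASCII-only sample both checks pass, so the bytes alone cannot say which encoding was intended, and an editor falling back to its default label is a reasonable outcome.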

It turned out that my PHP script had been faithfully saving the file in
UTF-8 the whole time, and everything was fine. I was just not educated
enough about what actually changes when you save a file in UTF-8 that
doesn't have any characters that differ from ANSI (in my case the "change"
was nothing, since ALL of the characters in my test document were
interchangeable with ANSI).
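
To answer the subject line directly, here is a minimal sketch of writing UTF-8 without a BOM from PHP. It assumes PHP 7 or later (for the type declarations); has_utf8_bom() and strip_utf8_bom() are hypothetical helper names, and the temp file is only there to make the example self-contained. fwrite() writes exactly the bytes it is given, so the only job is to make sure the string does not start with EF BB BF.

```php
<?php
// Hypothetical helper: does the string start with the UTF-8 BOM?
function has_utf8_bom(string $bytes): bool
{
    return strncmp($bytes, "\xEF\xBB\xBF", 3) === 0;
}

// Hypothetical helper: drop a leading BOM if there is one.
function strip_utf8_bom(string $bytes): string
{
    return has_utf8_bom($bytes) ? substr($bytes, 3) : $bytes;
}

// BOM followed by "naïve" encoded as UTF-8 (ï = C3 AF).
$text = "\xEF\xBB\xBFna\xC3\xAFve";

// Write to a throwaway file in binary mode, then read it back.
$path = tempnam(sys_get_temp_dir(), 'utf8');
$fh = fopen($path, 'wb');
fwrite($fh, strip_utf8_bom($text));
fclose($fh);

$written = file_get_contents($path);
unlink($path);

echo bin2hex($written), "\n"; // 6e61c3af7665
```

The file on disk holds the UTF-8 bytes for "naïve" and nothing else, which is a UTF-8 file without a BOM.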

Well, this has been a learning experience! I hope that this post will help
some poor ignoramus like myself, sometime in the future! :) And hopefully I
am right about what I said above, and not flaunting my ignorance once
again -lol

Thanks again, to everyone who helped me! You guys really got me on the right
track. Not the least of which was simply causing me to think about what I
was asking more deeply.

-Jon


"Rasmus Lerdorf" <rasmus@lerdorf.com> wrote in message
news:4272DF14.3010008@lerdorf.com...
> Jon M. wrote:
>> No matter what I do to the strings to encode them in whatever format
>> before using "fwrite", it ALWAYS seems to end up writing the actual file
>> in "iso-8859-1".
>>
>> Isn't the encoding of the characters in PHP's strings, and the encoding
>> of the actual binary file on your hard drive, two totally different
>> things? Or am I just misinformed?
>
> A file is completely defined by its contents. If you fwrite UTF-8 data to
> the file, then it is a UTF-8 file. Whether your editor, or whatever it is
> you are using to determine the file is being written as iso-8859-1 is
> smart enough to pick this up is a completely different question.
>
> Why don't you try creating the same contents with PHP and with your
> preferred text editor and then compare the contents. Perhaps your editor
> is dropping a hint somewhere in it that you are not writing to the file
> from PHP.
>
> -Rasmus

 
