Posted by Willem Bogaerts on 06/13/07 10:19
> I'm working on a website for a client at the moment where they get
> form submissions from all over the world - while most are in English,
> occasionally they'll get one in Chinese or Russian. Or occasionally
> Greek, just so my example below is justified :)
>
> I need to store this info AS TYPED in the database and echo it AS
> TYPED in the e-mail, e.g. if someone types in "Αυτά είναι ελληνικά"
> that's how it needs to be stored and echoed.
That is a problem if you do not know what encoding it was typed in, so
you will have to determine the encoding first. For example, UTF-8 bytes
are perfectly valid as ISO-8859-1 bytes, but they display as completely
different characters.
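The point about UTF-8 bytes also being valid ISO-8859-1 can be seen in
a few lines. This is only an illustrative sketch in Python, not code
from the site in question; the variable names are made up, and the
sample string is the Greek text from the question.

```python
# Illustrative sketch: the same bytes read under two different encodings.
text = "Αυτά είναι ελληνικά"

# What a browser would send if it submitted the field as UTF-8.
utf8_bytes = text.encode("utf-8")

# Reading those bytes as ISO-8859-1 never fails, because every byte value
# maps to some character there, but the result is not what was typed.
misread = utf8_bytes.decode("iso-8859-1")

# Nothing is lost, so the misreading can be undone, provided you know
# which encoding the bytes were really in.
recovered = misread.encode("iso-8859-1").decode("utf-8")

print(len(text), len(utf8_bytes), len(misread))
print(recovered == text)
```

The misread string has one character per byte, which is why the garbage
is longer than the original text.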
> The mySQL database is set up for UTF-8, although the client's
> webserver is serving the page with the form on as ISO-8859-1 - does
> this need to change?
I wonder how anyone can enter Russian in a form that only supports
ISO-8859-1. But even if the website is served as ISO-8859-1 (are you
really sure?), you can put an "accept-charset" attribute on the form
element in HTML, for example accept-charset="utf-8". So no, it does not
strictly have to change. If you want to render all languages on that
site, though, I would recommend changing it.
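To show why the receiving script has to decode with the same charset
the form asked for, here is a rough Python sketch of a form submission.
It assumes the form carries accept-charset="utf-8" and that the browser
honours it; the field name "message" is hypothetical.

```python
from urllib.parse import parse_qs, urlencode

text = "Αυτά είναι ελληνικά"

# Assumption: the browser percent-encodes the UTF-8 bytes of the field,
# as it would for a form with accept-charset="utf-8".
body = urlencode({"message": text}, encoding="utf-8")

# Decoding with the charset the form asked for gives the text back.
as_utf8 = parse_qs(body, encoding="utf-8")["message"][0]

# Decoding the same body as ISO-8859-1 (the charset of the page) gives
# the kind of mess described in the question.
as_latin1 = parse_qs(body, encoding="iso-8859-1")["message"][0]

print(as_utf8 == text)
print(as_latin1 == text)
```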
> When stored in the database / echoed into an e-mail the above string
> comes out as a complete mess: " Αυτά είναι ελληνικά
> Αυτά είναι ελληνικά"
What encoding is the text? And what encoding does the server expect?
> So how do I ensure the typed characters, be they Chinese, Russian,
> Greek or even just accented or with umlauts are maintained into the
> database and out into an e-mail? All help appreciated, I've spent ages
> researching this problem and run into rather a few brick walls.
> Thanks!
In e-mail (just as with a web page), you state the encoding in the
Content-Type header (for instance, Content-Type: text/plain;
charset=utf-8).
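As a hedged illustration of that header, the sketch below builds a
message with Python's standard email package. The addresses and the
subject are placeholders, and a PHP mail script would set the same
Content-Type header by its own means.

```python
from email.message import EmailMessage

text = "Αυτά είναι ελληνικά"

msg = EmailMessage()
msg["Subject"] = "Form submission"      # placeholder subject
msg["From"] = "webform@example.com"     # placeholder addresses
msg["To"] = "client@example.com"

# set_content() encodes the body as UTF-8 and writes a matching
# Content-Type header, so the mail client knows how to decode the body.
msg.set_content(text, subtype="plain", charset="utf-8")

print(msg["Content-Type"])
print(msg.get_content_charset())
```

What matters is that the charset named in the header is the encoding
the body bytes are actually in.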
The real problem with encodings is that there is a difference between a
text and a string. A string is just a chain of bytes, whereas a text is
a chain of bytes together with an encoding. Every program or system I
know of stores texts as strings, so you have to keep track of the
encoding separately. By far the easiest way is to dictate the preferred
encoding to both the browser and the database.
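One way to picture "a text is a string plus an encoding" is a small
type that keeps the two together. The sketch below is purely
illustrative: EncodedText and transcode are hypothetical names, not part
of any library, and ISO-8859-7 is just an example of a non-UTF-8 source
encoding.

```python
from typing import NamedTuple


class EncodedText(NamedTuple):
    """A 'text' in the sense above: raw bytes plus their encoding."""
    data: bytes
    encoding: str

    def transcode(self, target: str) -> "EncodedText":
        # Decode with the recorded encoding, then re-encode for the target.
        return EncodedText(self.data.decode(self.encoding).encode(target), target)


# Greek text that arrived as ISO-8859-7 bytes (assumed for the example).
greek = EncodedText("Αυτά είναι ελληνικά".encode("iso-8859-7"), "iso-8859-7")

# Convert once at the boundary, then use UTF-8 everywhere else.
as_utf8 = greek.transcode("utf-8")
print(as_utf8.encoding, as_utf8.data.decode("utf-8"))
```

Dictating one encoding everywhere, as suggested above, means the
encoding field is always the same and never has to be stored.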
If you want MySQL to use UTF-8, start each connection with "SET NAMES
utf8;". If you want to talk UTF-8 with a browser, send a Content-Type
header or configure the charset in php.ini. To support multiple PHP
servers, you can read the current charset setting (default_charset)
from php.ini with the ini_get function (beware that iso-8859-1 is used
when this setting is empty).
Hope this helps.
--
Willem Bogaerts
Application smith
Kratz B.V.
http://www.kratz.nl/