Re: Unicode characters in e-mail / database

Posted by Willem Bogaerts on 06/13/07 10:19

> I'm working on a website for a client at the moment where they get
> form submissions from all over the world - while most are in English,
> occasionally they'll get one in Chinese or Russian. Or occasionally
> Greek, just so my example below is justified :)
>
> I need to store this info AS TYPED in the database and echo it AS
> TYPED in the e-mail, e.g. if someone types in "Αυτά είναι ελληνικά"
> that's how it needs to be stored and echoed.

That is a problem if you do not know which encoding the text was typed
in, so the first step is to determine the encoding. For example, UTF-8
bytes are perfectly valid as ISO-8859-1 bytes, but they display as very
different characters.
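
To make that concrete, here is a minimal sketch in Python (chosen only
for brevity; the same holds in PHP or any other language). It takes the
Greek sample from the question, encodes it as UTF-8, and then reads the
same bytes back under both assumptions.

```python
# The same bytes, read under two different encoding assumptions.
text = "Αυτά είναι ελληνικά"

raw = text.encode("utf-8")            # what a UTF-8 browser would send

as_latin1 = raw.decode("iso-8859-1")  # never fails: every byte maps to some character
as_utf8 = raw.decode("utf-8")         # recovers the original text

print(as_latin1)  # unreadable, one character per byte
print(as_utf8)
```

Decoding as ISO-8859-1 raises no error, which is why a wrong assumption
tends to go unnoticed until someone looks at the output.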

> The mySQL database is set up for UTF-8, although the client's
> webserver is serving the page with the form on as ISO-8859-1 - does
> this need to change?

I wonder how anyone can enter Russian in a form that only supports
ISO-8859-1. But even if the website is served as ISO-8859-1 (are you
really sure?), you can set an "accept-charset" attribute on the form
element in the HTML. So no, it does not strictly have to change. If you
want to render all languages on that site, though, I would recommend
changing it.
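
The doubt about Russian can be checked directly. The Python sketch
below shows that Cyrillic cannot be encoded as ISO-8859-1 at all.
Browsers typically substitute numeric character references for such
characters when they submit a form, and Python's "xmlcharrefreplace"
error handler imitates that here. Whether this is what happened on the
site in question is an assumption.

```python
text = "Привет"

# A strict encode fails, because ISO-8859-1 has no Cyrillic letters.
try:
    text.encode("iso-8859-1")
    representable = True
except UnicodeEncodeError:
    representable = False

# Imitate the usual browser fallback: numeric character references.
fallback = text.encode("iso-8859-1", errors="xmlcharrefreplace")

print(representable)
print(fallback)
```

If strings like these end up in the database, the form was most likely
submitted in an encoding that could not hold the characters.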

> When stored in the database / echoed into an e-mail the above string
> comes out as a complete mess: " Αυτά είναι ελληνικά
> Αυτά είναι ελληνικά"

What encoding is the text? And what encoding does the server expect?

> So how do I ensure the typed characters, be they Chinese, Russian,
> Greek or even just accented or with umlauts are maintained into the
> database and out into an e-mail? All help appreciated, I've spent ages
> researching this problem and run into rather a few brick walls.
> Thanks!

In e-mail (just as with a web page), you state the encoding in the
Content-Type header (for instance, Content-Type: text/plain;
charset=utf-8).
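
As an illustration, here is a small Python sketch using the standard
library's email package. The addresses and subject are placeholders,
not taken from the original question; the point is only that the body
is declared as UTF-8 in the Content-Type header.

```python
from email.message import EmailMessage

body = "Αυτά είναι ελληνικά"

msg = EmailMessage()
msg["Subject"] = "Form submission"
msg["From"] = "webform@example.com"   # placeholder address
msg["To"] = "client@example.com"      # placeholder address

# Declares text/plain with charset utf-8 and encodes the body to match.
msg.set_content(body, charset="utf-8")

print(msg["Content-Type"])
```

The receiving mail client reads the charset parameter and decodes the
body with it, so the declared charset must match the bytes you send.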

The real problem with encodings is that there is a difference between a
text and a string. A string is just a chain of bytes, whereas a text is
a chain of bytes together with an encoding. Every program or system I
know of stores texts as strings, which means that you have to keep
track of the encoding separately. By far the easiest way to go is to
dictate the preferred encoding to both the browser and the database.
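
The text-versus-string distinction can be sketched as a small Python
type. The class name "Text" and its methods are hypothetical, invented
for this illustration; the idea is simply that the bytes and the name
of their encoding travel together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Text:
    """Hypothetical 'text': raw bytes plus the encoding they are in."""
    raw: bytes
    encoding: str

    def decode(self) -> str:
        # Interpret the bytes using the encoding stored alongside them.
        return self.raw.decode(self.encoding)

    def transcode(self, target: str) -> "Text":
        # Same characters, different bytes, new encoding label.
        return Text(self.decode().encode(target), target)


greek = Text("Αυτά".encode("iso-8859-7"), "iso-8859-7")
as_utf8 = greek.transcode("utf-8")

print(greek.raw, as_utf8.raw)  # different bytes for the same four letters
```

Dictating one encoding everywhere amounts to fixing the second field
once, so that only the bytes need to be stored.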

If you want MySQL to use UTF-8, start each connection with "SET NAMES
utf8;". If you want to talk UTF-8 with a browser, send a Content-Type
header or configure the default_charset setting in php.ini. To support
multiple PHP servers, you can query the current charset with
ini_get('default_charset') (beware that iso-8859-1 is used when this
setting is empty).
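
The fallback rule in the last sentence can be sketched as follows. This
is Python rather than PHP, and both function names are hypothetical; in
PHP the "configured" value would come from ini_get('default_charset').
The iso-8859-1 fallback is the behaviour described above, not something
this sketch verifies.

```python
FALLBACK_CHARSET = "iso-8859-1"  # used when the setting is empty, per the post


def effective_charset(configured: str) -> str:
    """Return the configured charset, or the fallback if it is empty."""
    configured = configured.strip()
    return configured if configured else FALLBACK_CHARSET


def content_type_header(configured: str, mime: str = "text/html") -> str:
    """Build the header line to send to the browser."""
    return f"Content-Type: {mime}; charset={effective_charset(configured)}"


print(content_type_header("utf-8"))
print(content_type_header(""))
```

Sending the header explicitly on every page avoids depending on how
each server happens to be configured.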

Hope this helps.
--
Willem Bogaerts

Application smith
Kratz B.V.
http://www.kratz.nl/
