Re: breaking up an UTF8 string

Posted by Markus on 07/20/07 06:50

Mathias K. wrote:
> Hello!
>
> Could anyone give me a hint on how to split up a UTF-8 string character by
> character with PHP?
>
> In my case it's a Japanese string. As the length of each Japanese
> character can differ from 2 to 6 bytes, I don't know how to find out where
> one character begins and the other ends.
>
> I tried splitting it up with mb_split:
>
> $departed = implode('<br>', mb_split("\w", $word));
>
> Well, it doesn't seem to work. The Japanese characters get totally messed up.
>
> Does anyone have a clue what regex to use or how else I could split a
> Japanese string character-wise?

I am not too familiar with Japanese or with mbstring, but have you made
sure the proper encodings are set? See mb_internal_encoding() and
mb_regex_encoding().
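
A minimal sketch of what I mean by setting the encodings, assuming both your script and your input are UTF-8 (that is my assumption, not something you stated). In UTF-8 the common Japanese characters take 3 bytes each, so the byte count and the character count differ:

```php
<?php
// Assumption: the input string is UTF-8 encoded.
mb_internal_encoding('UTF-8'); // default encoding for mb_strlen(), mb_substr(), ...
mb_regex_encoding('UTF-8');    // encoding used by mb_split(), mb_ereg(), ...

$word = "日本語";

$bytes = strlen($word);         // counts bytes
$characters = mb_strlen($word); // counts characters in the internal encoding

echo $bytes, " bytes, ", $characters, " characters\n"; // prints "9 bytes, 3 characters"
```

If the two numbers come out equal for a Japanese string, the encoding is most likely not what mbstring expects.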

Also, I think that mb_split() removes the delimiter, and with "\w" every
word character counts as a delimiter, so the characters themselves would
be dropped - should it not rather be mb_split("", $word)?
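
I am not sure mb_split() accepts an empty pattern, so a route I would trust more is preg_split() with the u (UTF-8) modifier. A short sketch, assuming $word holds valid UTF-8 (the sample string is just an illustration):

```php
<?php
// Assumption: $word holds valid UTF-8.
$word = "日本語のテキスト";

// An empty pattern with the u modifier matches at every character boundary;
// PREG_SPLIT_NO_EMPTY drops the empty pieces at the start and end.
$chars = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);

echo implode('<br>', $chars);
```

Note that preg_split() returns false if the input is not valid UTF-8, so it is worth checking the result before passing it to implode().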

Thinking of alternative methods, you can try something like:
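$chars = array();
$length = mb_strlen($word, 'UTF-8'); // number of characters, not bytes
for ($i = 0; $i < $length; $i++) {
    $chars[] = mb_substr($word, $i, 1, 'UTF-8'); // the single character at position $i
}
echo implode('<br>', $chars);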
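
If your PHP is 7.4 or newer, the built-in mb_str_split() does the same job as the loop. A rough sketch that uses it when available and falls back to the loop otherwise; the helper name split_chars is made up for illustration:

```php
<?php
// Hypothetical helper (the name is illustrative): split a UTF-8 string into characters.
function split_chars($word)
{
    if (function_exists('mb_str_split')) {
        return mb_str_split($word, 1, 'UTF-8'); // available from PHP 7.4
    }

    // Fallback for older versions: walk the string one character at a time.
    $chars = array();
    $length = mb_strlen($word, 'UTF-8');
    for ($i = 0; $i < $length; $i++) {
        $chars[] = mb_substr($word, $i, 1, 'UTF-8');
    }
    return $chars;
}

echo implode('<br>', split_chars("こんにちは"));
```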

Finally, if there is a problem with the mbstring functions, you can try
the PEAR I18N_UnicodeString class:
http://pear.php.net/package/I18N_UnicodeString

It is very handy for converting a UTF-8 string into an array of decimal
Unicode code points, and back again:

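require_once 'I18N/UnicodeString.php'; // PEAR maps I18N_UnicodeString to this path
$numbers = I18N_UnicodeString::utf8ToUnicode($word); // one decimal code point per character
$chars = array();
foreach ($numbers as $nr) {
    $chars[] = I18N_UnicodeString::unicodeCharToUtf8($nr); // code point back to a UTF-8 character
}
echo implode('<br>', $chars);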
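
If PEAR is not an option, the same round trip can be sketched with mb_ord() and mb_chr(), assuming PHP 7.2 or newer. These are mbstring functions, not part of I18N_UnicodeString, and the sample string is only an illustration:

```php
<?php
// Assumption: PHP 7.2+ with mbstring, and $word is valid UTF-8.
$word = "日本";

// UTF-8 string -> array of decimal code points
$numbers = array();
foreach (preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY) as $char) {
    $numbers[] = mb_ord($char, 'UTF-8');
}

// decimal code points -> array of UTF-8 characters
$chars = array();
foreach ($numbers as $nr) {
    $chars[] = mb_chr($nr, 'UTF-8');
}

echo implode('<br>', $chars);
```

Here $numbers ends up as array(26085, 26412), the decimal values of U+65E5 and U+672C.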

(None of these examples have been tested.)

HTH
Markus
