Posted by Markus on 07/20/07 06:50
Mathias K. wrote:
> Hello!
>
> Could anyone give me a hint how to split up a UTF-8 string character-wise
> with PHP?
>
> In my case it's a Japanese string. As the length of each Japanese
> character can differ from 2 to 6 bytes, I don't know how to find out where
> one character begins and the other ends.
>
> I tried splitting it up with mb_split:
>
> $departed = implode('<br>', mb_split("\w", $word));
>
> Well, it doesn't seem to work. The Japanese characters get totally messed up.
>
> Does anyone have a clue what regex to use or how else I could split a
> Japanese string character-wise?
I am not too familiar with Japanese or mbstring, but have you made sure
the proper encodings are set? See mb_internal_encoding() and
mb_regex_encoding().
Also, I think that mb_split() removes the delimiter, and with "\w" the
delimiter is every word character, so the characters themselves would be
thrown away. Should it not rather be mb_split("", $word)?
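
To illustrate why the encoding settings matter, here is a minimal sketch. It assumes the mbstring extension is loaded and that the script file itself is saved as UTF-8; the sample word is only an illustration, not the original poster's data.

```php
<?php
// Tell mbstring and its regex functions to treat strings as UTF-8.
mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');

$word = '日本語'; // sample input (assumption); three characters

$byteLength = strlen($word);    // counts bytes
$charLength = mb_strlen($word); // counts characters in the internal encoding

echo $byteLength, "\n"; // 9
echo $charLength, "\n"; // 3
```

If the two numbers come out equal for a Japanese string, the encoding is most likely not set the way you expect.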
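Thinking of alternative methods, you could try something like:
    $chars = array();
    $length = mb_strlen($word, 'UTF-8'); // character count, computed once
    for ($i = 0; $i < $length; $i++) {
        $chars[] = mb_substr($word, $i, 1, 'UTF-8'); // one character per step
    }
    echo implode('<br>', $chars);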
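
Since the question asked which regex to use: another common approach, which sidesteps mbstring entirely, is preg_split() with an empty pattern and the "u" (UTF-8) modifier. This is only a sketch; split_utf8_chars() is a hypothetical helper name, and the sample word assumes the file is saved as UTF-8.

```php
<?php
// Hypothetical helper: split a UTF-8 string into an array of characters.
function split_utf8_chars($str)
{
    // With the "u" modifier PCRE treats the subject as UTF-8, so the empty
    // pattern matches between code points rather than between bytes.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split() returns false when $str is not valid UTF-8.
    return $chars === false ? array() : $chars;
}

$word = '日本語'; // sample input (assumption)
$chars = split_utf8_chars($word);
echo implode('<br>', $chars), "\n"; // 日<br>本<br>語
```

Note that this splits by code point, so combining marks end up as separate array elements.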
Finally, if there is a problem with the mbstring functions, you can try
the PEAR I18N_UnicodeString class:
http://pear.php.net/package/I18N_UnicodeString
It is very handy for converting a UTF-8 string into an array of the
decimal Unicode representations:
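    require_once 'I18N/UnicodeString.php'; // path assumes PEAR's usual directory layout
    // Decode the UTF-8 string into an array of decimal code points.
    $numbers = I18N_UnicodeString::utf8ToUnicode($word);
    $chars = array();
    foreach ($numbers as $nr) {
        // Encode each code point back into a single UTF-8 character.
        $chars[] = I18N_UnicodeString::unicodeCharToUtf8($nr);
    }
    echo implode('<br>', $chars);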
(None of these examples have been tested.)
HTH
Markus