Posted by Mathias K. on 07/21/07 06:22
On Fri, 20 Jul 2007 08:50:19 +0200, Markus wrote:
> Mathias K. wrote:
>> Hello!
>>
>> Could anyone give me a hint on how to split a UTF-8 string character by
>> character with PHP?
>>
>> In my case it's a Japanese string. As the length of each Japanese
>> character can vary from 2 to 6 bytes, I don't know how to find out where
>> one character begins and the other ends.
>>
>> I tried splitting it up with mb_split:
>>
>> $departed = implode('<br>', mb_split("\w", $word));
>>
>> Well, it doesn't seem to work. The Japanese characters get totally messed up.
>>
>> Does anyone have a clue what regex to use, or how else I could split a
>> Japanese string character by character?
>
> I am not too familiar with Japanese or mbstring, but have you made
> sure the proper encodings are set? See mb_internal_encoding() and
> mb_regex_encoding().
>
> Also, I think that mb_split() removes whatever matches the delimiter
> pattern, and "\w" matches word characters, so the characters themselves
> get dropped. Should it not rather be mb_split("", $word)?
>
> Thinking of alternative methods, you can try something like:
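> $chars = array();
> $length = mb_strlen($word);  // count characters once, not on every pass
> for ($i = 0; $i < $length; $i++) {
>     $chars[] = mb_substr($word, $i, 1);  // one character, not one byte
> }
> $departed = implode('<br>', $chars);  // implode() only returns the result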
>
> Finally, if there is a problem with the mbstring functions, you can try
> the PEAR I18N_UnicodeString class:
> http://pear.php.net/package/I18N_UnicodeString
>
> It is very handy for converting a UTF-8 string into an array of the
> decimal Unicode representations:
>
> require_once('I18N_UnicodeString.php');
> $numbers = I18N_UnicodeString::utf8ToUnicode($word);
> $chars = array();
> foreach ($numbers as $nr) {
>     $chars[] = I18N_UnicodeString::unicodeCharToUtf8($nr);
> }
> $departed = implode('<br>', $chars);  // implode() only returns the result
>
> (None of these examples have been tested.)
>
> HTH
> Markus
Thanks!
~ Mathias
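
For anyone finding this thread later: another route that is not mentioned above is preg_split() with the "u" (UTF-8) pattern modifier. The following is only a minimal sketch, assuming the input really is valid UTF-8 and PHP 7 or newer (for the type declarations). The helper name utf8_chars() and the sample string are made up for illustration and are not from the posts above.

```php
<?php
// Hypothetical helper (name chosen for this example): split a UTF-8 string
// into an array of single characters.
function utf8_chars(string $text): array
{
    // With the "u" modifier PCRE treats pattern and subject as UTF-8, so the
    // empty pattern matches between code points instead of between bytes.
    // PREG_SPLIT_NO_EMPTY drops the empty pieces at the start and the end.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split() returns false on failure, e.g. for invalid UTF-8 input.
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }

    return $chars;
}

$word = '日本語';  // sample input: three characters
$departed = implode('<br>', utf8_chars($word));
echo $departed, "\n";
```

This approach does not depend on the mbstring encoding settings, because the "u" modifier fixes the encoding for that one call. On PHP 7.4 and later, mb_str_split($word) should give the same array, provided the internal encoding is UTF-8.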