Posted by Kimmo Laine on 08/30/06 10:04
"Peter Münster" <look@signature.invalid> wrote in message
news:Pine.LNX.4.64.0608300806240.28934@gaston.deltadore.bzh...
> Hello,
>
> str_word_count() does not seem to work with locale "fr_FR.utf8".
> The output of the following script is
> string(10) "fr_FR.utf8" Array ( [0] => bi [1] => re )
>
> I think that "bière" should be recognized as a word.
>
> Here is the test-script:
>
> <?
> echo '<html><head>
> <meta http-equiv="content-type" content="text/html; charset=utf-8" />
> </head><body>';
> var_dump(setlocale(LC_ALL, 'fr_FR.utf8'));
> print_r(str_word_count('bière', 1));
> echo '</body></html>';
> ?>
>
> Could someone help please?
> My PHP version is 5.1.2.
That looks like a multibyte-string problem. str_word_count() works on
single bytes rather than characters, so when the string is encoded in a
multibyte charset such as UTF-8, it can split a word at an accented
character, which matches the output you are seeing. PHP has an extension,
mbstring, designed to overcome the problems created by multibyte-encoded
strings.
See:
http://fi2.php.net/manual/en/ref.mbstring.php
Once the mbstring extension is installed, you could try writing a regular
expression that matches the words and use it with the mb_ereg* functions.
It's a pity that handling multibyte strings is not as easy as handling a
single-byte English character set, but on the bright side, the mbstring
functions at least provide some support for it.
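
Something along these lines might work as a starting point. It is only a
rough sketch: utf8_words() is a made-up helper name, not a PHP function, and
the list of separator characters is my own assumption about what should end
a word. It uses mb_split(), which belongs to the mb_ereg* family and needs
mbstring built with regex support.

```php
<?php
// Hypothetical helper (not part of PHP): split a UTF-8 string into words.
function utf8_words($text)
{
    // Tell the mb_ereg* family that patterns and subjects are UTF-8.
    mb_regex_encoding('UTF-8');

    // Assumption: whitespace and common ASCII punctuation separate words.
    $parts = mb_split('[\s,.;:!?]+', $text);

    // Leading or trailing separators leave empty strings behind; drop them.
    return array_values(array_filter($parts, 'strlen'));
}

$words = utf8_words('une bière fraîche');
print_r($words);           // the list of words
echo count($words), "\n";  // the word count
```

Because the split is done on separators instead of on "letters", accented
characters stay inside the word, and count() on the result gives the
equivalent of str_word_count() without the second argument.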
--
"Ohjelmoija on organismi joka muuttaa kofeiinia koodiksi" - lpk
http://outolempi.net/ahdistus/ - Satunnaisesti päivittyvä nettisarjis
spam@outolempi.net || Gedoon-S @ IRCnet || rot13(xvzzb@bhgbyrzcv.arg)