 Posted by Kimmo Laine on 08/30/06 10:04 
"Peter Münster" <look@signature.invalid> wrote in message  
news:Pine.LNX.4.64.0608300806240.28934@gaston.deltadore.bzh... 
> Hello, 
> 
> str_word_count() does not seem to work with locale "fr_FR.utf8". 
> The output of the following script is 
> string(10) "fr_FR.utf8" Array ( [0] => bi [1] => re ) 
> 
> I think, that "bière" should be recognized as word. 
> 
> Here is the test-script: 
> 
> <? 
> echo '<html><head> 
> <meta http-equiv="content-type" content="text/html; charset=utf-8" /> 
> </head><body>'; 
> var_dump(setlocale(LC_ALL, 'fr_FR.utf8')); 
> print_r(str_word_count('bière', 1)); 
> echo '</body></html>'; 
> ?> 
> 
> Could someone help please? 
> My PHP version is 5.1.2. 
 
 
That looks like a multibyte-string problem. If the string is encoded in a 
multibyte charset such as UTF-8, that is probably why str_word_count() gets 
confused: it seems to look at the string one byte at a time, so the two 
bytes that make up "è" are not seen as a letter and the word is split 
around them. PHP has a library of multibyte functions designed to overcome 
the problems created by multibyte-encoded strings. 
See: 
http://fi2.php.net/manual/en/ref.mbstring.php 
 
Once you've installed the multibyte library (mbstring), you could try 
writing a regular expression for counting the words and use it with the 
mb_ereg* functions. 
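
As a rough sketch of the regular-expression idea, something like the
following might work. Note that it swaps in preg_match_all() with the /u
modifier instead of the mb_ereg* functions, so it assumes PCRE was built
with UTF-8 and Unicode property support and that the script file itself is
saved as UTF-8. The function name utf8_words() is made up for this example.

```php
<?php
// Hypothetical helper (not part of PHP): returns the words found in a
// UTF-8 encoded string, roughly like str_word_count($text, 1).
// Assumption: PCRE supports the /u modifier and \p{..} properties.
function utf8_words($text)
{
    // \p{L} = any Unicode letter, \p{M} = combining marks.
    // Apostrophe and hyphen are allowed inside words, as str_word_count() does.
    preg_match_all("/[\\p{L}\\p{M}'-]+/u", $text, $matches);

    // $matches[0] holds every full match, i.e. one entry per word.
    return $matches[0];
}

print_r(utf8_words('bière'));
```

With the string from the test script this should give one word, "bière",
instead of "bi" and "re". count(utf8_words($text)) would then give the
number of words.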
 
It's a pity that handling multibyte strings is not as easy as handling a 
simple English charset, but on the bright side, there is at least some 
support for it in the multibyte function library. 
 
--  
"Ohjelmoija  on  organismi  joka  muuttaa  kofeiinia  koodiksi" - lpk 
http://outolempi.net/ahdistus/ - Satunnaisesti päivittyvä nettisarjis 
spam@outolempi.net || Gedoon-S @ IRCnet || rot13(xvzzb@bhgbyrzcv.arg)
 
  