Reply to Re: Regular expression: non-latin word/non-word characters and UTF-8

Posted by steve on 10/13/68 11:27

i suppose your *other* post didn't get you an agreeable answer and you
figure *re-posting* the same question will. well, guess what? i doubt
re-posting it will find you a more agreeable donor.

...but that's just me thinking out loud.


"Markus Ernst" <derernst@NO#SP#AMgmx.ch> wrote in message
news:4332921a$1_2@news.cybercity.ch...
| Hi
|
| I wrote a function that "normalizes" strings for use in URLs in a UTF-8
| encoded content administration application. After having removed the accents
| from Latin characters I try to remove all non-word characters from the
| string:
|
| // PCRE syntax:
| $string = preg_replace("/([\W]+)/", "-", $string);
|
| // POSIX alternative (mb_string is on):
| $string = ereg_replace("[^[:alnum:]]+", "-", $string);
|
| // post-process and return
| return urlencode(trim($string, "-"));
|
| Both ways work but remove all non-Latin characters. But what I want to do is
| remove only the non-word characters of whatever languages, and keep all word
| characters regardless of whether they are Japanese, Hebrew, Arabic, Latin or
| whatever.
|
| Is there a way for a Regex to recognize non-Latin word/non-word characters?
| Or do I have to manually specify all the characters to be removed?
|
| Thanks for every hint
| Markus
|
|
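
For what it's worth, here is a minimal sketch of one way the quoted PCRE line could be adapted, not a definitive answer. It assumes PHP's PCRE was built with UTF-8 and Unicode property support, and that the input is valid UTF-8. The function name slugify_utf8 is made up for illustration and does not come from the original post. The idea is to replace \W with a negated class of Unicode properties and add the /u modifier so the pattern and subject are treated as UTF-8:

```php
<?php
// Hypothetical helper; the name is an assumption, not from the original post.
function slugify_utf8($string)
{
    // \p{L} matches a letter and \p{N} a number in any script.
    // The /u modifier tells PCRE to treat pattern and subject as UTF-8.
    $result = preg_replace('/[^\p{L}\p{N}]+/u', '-', $string);

    if ($result === null) {
        // preg_replace() returns null on error, e.g. invalid UTF-8 input.
        return '';
    }

    // post-process and return, as in the quoted code
    return urlencode(trim($result, '-'));
}

echo slugify_utf8('Hello, world!'), "\n"; // prints "Hello-world"
```

Scripts that rely on combining marks (Hebrew or Arabic vowel points, for example) would probably also need \p{M} in the character class, otherwise those marks get replaced along with the punctuation.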
