Posted by steve on 10/13/68 11:27
i suppose your *other* post didn't supply you with an agreeable answer, and
that *re-posting* the same question will. well, guess what? i doubt
re-posting it will find you a more agreeable answer.
....but that's just me thinking out loud.
"Markus Ernst" <derernst@NO#SP#AMgmx.ch> wrote in message
news:4332921a$1_2@news.cybercity.ch...
| Hi
|
| I wrote a function that "normalizes" strings for use in URLs in a UTF-8
| encoded content administration application. After having removed the
| accents from latin characters I try to remove all non-word characters
| from the string:
|
| // PCRE syntax:
| $string = preg_replace("/([\W]+)/", "-", $string);
|
| // POSIX alternative (mb_string is on):
| $string = ereg_replace("[^[:alnum:]]+", "-", $string);
|
| // post-process and return
| return urlencode(trim($string, "-"));
|
| Both ways work but remove all non-latin characters. But what I want to do
| is remove only the non-word characters of whatever languages, and keep all
| word characters regardless if they are Japanese, Hebrew, Arab, Latin or
| whatever.
|
| Is there a way for a Regex to recognize non-latin word/non-word characters?
| Or do I have to manually specify all the characters to be removed?
|
| Thanks for every hint
| Markus
|
|
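
For what it's worth, here is a minimal sketch of one way to do this (my own,
not from the thread), assuming your PCRE build has UTF-8 and Unicode property
support: with the /u modifier, \p{L} and \p{N} match letters and digits from
any script, not just latin ones.

// Unicode-aware sketch (assumes PCRE with UTF-8 / Unicode properties):
function slugify_utf8($string)
{
    // replace every run of characters that is neither a letter nor a
    // digit (in any script) with a single hyphen
    $string = preg_replace('/[^\p{L}\p{N}]+/u', '-', $string);

    // trim stray hyphens and percent-encode for use in a URL
    return urlencode(trim($string, '-'));
}

With that, Japanese, Hebrew, Arabic or latin word characters survive, and
only punctuation, whitespace and other non-word characters collapse into
hyphens. The POSIX ereg_replace() functions have no comparable Unicode
property classes, so this only works through the preg_* route.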