String Validation With UTF-8 Support — PHP Programming Language


Posted by Samuel on 10/07/05 00:11

Hello,

I am looking for a way to check whether a string contains only word
characters and the plain space character (not whitespace in general),
*regardless of the current locale*. In other words, any character that
is a word character in any locale should be allowed. This check:

preg_match("/^[\w ]*$/", $_GET['whatever']);

in which the $_GET variable contains a UTF-8 encoded string, seems to
work only for the locale that is currently set. Of course, I could
change the locale using setlocale(), but that would still limit the
check to a subset of all possible input values.
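
One locale-independent way to express this check is to replace \w with Unicode property classes and add PCRE's /u modifier. The following is only a sketch under assumptions: the function name is_word_string() is made up for illustration, it needs a PCRE build with UTF-8 and Unicode property support, and the type declarations assume PHP 7 or later. It also treats "word character" as any letter, combining mark, digit or underscore, which may be broader or narrower than what a given locale's \w accepts.

```php
<?php
// Hypothetical helper; the name and the exact character classes are assumptions.
function is_word_string(string $input): bool
{
    // \p{L} = letters, \p{M} = combining marks, \p{N} = numbers in any script;
    // the underscore is kept to mirror \w, and only the plain space is allowed.
    // /u treats pattern and subject as UTF-8; \z anchors at the very end.
    // preg_match() returns false for malformed UTF-8, so that is rejected too.
    return preg_match('/^[\p{L}\p{M}\p{N}_ ]*\z/u', $input) === 1;
}

var_dump(is_word_string("Z\xc3\xbcrich 2005")); // "Zürich 2005"
var_dump(is_word_string("tab\there"));
```

Because the character classes come from the Unicode database rather than from setlocale(), the result should not depend on the server's locale setting.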

I also created this function from information that I found on the web:

--------------------------------
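function is_utf8($_string) {
    // Match zero or more well-formed UTF-8 sequences, from start to end.
    return preg_match('/^([\x00-\x7f]|'              // ASCII
        . '[\xc2-\xdf][\x80-\xbf]|'                  // 2-byte sequences
        . '\xe0[\xa0-\xbf][\x80-\xbf]|'              // 3-byte, no overlong forms
        . '[\xe1-\xec][\x80-\xbf]{2}|'
        . '\xed[\x80-\x9f][\x80-\xbf]|'              // excludes surrogates
        . '[\xee-\xef][\x80-\xbf]{2}|'
        . '\xf0[\x90-\xbf][\x80-\xbf]{2}|'           // 4-byte, no overlong forms
        . '[\xf1-\xf3][\x80-\xbf]{3}|'
        . '\xf4[\x80-\x8f][\x80-\xbf]{2})*$/',       // up to U+10FFFF
        $_string) > 0;
}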
--------------------------------

However, this does not seem to be completely accurate, as it still
allows characters such as those shown on this page:

http://debain.org/software/tefinch/demo/?read=1&msg_id=214&forum_id=1
(sorry for the external link, I just don't know how to create such
characters here.)

According to the W3C Validator, those characters are still invalid.
http://validator.w3.org/check?uri=http%3A%2F%2Fdebain.org%2Fsoftware%2Ftefinch%2Fdemo%2F%3Fread%3D1%26msg_id%3D214%26forum_id%3D1&charset=%28detect+automatically%29&doctype=%28detect+automatically%29
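
One possible explanation, and it is only a guess because the characters themselves are not reproduced in this message, is that the input is well-formed UTF-8 but encodes control characters (for example the C1 range U+0080 to U+009F), which a markup validator rejects even though the byte sequences are valid. A minimal sketch of a stricter check under that assumption follows; the function name is_printable_utf8() is hypothetical, and it relies on PCRE's /u modifier and Unicode properties being available.

```php
<?php
// Hypothetical helper; the control-character rule is an assumption about
// what the validator objects to, not something confirmed by the post.
function is_printable_utf8(string $input): bool
{
    // An empty pattern with /u succeeds only on well-formed UTF-8;
    // preg_match() returns false when the subject is malformed.
    if (preg_match('//u', $input) !== 1) {
        return false;
    }
    // \p{Cc} covers the C0 and C1 control characters
    // (U+0000-U+001F and U+007F-U+009F).
    return preg_match('/\p{Cc}/u', $input) === 0;
}

var_dump(is_printable_utf8("caf\xc3\xa9"));  // "café"
var_dump(is_printable_utf8("\xc2\x85"));     // U+0085, a C1 control character
```

Note that this sketch also rejects tabs and line breaks, since they are control characters too; they would have to be allowed explicitly if the input may contain them.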

I know there must be an answer somewhere on the web already, but I have
not found one either on Google or in the archives of this newsgroup.

Any help appreciated.

-Samuel
