Posted by lkrubner on 11/18/05 00:15
This is a function that someone has posted on www.php.net:
function seemsUTF8($Str) {
    // bmorel at ssi dot fr
    // 17-Feb-2004 01:22
    // Here is an improved version of that function, compatible with the
    // 31-bit encoding scheme of Unicode 3.x:
    for ($i=0; $i < strlen($Str); $i++) {
        if (ord($Str[$i]) < 0x80) continue; # 0bbbbbbb
        elseif ((ord($Str[$i]) & 0xE0) == 0xC0) $n=1; # 110bbbbb
        elseif ((ord($Str[$i]) & 0xF0) == 0xE0) $n=2; # 1110bbbb
        elseif ((ord($Str[$i]) & 0xF8) == 0xF0) $n=3; # 11110bbb
        elseif ((ord($Str[$i]) & 0xFC) == 0xF8) $n=4; # 111110bb
        elseif ((ord($Str[$i]) & 0xFE) == 0xFC) $n=5; # 1111110b
        else return false; # Does not match any model
        for ($j=0; $j < $n; $j++) {
            # n bytes matching 10bbbbbb follow ?
            if ((++$i == strlen($Str)) || ((ord($Str[$i]) & 0xC0) != 0x80))
                return false;
        }
    }
    return true;
}
What is achieved by the variable $n? I don't know enough about
character codes to understand what that final inner for loop is trying
to do.
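
A minimal sketch that may help frame the question, assuming the # comments in the function describe UTF-8 bit patterns. In UTF-8 the first byte of a multi-byte character announces how many continuation bytes of the form 10bbbbbb follow it, and that count appears to be what $n holds. The helper continuationCount() below is hypothetical (it is not from php.net); it only repeats the same mask tests and then prints the bytes that follow the lead byte of the euro sign.

```php
<?php
// Hypothetical helper, not part of the php.net function: returns how many
// continuation bytes the lead byte $byte announces, or -1 if the byte
// matches none of the lead-byte patterns tested in seemsUTF8().
function continuationCount(int $byte): int {
    if ($byte < 0x80) return 0;            // 0bbbbbbb: plain ASCII, nothing follows
    if (($byte & 0xE0) == 0xC0) return 1;  // 110bbbbb
    if (($byte & 0xF0) == 0xE0) return 2;  // 1110bbbb
    if (($byte & 0xF8) == 0xF0) return 3;  // 11110bbb
    if (($byte & 0xFC) == 0xF8) return 4;  // 111110bb
    if (($byte & 0xFE) == 0xFC) return 5;  // 1111110b
    return -1;                             // e.g. a stray 10bbbbbb byte
}

$euro = "\xE2\x82\xAC";                    // euro sign U+20AC as three UTF-8 bytes
$n = continuationCount(ord($euro[0]));     // lead byte 0xE2 is 11100010

printf("lead byte %08b announces %d more byte(s)\n", ord($euro[0]), $n);
for ($j = 1; $j <= $n; $j++) {
    // Each following byte should start with the bits 10
    printf("byte %d: %08b\n", $j, ord($euro[$j]));
}
```

Read this way, the inner for loop would simply step over those $n following bytes and return false if the string ends early or one of them does not start with the bits 10.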