Posted by Andrew @ Rockface on 01/24/06 09:41
chris_fieldhouse@hotmail.com wrote:
> Hi,
>
> I'm almost done with a php driven email filter and automated forwarder,
> I've tested it out with various emails and ironed out plain text and
> html.
> But this final item has me stumped.
>
> When processing an email which contains UTF8 encoded characters, I
> can't work out how to detect the presence of the UTF8 characters, so I
> get =E2=80=99 displayed instead of a '.
> And when forwarded, the =E2=80=99 is sent as plain text losing the
> information that it is a utf8 encoded character as opposed to plain
> text.
Here's a function I use for utf8 detection (I think I grabbed it from
the manual somewhere):
/*************************************************************/
/* Returns TRUE if a string is UTF-8. */
/* Returns FALSE if a string is not UTF-8. */
/* Compatible with 31-bit encoding scheme of Unicode 3.x */
/*************************************************************/
function seems_utf8 ($Str) {
    for ($i=0; $i<strlen($Str); $i++) {
        if (ord($Str[$i]) < 0x80) continue; # 0bbbbbbb
        elseif ((ord($Str[$i]) & 0xE0) == 0xC0) $n=1; # 110bbbbb
        elseif ((ord($Str[$i]) & 0xF0) == 0xE0) $n=2; # 1110bbbb
        elseif ((ord($Str[$i]) & 0xF8) == 0xF0) $n=3; # 11110bbb
        elseif ((ord($Str[$i]) & 0xFC) == 0xF8) $n=4; # 111110bb
        elseif ((ord($Str[$i]) & 0xFE) == 0xFC) $n=5; # 1111110b
        else return false; # Does not match any model
        for ($j=0; $j<$n; $j++) { # n bytes matching 10bbbbbb follow ?
            if ((++$i == strlen($Str)) || ((ord($Str[$i]) & 0xC0) != 0x80))
                return false;
        }
    }
    return true;
}
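To see what those bit masks test, here is a small worked example on the bytes from the question. It is only an illustration: the variable names are made up for this sketch, and it checks just the 2- to 4-byte lead patterns.

```php
<?php
// U+2019 (right single quotation mark) is the three bytes E2 80 99 in UTF-8,
// which is what "=E2=80=99" stands for once the "=" escapes are decoded.
$bytes = "\xE2\x80\x99";

// Lead byte: 0xE2 is 11100010, so (0xE2 & 0xF0) == 0xE0 (pattern 1110bbbb).
// That means two continuation bytes must follow.
$lead_ok = (ord($bytes[0]) & 0xF0) === 0xE0;

// Continuation bytes: each must match 10bbbbbb, i.e. (byte & 0xC0) == 0x80.
$cont_ok = (ord($bytes[1]) & 0xC0) === 0x80
        && (ord($bytes[2]) & 0xC0) === 0x80;

// Counter-example: 0x92 is the same apostrophe in Windows-1252.
// It is not ASCII and matches none of the lead-byte patterns,
// so a string containing it on its own is not valid UTF-8.
$b = 0x92;
$is_ascii = $b < 0x80;
$is_lead  = ($b & 0xE0) === 0xC0
         || ($b & 0xF0) === 0xE0
         || ($b & 0xF8) === 0xF0;

var_dump($lead_ok, $cont_ok, $is_ascii, $is_lead);
```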
Then, to encode a string as UTF-8 only when it isn't already:
/*************************************************************/
/* utf8_encode will encode a string that is already encoded! */
/* This means that every time you utf8_encode a string it */
/* will grow and grow exponentially! */
/* Use this function instead of utf8_encode to check if a */
/* string is already encoded before encoding. */
/* Needs the seems_utf8 function. */
/*************************************************************/
function utf8_ensure ($str) {
    return seems_utf8($str) ? $str : utf8_encode($str);
}
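One thing worth noting about the original question: =E2=80=99 is the quoted-printable form of the UTF-8 bytes, so the body has to be run through quoted_printable_decode() before any UTF-8 check can see them. Below is a minimal sketch of that step. The helper name qp_body_to_utf8 is hypothetical, the validity check uses preg_match() with the /u modifier as a stand-in for seems_utf8, and the ISO-8859-1 fallback is an assumption; a real filter should go by the Content-Transfer-Encoding and charset headers of each message part.

```php
<?php
// Hypothetical helper (not from the post): decode a quoted-printable body
// and hand it back as UTF-8.
function qp_body_to_utf8(string $body): string
{
    // Turn escapes like "=E2=80=99" back into the raw bytes E2 80 99.
    $raw = quoted_printable_decode($body);

    // With the /u modifier, preg_match() fails on malformed UTF-8,
    // so a result of 1 means the bytes are already well-formed UTF-8.
    if (preg_match('//u', $raw) === 1) {
        return $raw;
    }

    // Assumption: anything that is not valid UTF-8 is ISO-8859-1,
    // which is the same assumption utf8_encode() makes.
    return (string) iconv('ISO-8859-1', 'UTF-8', $raw);
}

// Body text as it appears in the raw message source.
$decoded = qp_body_to_utf8('It=E2=80=99s done');

// A Latin-1 body ("caf=E9") gets converted instead of passed through.
$converted = qp_body_to_utf8('caf=E9');
```

When forwarding, the decoded text only shows up correctly if the outgoing message declares charset=UTF-8 in its Content-Type header and either re-encodes the body or uses an 8-bit transfer encoding.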
--
Andrew @ Rockface
np: (Winamp is not active ;-)
www.rockface-records.co.uk