Re: question about php script handling/displaying UTF8 email

Posted by Andrew @ Rockface on 01/24/06 09:41

chris_fieldhouse@hotmail.com wrote:
> Hi,
>
> I'm almost done with a php driven email filter and automated forwarder,
> I've tested it out with various emails and ironed out plain text and
> html.
> But this final item has me stumped.
>
> When processing an email which contains UTF8 encoded characters, I
> can't work out how to detect the presence of the UTF8 characters, so I
> get =E2=80=99 displayed instead of a '.
> And when forwarded, the =E2=80=99 is sent as plain text losing the
> information that it is a utf8 encoded character as opposed to plain
> text.

Here's a function I use for utf8 detection (I think I grabbed it from
the manual somewhere):

/*************************************************************/
/* Returns TRUE if a string is UTF-8. */
/* Returns FALSE if a string is not UTF-8. */
/* Compatible with 31-bit encoding scheme of Unicode 3.x */
/*************************************************************/

function seems_utf8($Str) {
    $len = strlen($Str);
    for ($i = 0; $i < $len; $i++) {
        $c = ord($Str[$i]);
        if ($c < 0x80) continue;                 # 0bbbbbbb
        elseif (($c & 0xE0) == 0xC0) $n = 1;     # 110bbbbb
        elseif (($c & 0xF0) == 0xE0) $n = 2;     # 1110bbbb
        elseif (($c & 0xF8) == 0xF0) $n = 3;     # 11110bbb
        elseif (($c & 0xFC) == 0xF8) $n = 4;     # 111110bb
        elseif (($c & 0xFE) == 0xFC) $n = 5;     # 1111110b
        else return false;                       # Does not match any model
        for ($j = 0; $j < $n; $j++) {            # n bytes matching 10bbbbbb follow?
            if ((++$i == $len) || ((ord($Str[$i]) & 0xC0) != 0x80))
                return false;
        }
    }
    return true;
}
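
If the mbstring extension is available, a built-in check can stand in for the hand-rolled loop. Here is a minimal sketch, assuming mbstring or a PCRE build with UTF-8 support is present; the name is_valid_utf8 is made up for this example. Note that the built-ins are stricter than seems_utf8: they should reject overlong forms such as C0 80 and the obsolete 5- and 6-byte sequences, which the loop above lets through.

```php
<?php
// Hypothetical helper (not from the manual): validate UTF-8 with built-ins.
function is_valid_utf8($str) {
    if (function_exists('mb_check_encoding')) {
        return mb_check_encoding($str, 'UTF-8');
    }
    // Fallback: with the /u modifier, preg_match() returns false
    // when the subject is not valid UTF-8, and 1 when the empty
    // pattern matches a valid subject.
    return preg_match('//u', $str) === 1;
}

var_dump(is_valid_utf8("\xE2\x80\x99")); // right single quote, U+2019, as UTF-8
var_dump(is_valid_utf8("\xE9"));         // lone ISO-8859-1 byte, not UTF-8
```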

Then, to convert a string to UTF-8 only when it isn't UTF-8 already:

/*************************************************************/
/* utf8_encode() will encode a string that is already */
/* encoded! Every time you utf8_encode() a string, each */
/* non-ASCII byte doubles, so it grows exponentially. */
/* Use this function instead of utf8_encode() to check */
/* whether a string is already UTF-8 before encoding. */
/* Note: utf8_encode() assumes the input is ISO-8859-1. */
/* Needs the seems_utf8() function. */
/*************************************************************/

function utf8_ensure($str) {
    return seems_utf8($str) ? $str : utf8_encode($str);
}
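
One more thing on the original problem: =E2=80=99 is what a UTF-8 right single quote (bytes E2 80 99) looks like in quoted-printable, so the body has to be decoded according to its Content-Transfer-Encoding header before any UTF-8 check makes sense. Here is a rough sketch, assuming you have already parsed that header value out of the message; decode_mail_body and its parameters are hypothetical names, not part of your script.

```php
<?php
// Hypothetical helper: undo the transfer encoding named in the
// Content-Transfer-Encoding header of a message part.
function decode_mail_body($body, $transferEncoding) {
    switch (strtolower(trim($transferEncoding))) {
        case 'quoted-printable':
            return quoted_printable_decode($body);
        case 'base64':
            return base64_decode($body);
        default:
            // 7bit, 8bit, binary: the body is already raw bytes
            return $body;
    }
}

$raw     = "It=E2=80=99s working";
$decoded = decode_mail_body($raw, 'quoted-printable');

// The three bytes after "It" are now the raw UTF-8 sequence.
echo bin2hex(substr($decoded, 2, 3)), "\n"; // prints e28099
```

When forwarding, either pass the original body through untouched together with its Content-Type and Content-Transfer-Encoding headers, or send the decoded text with a Content-Type header that declares charset=UTF-8 so the receiving client knows how to read it.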

--
Andrew @ Rockface
np: (Winamp is not active ;-)
www.rockface-records.co.uk

 
