Posted by Andrew @ Rockface on 01/24/06 09:41
chris_fieldhouse@hotmail.com wrote:
 > Hi,
 >
 > I'm almost done with a php driven email filter and automated forwarder,
 > I've tested it out with various emails and ironed out plain text and
 > html.
 > But this final item has me stumped.
 >
 > When processing an email which contains UTF8 encoded characters, I
 > can't work out how to detect the presence of the UTF8 characters, so I
 > get =E2=80=99 displayed instead of a '.
 > And when forwarded, the =E2=80=99 is sent as plain text losing the
 > information that it is a utf8 encoded character as opposed to plain
 > text.
 
 Here's a function I use for utf8 detection (I think I grabbed it from
 the manual somewhere):
 
 /*************************************************************/
 /* Returns TRUE if a string is UTF-8.                        */
 /* Returns FALSE if a string is not UTF-8.                   */
 /* Compatible with 31-bit encoding scheme of Unicode 3.x     */
 /*************************************************************/
 
 function seems_utf8($Str) {
     $len = strlen($Str);
     for ($i = 0; $i < $len; $i++) {
         $c = ord($Str[$i]);
         if ($c < 0x80) continue;               # 0bbbbbbb
         elseif (($c & 0xE0) == 0xC0) $n = 1;   # 110bbbbb
         elseif (($c & 0xF0) == 0xE0) $n = 2;   # 1110bbbb
         elseif (($c & 0xF8) == 0xF0) $n = 3;   # 11110bbb
         elseif (($c & 0xFC) == 0xF8) $n = 4;   # 111110bb
         elseif (($c & 0xFE) == 0xFC) $n = 5;   # 1111110b
         else return false;                     # Does not match any model
         for ($j = 0; $j < $n; $j++) {          # n bytes matching 10bbbbbb follow ?
             if ((++$i == $len) || ((ord($Str[$i]) & 0xC0) != 0x80))
                 return false;
         }
     }
     return true;
 }
 
 Then, to encode only when the string is not already UTF-8:
 
 /*************************************************************/
 /* utf8_encode will encode a string that is already encoded! */
 /* This means that every time you utf8_encode a string it    */
 /* will grow and grow exponentially!                         */
 /* Use this function instead of utf8_encode to check if a    */
 /* string is already encoded before encoding.                */
 /* Needs the seems_utf8 function.                            */
 /*************************************************************/
 
 function utf8_ensure ($str) {
 return seems_utf8($str)? $str: utf8_encode($str);
 }
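
 The =E2=80=99 in the question is quoted-printable transfer encoding, so the
 body has to be decoded to raw bytes before any UTF-8 check can see the
 character. A minimal sketch of that step, under some assumptions:
 qp_body_to_utf8 is a hypothetical helper name, the part is assumed to be
 quoted-printable, and any non-UTF-8 input is assumed to be ISO-8859-1. It
 uses preg_match with the /u modifier as a stand-in validity check (stricter
 than seems_utf8, since it rejects the old 5- and 6-byte forms) and iconv in
 place of utf8_encode.

```php
<?php
// Hypothetical helper (not from the original post): decode a
// quoted-printable body and return it as UTF-8.
function qp_body_to_utf8(string $body): string
{
    // "=E2=80=99" becomes the three raw bytes 0xE2 0x80 0x99.
    $raw = quoted_printable_decode($body);

    // preg_match() with /u returns 1 only if the subject is valid UTF-8.
    if (preg_match('//u', $raw) === 1) {
        return $raw;
    }

    // Assumption: anything that is not valid UTF-8 is ISO-8859-1.
    return iconv('ISO-8859-1', 'UTF-8', $raw);
}

$text = qp_body_to_utf8("It=E2=80=99s done");
echo bin2hex(substr($text, 2, 3)), "\n"; // prints "e28099" (U+2019)
```

 When forwarding, the charset survives only if the outgoing message still
 says so, for example with "Content-Type: text/plain; charset=UTF-8" and a
 Content-Transfer-Encoding header that matches how the body is actually sent.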
 
 --
 Andrew @ Rockface
 np: (Winamp is not active ;-)
 www.rockface-records.co.uk