Posted by James on 01/10/06 23:57
I have a function that (by fluke or whatever) used to work perfectly
and seems to have changed behaviour on me. The function is meant to
take a string and convert characters with diacritics to their
non-diacritic equivalents. For example, Dürer would become Durer
-- except all of a sudden it's becoming DA?rer. This is a problem :)
The function and some sample HTML are below -- any clues or hints would
be appreciated. I do see my extended character represented by two
characters -- I understand what has kinda happened, I just don't know
how to deal with it ...
<?php
function kill_diacritic ($word_string) {
global $dbtype;
if (empty($word_string)) {
return $word_string;
}
else {
$string_length = strlen($word_string);
for ($x=0;$x<$string_length;$x++) {
$ascii = ord(substr($word_string,$x,1));
switch($ascii){
case 224: // à
case 225: // á
case 226: // â
case 227: // ã
case 228: // ä
case 229: // å
$tmp = "a";
break;
case 231: // ç
$tmp = "c";
break;
case 232: // è
case 233: // é
case 234: // ê
case 235: // ë
$tmp = "e";
break;
case 236: // ì
case 237: // í
case 238: // î
case 239: // ï
$tmp = "i";
break;
case 241: // ñ
$tmp = "n";
break;
case 240: // ð
case 242: // ò
case 243: // ó
case 244: // ô
case 245: // õ
case 246: // ö
case 248: // ø
$tmp = "o";
break;
case 154: // š (Windows-1252)
$tmp = "s";
break;
case 249: // ù
case 250: // ú
case 251: // û
case 252: // ü
$tmp = "u";
break;
case 253: // ý
$tmp = "y";
break;
case 158: // ž (Windows-1252)
$tmp = "z";
break;
case 192: // À
case 193: // Á
case 194: // Â
case 195: // Ã
case 196: // Ä
case 197: // Å
$tmp = "A";
break;
/*
// Oracle represents Æ as a ?. Not sure what MySQL will
// Do with this character. Pretty sure nobody will ever
// search using it but its there regardless.
case 198: // Æ
$tmp = "?";
break;
*/
case 200: // È
case 201: // É
case 202: // Ê
case 203: // Ë
$tmp = "E";
break;
case 208: // Ð
$tmp = "D";
break;
case 204: // Ì
case 205: // Í
case 206: // Î
case 207: // Ï
$tmp = "I";
break;
case 209: // Ñ
$tmp = "N";
break;
case 210: // Ò
case 211: // Ó
case 212: // Ô
case 213: // Õ
case 214: // Ö
case 216: // Ø
$tmp = "O";
break;
case 138: // Š (Windows-1252)
$tmp = "S";
break;
case 217: // Ù
case 218: // Ú
case 219: // Û
case 220: // Ü
$tmp = "U";
break;
case 159: // Ÿ (Windows-1252)
case 221: // Ý
$tmp = "Y";
break;
} // switch
if (!empty($tmp)) { // "_" is already non-empty, so the extra comparison was redundant
$word_string = str_replace(chr($ascii),$tmp,$word_string);
$tmp="";
}
} // for
}
return $word_string;
}
?>
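To illustrate what the loop above appears to be seeing, here is a minimal sketch. It assumes the input now arrives UTF-8 encoded (as the charset=utf-8 meta tag in the sample page suggests), in which case ü is two bytes rather than one. The variable names are only for illustration.

```php
<?php
// Assumption: the string is UTF-8 encoded, so "ü" is the byte pair C3 BC.
$word = "D\xC3\xBCrer"; // "Dürer" written as explicit UTF-8 bytes

// Collect the value ord() reports for each byte, as the function's loop does.
$bytes = array();
for ($i = 0; $i < strlen($word); $i++) {
    $bytes[] = ord($word[$i]);
}

echo strlen($word), "\n";        // 6 bytes, although there are only 5 characters
echo implode(" ", $bytes), "\n"; // 68 195 188 114 101 114
```

Byte 195 matches the case for Ã and is replaced with "A", while byte 188 matches nothing and is left behind as a stray byte, which would explain the DA?rer output.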
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>UTF8 Testing</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<link rel='stylesheet' href='styles/default/stylesheet.css'
type='text/css'>
<script type="text/JavaScript" src="javascript.js"></script>
</head>
<body>
<form method="GET" action="index.php">
<input type="text" name="s" size="10" value="<?php echo
$_GET['s']; ?>">
<input type="submit" value="Search">
</form>
<p>
<?php echo $_GET['s']; ?>
</p>
<p>
<?php echo kill_diacritic($_GET['s']); ?>
</p>
</body>
</html>
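One possible way to deal with it, sketched under the assumption that the input really is UTF-8: replace whole multi-byte sequences with strtr() instead of inspecting single bytes. The function name kill_diacritic_utf8 is hypothetical and the map is deliberately incomplete; it only shows the idea.

```php
<?php
// Hypothetical helper, not the original function: maps UTF-8 encoded
// accented letters to plain ASCII letters.
function kill_diacritic_utf8($word_string) {
    // Keys are explicit UTF-8 byte sequences so the sketch does not
    // depend on the encoding of the source file itself.
    $map = array(
        "\xC3\xA0" => "a", // à
        "\xC3\xA1" => "a", // á
        "\xC3\xA4" => "a", // ä
        "\xC3\xA7" => "c", // ç
        "\xC3\xA8" => "e", // è
        "\xC3\xA9" => "e", // é
        "\xC3\xB1" => "n", // ñ
        "\xC3\xB6" => "o", // ö
        "\xC3\xBC" => "u", // ü
        "\xC3\x84" => "A", // Ä
        "\xC3\x96" => "O", // Ö
        "\xC3\x9C" => "U", // Ü
    );
    // With an array argument, strtr() replaces each key with its value
    // and leaves everything else untouched.
    return strtr($word_string, $map);
}

echo kill_diacritic_utf8("D\xC3\xBCrer"), "\n"; // Durer
```

Another option, if the mbstring extension is available, might be to convert the input first with mb_convert_encoding($s, 'ISO-8859-1', 'UTF-8') and keep the original byte-based function, though that only covers characters that exist in the target encoding.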