Posted by Darko on 11/02/07 20:42
On Nov 2, 9:02 pm, Jake <off...@gmail.com> wrote:
> I have a string that has UTF-8 characters encoded using HTML
> entities. For example, the string "é 字" is being encoded as
> "&#233; &#23383;". I have no control over how this string is given to
> me, so I need to figure out a way to decode "&#233; &#23383;" back
> into "é 字".
>
> I have already tried urldecode, html_entity_decode, utf8_decode and
> convert_uudecode without success. My server environment is limited to
> the latest version of PHP 4, so I can't use any PHP 5 stuff.
>
> Anyone have suggestions?
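Here's a sample adapted from php.net's page about utf8_encode
(http://www.php.net/manual/en/function.utf8-encode.php), thanks to a
certain luka8088:

    function html_to_utf8 ($data)
    {
        // Replace every decimal numeric entity (&#NNN;) with its UTF-8 bytes.
        // preg_replace_callback is available from PHP 4.0.5 on.
        return preg_replace_callback('/&#([0-9]{3,10});/', '_html_to_utf8', $data);
    }

    function _html_to_utf8 ($match)
    {
        $cp = (int) $match[1];

        // Leave ASCII entities (e.g. &#128; and below) and out-of-range values as they are.
        if ($cp <= 127 || $cp > 0x10FFFF)
            return $match[0];

        if ($cp < 0x800)    // two bytes: 110xxxxx 10xxxxxx
            return chr(0xC0 | ($cp >> 6))
                 . chr(0x80 | ($cp & 0x3F));

        if ($cp < 0x10000)  // three bytes: 1110xxxx 10xxxxxx 10xxxxxx
            return chr(0xE0 | ($cp >> 12))
                 . chr(0x80 | (($cp >> 6) & 0x3F))
                 . chr(0x80 | ($cp & 0x3F));

        // four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return chr(0xF0 | ($cp >> 18))
             . chr(0x80 | (($cp >> 12) & 0x3F))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
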
Example:
echo html_to_utf8("a b &#269; &#263; &#382; &#12371; &#12395; &#12385; &#12431; ()[]{}!#$?* < >");
Output:
a b č ć ž こ に ち わ ()[]{}!#$?* < >
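
If you want to see what the conversion actually does to one character, here is the arithmetic for 字 (decimal 23383) from your string, worked by hand. This is just my own illustration, not part of the php.net sample, and the variable names are made up; it uses only chr() and bin2hex(), so it should run on PHP 4 as well.

```php
<?php
// Illustration only: encode U+5B57 (decimal 23383) as UTF-8 by hand.
$cp = 23383;

// 23383 lies between 0x800 and 0xFFFF, so it needs three bytes:
// 1110xxxx 10xxxxxx 10xxxxxx
$b1 = 0xE0 | ($cp >> 12);          // lead byte carries the top 4 bits
$b2 = 0x80 | (($cp >> 6) & 0x3F);  // continuation byte, middle 6 bits
$b3 = 0x80 | ($cp & 0x3F);         // continuation byte, low 6 bits

$bytes = chr($b1) . chr($b2) . chr($b3);
echo bin2hex($bytes), "\n";        // e5ad97
```

Dumping the result with bin2hex() like this is also a handy way to check any decoder's output when your terminal or browser doesn't display UTF-8 properly.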
Cheers