Re: Telling Unicode and real & characters apart

Posted by Andy Hassall on 09/10/05 02:47

On 9 Sep 2005 14:59:21 -0700, "Louise GK" <louisegk@gmail.com> wrote:

>Hi there. I've written a simple program that makes a simple GET form
>with a text input box and displays $_GET["foo"] when submitted.
>
>Using Windows Character Map, I pasted in the Cyrillic capital "Ya" (the
>backward R) and it came out as "&#1071;". So far so good.
>
>Then I sent in "[R] &#1071;" (The [R] is the Cyrillic character again.)
>
>That came out as "&#1071; &#1071;". How can I please tell the
>difference between the Cyrillic and the character sequence '&', '#',
>etc...?
>
>It seems to me that the '&' character should be transformed into
>"&amp;" just like the Cyrillic characters. Perhaps I have misunderstood
>something along the way.

What encoding is the page with the form in?

Some browsers will, if the page is in an encoding that does not contain the
character being pasted in, convert the character to an HTML character entity
(a numeric character reference such as "&#1071;") - this is then
indistinguishable from pasting the text of the character entity itself in.
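
To make "indistinguishable" concrete, here is a minimal sketch of what the server ends up with. The three query strings are my assumption of what a browser typically sends in each case; they are not captured from a real request, and the variable names are made up for illustration.

```php
<?php
// Hypothetical query strings, assumed for illustration only.

// Case 1: Ya pasted into a form on an iso-8859-15 page. The browser cannot
// encode it, so it sends the numeric character reference instead.
$pastedOnLatin9 = 'input=%26%231071%3B';

// Case 2: the seven characters "&#1071;" typed by hand on the same page.
$typedOnLatin9 = 'input=%26%231071%3B';

// Case 3: Ya pasted into a form on a utf-8 page (the two bytes D0 AF).
$pastedOnUtf8 = 'input=%D0%AF';

// parse_str() URL-decodes the values much as PHP does when it fills $_GET.
parse_str($pastedOnLatin9, $a);
parse_str($typedOnLatin9, $b);
parse_str($pastedOnUtf8, $c);

var_dump($a['input'] === $b['input']); // cases 1 and 2: same bytes
var_dump($a['input'] === $c['input']); // case 3 differs: it is a real Ya
echo bin2hex($c['input']), "\n";
```

By the time PHP sees cases 1 and 2 the information is already gone, so nothing you do in the script can tell them apart; the fix has to be in the page encoding.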

Try the code below (filename: form_encoding.php), pasting a Ya followed by the
literal text "&#1071;" into the input box.

Note what happens when you switch page encodings and resubmit the text;
iso-8859-15 doesn't contain a Ya, so the browser tries to make the best of an
impossible situation and sends the HTML character entity representation
instead.

The other two encodings, utf-8 and iso-8859-5, do contain Ya, so you get the
correct behaviour, i.e. a Ya followed by the literal text of the HTML entity.

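<?php
// Accept only the encodings offered by the radio buttons below, so that
// request data never reaches the Content-type header unchecked.
$allowed = array('iso-8859-15', 'utf-8', 'iso-8859-5');
$encoding = (isset($_GET['encoding']) && in_array($_GET['encoding'], $allowed, true))
    ? $_GET['encoding'] : 'iso-8859-15';
header("Content-type: text/html; charset=$encoding");
?>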
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>form encoding</title>
</head>
<body>
<form method="get" action="form_encoding.php">
<input type="radio" name="encoding" value="iso-8859-15"
id="encoding-iso-8859-15">
<label for="encoding-iso-8859-15">iso-8859-15 (Western European)</label><br>

<input type="radio" name="encoding" value="utf-8" id="encoding-utf-8">
<label for="encoding-utf-8">utf-8 (Unicode)</label><br>

<input type="radio" name="encoding" value="iso-8859-5"
id="encoding-iso-8859-5">
<label for="encoding-iso-8859-5">iso-8859-5 (Cyrillic)</label><br>

<input type="submit" value="Set Encoding">
</form>

<p>Encoding: <?php print htmlspecialchars($encoding); ?></p>

<form method="get" action="form_encoding.php">
<input type="hidden" name="encoding" value="<?php print
htmlspecialchars($encoding);?>"><br>
<input type="text" name="input">
<input type="submit">
</form>
<?php
if (isset($_GET['input']))
{
print htmlspecialchars($_GET['input'], ENT_QUOTES, $encoding);
}
?>
</body>
</html>
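
If you want to check the three cases without a browser, the sketch below feeds the final print of form_encoding.php the raw values I would expect in $_GET['input'] for each page encoding. The helper name render_input and the sample byte strings are assumptions for illustration, not something taken from a real request.

```php
<?php
// Hypothetical helper mirroring the print at the end of form_encoding.php.
function render_input($raw, $encoding)
{
    return htmlspecialchars($raw, ENT_QUOTES, $encoding);
}

// Assumed raw values of $_GET['input'] after pasting a Ya followed by the
// literal text "&#1071;" on a page in each encoding.
$cases = array(
    'utf-8'       => "\xD0\xAF &#1071;", // Ya is the two bytes D0 AF
    'iso-8859-5'  => "\xCF &#1071;",     // Ya is the single byte CF
    'iso-8859-15' => "&#1071; &#1071;",  // Ya already replaced by the browser
);

foreach ($cases as $encoding => $raw) {
    // addcslashes() shows the non-ASCII bytes as octal escapes so the
    // output is readable whatever encoding the terminal uses.
    $shown = addcslashes(render_input($raw, $encoding), "\x80..\xff");
    printf("%-12s %s\n", $encoding, $shown);
}
```

In the first two rows the Ya survives as raw bytes and only the typed ampersand is escaped to "&amp;". In the iso-8859-15 row both halves come out as the same escaped text, which is the behaviour described in the original question.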

--
Andy Hassall :: andy@andyh.co.uk :: http://www.andyh.co.uk
http://www.andyhsoftware.co.uk/space :: disk and FTP usage analysis tool
