Posted by Andy Hassall on 09/10/05 02:47
On 9 Sep 2005 14:59:21 -0700, "Louise GK" <louisegk@gmail.com> wrote:
>Hi there. I've written a simple program that makes a simple GET form
>with a text input box and displays $_GET["foo"] when submitted.
>
>Using Windows Character Map, I pasted in the Cyrillic capital "Ya" (the
>backward R) and it came out as "&#1071;". So far so good.
>
>Then I sent in "[R] &#1071;" (The [R] is the Cyrillic character again.)
>
>That came out as "&#1071; &#1071;". How can I please tell the
>difference between the Cyrillic and the character sequence '&', '#',
>etc...?
>
>It seems to me that the '&' character should be transformed into
>"&amp;" just like the Cyrillic characters. Perhaps I have misunderstood
>something along the way.
What encoding is the page with the form in?
Some browsers will, if the page is in an encoding that does not contain the
character being pasted in, convert the character to an HTML character entity -
this is then indistinguishable from pasting the character entity itself in.
Try the code below (filename: form_encoding.php), pasting a Ya followed by the
literal text "&#1071;" into the input box.
Note what happens when you switch page encodings and resubmit the text;
iso-8859-15 doesn't contain a Ya, so the browser tries to make the best of an
impossible situation and sends the HTML character entity representation
instead.
The other two encodings, utf-8 and iso-8859-5, do contain Ya, so you get the
correct behaviour, i.e. a Ya followed by the text of the HTML entity.
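<?php
// Accept only the encodings offered below, so raw user input never reaches the header.
$encodings = array('iso-8859-15', 'utf-8', 'iso-8859-5');
$encoding = (isset($_GET['encoding']) && in_array($_GET['encoding'], $encodings, true))
    ? $_GET['encoding'] : 'iso-8859-15';
header("Content-type: text/html; charset=$encoding");
?>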
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>form encoding</title>
</head>
<body>
<form method="get" action="form_encoding.php">
<input type="radio" name="encoding" value="iso-8859-15"
id="encoding-iso-8859-15">
<label for="encoding-iso-8859-15">iso-8859-15 (Western European)</label><br>
<input type="radio" name="encoding" value="utf-8" id="encoding-utf-8">
<label for="encoding-utf-8">utf-8 (Unicode)</label><br>
<input type="radio" name="encoding" value="iso-8859-5"
id="encoding-iso-8859-5">
<label for="encoding-iso-8859-5">iso-8859-5 (Cyrillic)</label><br>
<input type="submit" value="Set Encoding">
</form>
<p>Encoding: <?php print htmlspecialchars($encoding); ?></p>
<form method="get" action="form_encoding.php">
<input type="hidden" name="encoding" value="<?php print
htmlspecialchars($encoding);?>"><br>
<input type="text" name="input">
<input type="submit">
</form>
<?php
if (isset($_GET['input']))
{
print htmlspecialchars($_GET['input'], ENT_QUOTES, $encoding);
}
?>
</body>
</html>
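
If it helps to see the ambiguity without a browser, here is a rough sketch of
what such a browser appears to do when it encodes the field for submission.
The function name form_encode is made up for illustration, it assumes the
mbstring extension (PHP 7.4 or later for mb_str_split), and real browsers may
well differ in detail.

```php
<?php
// Hypothetical helper: emulate a browser encoding form text for a page charset.
// Characters the charset cannot represent become decimal character references.
function form_encode($text, $charset)
{
    $out = '';
    foreach (mb_str_split($text, 1, 'UTF-8') as $ch) {
        // Round-trip the character; it only survives if the charset contains it.
        $bytes = mb_convert_encoding($ch, $charset, 'UTF-8');
        if (mb_convert_encoding($bytes, 'UTF-8', $charset) === $ch) {
            $out .= $bytes;
        } else {
            $out .= '&#' . mb_ord($ch, 'UTF-8') . ';';
        }
    }
    return $out;
}

$input = "\u{042F} &#1071;"; // a Ya, a space, then the literal entity text

// iso-8859-15 has no Ya, so both halves end up as the same bytes.
var_dump(form_encode($input, 'ISO-8859-15') === '&#1071; &#1071;');
// iso-8859-5 has a Ya (byte 0xCF), so the two stay distinct.
var_dump(form_encode($input, 'ISO-8859-5') === "\xCF &#1071;");
```

Once the substitution has happened the server has nothing left to tell the two
apart, which is why serving the form page as utf-8 avoids the problem.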
--
Andy Hassall :: andy@andyh.co.uk :: http://www.andyh.co.uk
http://www.andyhsoftware.co.uk/space :: disk and FTP usage analysis tool