Reply to Re: urlencode and $_GET — PHP Programming Language

Posted by Oli Filth on 11/18/05 04:40

M. Trausch said the following on 18/11/2005 01:49:
> Oli Filth wrote:
>
>>They aren't represented the same internally at all. A literal hash in a
>>URL delimits an HTML reference to a named anchor, whereas %23 does not,
>>it's treated as part of the query string in the HTTP GET request; try
>>this simple test to demonstrate this:
>>
>
> That's very much like saying the character # on the right side of a hex
> dump and the '23' on the left side of a hex dump aren't represented the
> same internally at all. It's just a character reference, either way. Just
> because one may receive a flag that the other doesn't in one instance or
> several instances does not mean that it will in *all* instances.

%23 is a character reference, yes, but not an *HTML* character
reference/entity. It's merely a way of representing # in an HTTP GET
request string, and means nothing in the context of HTML.

The browser treats %23 as exactly that: the literal characters %, 2, 3.
In the context of a clicked hyperlink, these exact characters are
transmitted in the corresponding HTTP GET request string. e.g. the
following link:

<A href="http://example.com/file.php?%23xyz">...</A>

will result in the following HTTP request:

GET /file.php?%23xyz HTTP/1.1
Host: example.com

At no point between the server delivering the original HTML to the
browser and the server receiving the GET request has %23 been decoded.

On the other hand, the browser treats the literal # as a delimiter (as
defined by HTML specs), and strips that (and everything after it) from
the URL before the HTTP request is made. e.g. the following link:

<A href="http://example.com/file.php?#xyz">...</A>

will result in the following HTTP request:

GET /file.php? HTTP/1.1
Host: example.com

Entirely different behaviour, working at a different layer (HTML vs.
HTTP), completely defined by the specs (W3C HTML specs, and RFC 1738).
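
The two cases above can be sketched in a few lines of Python. This is only an illustration of the splitting rule, not the demo code referred to in this thread; `request_target` is a hypothetical helper and it assumes an absolute http URL with a path.

```python
def request_target(href):
    """Return (request_target, fragment) for an absolute http URL.

    Hypothetical helper: it mimics what a browser does with a link
    before it builds the GET request line.
    """
    # Everything from the first literal '#' onward is the fragment.
    # It stays in the browser and is never sent to the server.
    without_fragment, _, fragment = href.partition("#")
    # Drop "scheme://" and the host, leaving the path and query string.
    after_scheme = without_fragment.split("://", 1)[1]
    _host, _, rest = after_scheme.partition("/")
    return "/" + rest, fragment


for href in ("http://example.com/file.php?%23xyz",
             "http://example.com/file.php?#xyz"):
    target, fragment = request_target(href)
    print("GET %s HTTP/1.1   (fragment kept by browser: %r)" % (target, fragment))
```

Note that %23 is never looked at here: it contains no literal #, so it passes through to the request line untouched.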

If you had tried the demo code I posted earlier, you would see this in
action.

>>Where is it defined as "unsafe", except in RFC 1738 where it states that
>>it's unsafe to use # unless to delimit a named anchor reference?
>>
>>Show me an example where it doesn't work...
>>
>
> The fact is that the published standard which addresses the issue states
> that it's unsafe.

No, it states that it's unsafe to use # in cases other than where you
mean it to be a delimiter for an HTML anchor identifier.

In cases where you do not intend it as a delimiter, you should encode it
with the alternative, %23, because this *is* safe (defined as such in
RFC 1738), and when received by the agent processing the HTTP GET
request (i.e. the server), it is translated into the originally intended
character, i.e. #.
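
As a rough sketch of that round trip, here is the same encode/decode step using Python's standard urllib.parse (PHP's urlencode() and $_GET behave analogously). The parameter name `tag` is made up for the example.

```python
from urllib.parse import parse_qs, quote, unquote, urlencode

# Sender side: percent-encode the literal '#' so it is not taken
# as a fragment delimiter.
encoded = quote("#xyz", safe="")          # '%23xyz'

# Receiver side: the server decodes %23 back into '#'.
decoded = unquote(encoded)                # '#xyz'

# The same thing for a whole query string ('tag' is a hypothetical name).
query = urlencode({"tag": "#xyz"})        # 'tag=%23xyz'
params = parse_qs(query)                  # {'tag': ['#xyz']}

print(encoded, decoded, query, params)
```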

> It is wise to be cautious and write defensively
> towards something you can refer, then away from it, even if it does work
> on 98% of the browsers. My point was that you cannot make a blanket
> assumption about something when it's already known that it's unsafe and
> the behavior of an action is undefined.

However, the behaviour *is* *completely* defined, so any agent (browser,
server, or otherwise) that behaves differently is in explicit breach of
the specs, i.e. a bug.

--
Oli
