Posted by Oli Filth on 11/18/05 04:40
M. Trausch said the following on 18/11/2005 01:49:
> Oli Filth wrote:
>
>>They aren't represented the same internally at all. A literal hash in a
>>URL delimits an HTML reference to a named anchor, whereas %23 does not,
>>it's treated as part of the query string in the HTTP GET request; try
>>this simple test to demonstrate this:
>>
>
> That's very much like saying the character # on the right side of a hex
> dump and the '23' on the left side of a hex dump aren't represented the same
> internally at all. It's just a character reference, either way. Just
> because one may receive a flag that the other doesn't in one instance or
> several instances does not mean that it will in *all* instances.

%23 is a character reference, yes, but not an *HTML* character
reference/entity; it's merely a way of representing # in the URL of an
HTTP GET request, and it means nothing in the context of HTML.
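
To put the hex-dump comparison in concrete terms, here is a small sketch using Python's standard urllib.parse (chosen purely for illustration). It shows that %23 is simply the percent-encoding of the byte 0x23, i.e. '#', and that the string "%23" remains three ordinary characters until something explicitly decodes it:

```python
from urllib.parse import quote, unquote

# '#' has character code 0x23, hence the percent-encoded form "%23".
hash_code = hex(ord("#"))
encoded = quote("#")        # '#' is not in quote()'s safe set, so it is escaped
decoded = unquote("%23")    # explicit decoding turns the escape back into '#'

print(hash_code)            # 0x23
print(encoded)              # %23
print(decoded)              # #

# Until something decodes it, "%23" is just three ordinary characters.
print(list("%23"))          # ['%', '2', '3']
```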

The browser treats %23 as exactly that: the literal characters %, 2, 3.
In the context of a clicked hyperlink, these exact characters are
transmitted in the corresponding HTTP GET request string. For example,
the following link:

    <A href="http://example.com/file.php?%23xyz">...</A>

will result in the following HTTP request:

    GET /file.php?%23xyz HTTP/1.1
    Host: example.com

At no point between the server delivering the original HTML to the
browser and the server receiving the GET request has %23 been decoded.

On the other hand, the browser treats a literal # as a delimiter (as
defined by the HTML specs), and strips it (and everything after it) from
the URL before the HTTP request is made. For example, the following link:

    <A href="http://example.com/file.php?#xyz">...</A>

will result in the following HTTP request:

    GET /file.php? HTTP/1.1
    Host: example.com
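
The two cases can be mimicked with a short Python sketch. request_line() is a hypothetical helper, not browser code: it assumes an absolute http: URL and applies only the rule described above, namely that everything from the first literal # onwards is dropped before the request is built, while percent-escapes such as %23 pass through untouched.

```python
from urllib.parse import urlsplit

def request_line(href: str) -> str:
    """Hypothetical helper: build the HTTP/1.1 request line a client would
    send for an absolute http: URL. Models only the fragment rule; no
    normalisation and no decoding of percent-escapes."""
    # Everything from the first literal '#' onwards is the fragment;
    # it is never sent to the server.
    before_fragment = href.split("#", 1)[0]
    parts = urlsplit(before_fragment)
    target = parts.path or "/"
    # Keep the '?' even when the query is empty, as in the example request.
    if "?" in before_fragment:
        target += "?" + parts.query
    return f"GET {target} HTTP/1.1"

for href in ("http://example.com/file.php?%23xyz",
             "http://example.com/file.php?#xyz"):
    fragment = urlsplit(href).fragment
    print(f"{request_line(href)}  (fragment: {fragment!r})")

# Prints:
# GET /file.php?%23xyz HTTP/1.1  (fragment: '')
# GET /file.php? HTTP/1.1  (fragment: 'xyz')
```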

This is entirely different behaviour, working at a different layer (HTML
vs. HTTP), and completely defined by the specs (the W3C HTML specs and
RFC 1738). If you had tried the demo code I posted earlier, you would
have seen this in action.

>>Where is it defined as "unsafe", except in RFC 1738 where it states that
>>it's unsafe to use # unless to delimit a named anchor reference?
>>
>>Show me an example where it doesn't work...
>>
>
> The fact is that the published standard which addresses the issue states
> that it's unsafe.

No, it states that it's unsafe to use # in cases other than where you
mean it to be a delimiter for an HTML anchor identifier.

In cases where you do not intend it as a delimiter, you should encode it
as %23 instead, because this *is* safe (defined as such in RFC 1738), and
when the agent processing the HTTP GET request (i.e. the server) receives
it, it is translated back into the originally intended character, #.
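
The round trip can be sketched with the same standard-library module. The parameter name tag is made up for the example; urlencode() stands in for whatever builds the link, and parse_qs() for the kind of decoding a server-side script performs before handing values to application code:

```python
from urllib.parse import parse_qs, unquote, urlencode

# Building the link: a '#' that is data (hypothetical parameter "tag")
# is encoded as %23 so it cannot be mistaken for the fragment delimiter.
query = urlencode({"tag": "#xyz"})
print(query)                # tag=%23xyz

# Receiving the request: decoding the query string recovers the '#'.
params = parse_qs(query)
print(params)               # {'tag': ['#xyz']}

# The bare query string from the %23xyz example link decodes the same way.
print(unquote("%23xyz"))    # #xyz
```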

> It is wise to be cautious and write defensively
> towards something you can refer to, rather than away from it, even if
> it does work on 98% of the browsers. My point was that you cannot make a
> blanket assumption about something when it's already known that it's
> unsafe and the behavior of an action is undefined.

However, the behaviour *is* *completely* defined, so any agent (browser,
server, or otherwise) that behaves differently is in explicit breach of
the specs, i.e. a bug.
--
Oli