Posted by Oli Filth on 11/18/05 04:40
M. Trausch said the following on 18/11/2005 01:49:
 > Oli Filth wrote:
 >
 >>They aren't represented the same internally at all. A literal hash in a
 >>URL delimits an HTML reference to a named anchor, whereas %23 does not,
 >>it's treated as part of the query string in the HTTP GET request; try
 >>this simple test to demonstrate this:
 >>
 >
 > That's very much like saying the character # on the right side of a hex
 > dump and the '23' on the left side of a hex dump aren't represented the
 > same internally at all.  It's just a character reference, either way.  Just
 > because one may receive a flag that the other doesn't in one instance or
 > several instances does not mean that it will in *all* instances.
 
 %23 is a character reference, yes, but not an *HTML* character
 reference/entity; it's merely a way of representing # in the URL of an
 HTTP GET request, and means nothing in the context of HTML.
 
 The browser treats %23 as exactly that, the literal characters %, 2, 3.
 In the context of a clicked hyperlink, these exact characters are
 transmitted in the corresponding HTTP GET request string. e.g. the
 following link:
 
 <A href="http://example.com/file.php?%23xyz">...</A>
 
 will result in the following HTTP request:
 
 GET /file.php?%23xyz HTTP/1.1
 Host: example.com
 
 At no point between the server delivering the original HTML to the
 browser and the server receiving the GET request has %23 been decoded.
 
 On the other hand, the browser treats the literal # as a delimiter (as
 defined by HTML specs), and strips that (and everything after it) from
 the URL before the HTTP request is made. e.g. the following link:
 
 <A href="http://example.com/file.php?#xyz">...</A>
 
 will result in the following HTTP request:
 
 GET /file.php? HTTP/1.1
 Host: example.com
 
 Entirely different behaviour, working at a different layer (HTML vs.
 HTTP), completely defined by the specs (W3C HTML specs, and RFC 1738).
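
 The difference between the two links can also be seen without a browser.
 The following is only a rough sketch using Python's standard URL parser,
 not the demo code I posted earlier; the describe() helper is a name made
 up for this illustration. It splits each URL into path, query and
 fragment without decoding anything:

```python
from urllib.parse import urlsplit

def describe(url):
    """Return (path, query, fragment) for a URL, without percent-decoding."""
    parts = urlsplit(url)
    return parts.path, parts.query, parts.fragment

# %23 stays inside the query string, exactly as written in the link.
encoded = describe("http://example.com/file.php?%23xyz")

# A literal # ends the query and starts the fragment, which is never sent.
literal = describe("http://example.com/file.php?#xyz")

print(encoded)
print(literal)
```

 In the first case the query is "%23xyz" and the fragment is empty; in the
 second the query is empty and "xyz" is the fragment.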
 
 If you had tried the demo code I posted earlier, you would see this in
 action.
 
 
 >>Where is it defined as "unsafe", except in RFC 1738 where it states that
 >>it's unsafe to use # unless to delimit a named anchor reference?
 >>
 >>Show me an example where it doesn't work...
 >>
 >
 > The fact is that the published standard which addresses the issue states
 > that it's unsafe.
 
 No, it states that it's unsafe to use # in cases other than where you
 mean it to be a delimiter for an HTML anchor identifier.
 
 In cases where you do not intend it as a delimiter, you should encode it
 with the alternative, %23, because this *is* safe (defined as such in
 RFC 1738), and when received by the agent processing the HTTP GET
 request (i.e. the server), it is translated into the originally intended
 character, i.e. #.
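
 As a rough illustration of that round trip, here is a sketch using
 Python's urllib.parse in place of whatever actually runs on the server.
 The parameter name "tag" is invented for the example:

```python
from urllib.parse import parse_qs, quote, unquote

# Author side: encode a # that is meant as data, not as a delimiter.
encoded = quote("#xyz", safe="")

# Server side: percent-decoding restores the intended character.
decoded = unquote("%23xyz")

# The same decoding happens when a query string is parsed into parameters.
# "tag" is a hypothetical parameter name.
params = parse_qs("tag=%23xyz")

print(encoded, decoded, params)
```

 Here encoded is "%23xyz", decoded is "#xyz", and params maps "tag" to a
 one-element list containing "#xyz".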
 
 
 > It is wise to be cautious and write defensively
 > towards something you can refer, then away from it, even if it does work
 > on 98% of the browsers.  My point was that you cannot make a blanket
 > assumption about something when it's already known that it's unsafe and
 > the behavior of an action is undefined.
 
 However, the behaviour *is* *completely* defined, so any agent (browser,
 server, or otherwise) that behaves differently is in explicit breach of
 the specs, i.e. it has a bug.
 
 
 
 --
 Oli