Posted by Alan J. Flavell on 03/19/06 22:05
On Sun, 19 Mar 2006, Jim Higson wrote:
 > Just a note, but file extensions in the URL aren't really a very
 > good way to decide what type the content is. When Tim Berners-Lee
 > designed the WWW, he decided to use an HTTP header ("Content-Type")
 > instead.
 
 Right; but in the interests of accuracy, RFC2616 is an IETF
 standards-track RFC - which means it behoves all Internet users to
 observe its requirements, no matter what their opinion of TimBL and
 the W3C might happen to be.
 
 [good points snipped...]
 
 > 5) Only really MS Windows uses file extensions to decide what type a
 > file is.
 
 MS Windows does, it's true, but MSIE generally doesn't, as their
 documentation shows:
 http://msdn.microsoft.com/workshop/networking/moniker/overview/appendix_a.asp
 
 In trying to guess what the content really might be, the filename
 extension is pretty much its last resort.
 
 > Unix typically looks at the contents of the file itself to decide.
 
 Well, yes; but a very common way to use Apache is with the MIME
 content-type determined by the filename extension *at the server*
 (which might or might not appear in the associated URL, as you say).
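 
 As a rough illustration of that server-side step (this is not Apache's
 own code; the function name content_type_for and the fallback default
 are assumptions made for the example), Python's standard mimetypes
 module does the same kind of extension lookup:

```python
import mimetypes


def content_type_for(path, default="application/octet-stream"):
    """Pick a Content-Type from the filename extension, server side.

    Hypothetical helper: the fallback for unknown extensions is an
    assumption of this sketch, not something any server mandates.
    """
    ctype, _encoding = mimetypes.guess_type(path)
    return ctype or default


if __name__ == "__main__":
    for name in ("index.html", "photo.png", "noextension"):
        print(name, "->", content_type_for(name))
```

 The point is only that the extension is consulted where the file
 lives; the client sees the resulting header, whatever the URL looks
 like.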
 
 There isn't any HTTP content type which clearly means "the receiving
 OS should guess", so the issue of how a unix-type operating system
 might guess is mostly off-topic.  If *and only if* (in the words of
 RFC2616) the sender has omitted to provide a Content-type header, the
 client agent is permitted to guess - but this is already a dubious
 situation, since RFC2616 told the sender that they SHOULD provide an
 appropriate Content-type header.
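 
 A minimal sketch of that rule from the client's side, assuming the
 response headers arrive as a plain dict and that guess is some
 caller-supplied sniffing routine (both are assumptions of the example,
 as is the name effective_content_type).  The final fallback follows
 RFC2616 section 7.2.1, which says a type that remains unknown SHOULD
 be treated as application/octet-stream:

```python
def effective_content_type(headers, guess):
    """Return the media type a client should act on.

    headers: dict mapping header names to values (hypothetical shape).
    guess:   zero-argument callable returning a guessed type or None.
    """
    for name, value in headers.items():
        # Header field names are case-insensitive in HTTP.
        if name.lower() == "content-type":
            # The sender declared a type: no guessing allowed.
            return value.strip()
    # Only when the header is absent may the client guess.
    guessed = guess()
    return guessed if guessed else "application/octet-stream"


if __name__ == "__main__":
    sniff = lambda: "image/png"
    print(effective_content_type({"Content-Type": "text/html"}, sniff))
    print(effective_content_type({}, sniff))
```

 Note that the guessing routine is never even called when the header
 is present.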
 
 I hadn't seen an HTTP 200 response without a Content-type header for
 many years.  I *did* see one quite recently - and, guess what, the
 server that sent it said that it was IIS.  Yet another case of
 software from Galactic HQ spitting in the face of the Internet
 specifications.
 
 The only remaining situation where it's doubtful what RFC2616 says
 should happen is application/octet-stream.  Some say that this can
 only be saved to file, since it can't be unambiguously associated with
 any viewer or application at the client side.  Others say that the
 wording of RFC2616 doesn't actually disallow the client agent trying
 to guess what it is.  (I'm fairly agnostic on this point, but this
 isn't about me.)
 
 > Not necessarily. This can be done in htaccess. The server is at
 > fault though, it is telling the browser that the content is
 > "text/html" and the browser is believing it.
 
 RFC2616 leaves the browser only two choices: either treat it as what
 it claims to be, or reject it.  In practical terms, "reject it" could
 mean appealing to the user and getting their informed consent to
 proceed on the basis of what the stuff appears to be, rather than what
 the sender claims it to be.  The majority of WWW users would,
 admittedly, be in no position to take a proper decision on that
 "informed" consent; but if all client agents (including the operating
 system component that thinks it's a web browser) were behaving in
 accordance with RFC2616, then this situation simply wouldn't arise,
 since everyone providing content would see the problem as soon as they
 tried to access it themselves, and would correct it forthwith.
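 
 One way to picture the "reject, then ask" reading is the sketch below.
 It is only an illustration: the names resolve and ask_user are
 hypothetical, and a client could equally well take the other permitted
 choice and simply treat the content as declared.

```python
def resolve(declared, apparent, ask_user):
    """Decide how to treat content whose declared type is in doubt.

    declared: type from the Content-Type header.
    apparent: what the content looks like, or None if nothing was checked.
    ask_user: callable(declared, apparent) -> bool, the consent prompt.
    Returns the type to use, or None if the content is rejected.
    """
    if apparent is None or apparent == declared:
        # No conflict: treat it as what it claims to be.
        return declared
    # Conflict: never switch silently; proceed only with consent.
    if ask_user(declared, apparent):
        return apparent
    return None


if __name__ == "__main__":
    always_no = lambda declared, apparent: False
    print(resolve("text/html", "text/html", always_no))
    print(resolve("text/html", "image/png", always_no))
```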