Posted by Alan J. Flavell on 03/19/06 22:05
On Sun, 19 Mar 2006, Jim Higson wrote:
> Just a note, but file extensions in the URL aren't really a very
> good way to decide what type the content is. When Tim Berners-Lee
> designed the WWW, he decided to use an HTTP header ("Content-Type")
> instead.
Right; but in the interests of accuracy: the Content-Type header is
specified in RFC2616, an IETF standards-track RFC, which means it
behoves all Internet users to observe its requirements, no matter what
their opinion of TimBL and the W3C might happen to be.
[good points snipped...]
> 5) Only really MS Windows uses file extensions to decide what type a
> file is.
MS Windows does, it's true, but MSIE generally doesn't, as Microsoft's
own documentation shows:
http://msdn.microsoft.com/workshop/networking/moniker/overview/appendix_a.asp
When MSIE tries to guess what the content really is, the filename
extension is pretty much its last resort.
> Unix typically looks at the contents of the file itself to decide.
Well, yes; but a very common way to use Apache is with the MIME
content-type determined by the filename extension *at the server*
(which might or might not appear in the associated URL, as you say).
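As a rough illustration of that server-side approach, here is a minimal Python sketch that picks a Content-Type from the extension of the file being served, using the standard library's `mimetypes` table. It only mimics the idea behind Apache's extension mapping (`mime.types`, `AddType`); the helper name `content_type_for` and the `application/octet-stream` fallback are assumptions for illustration, not Apache's actual behaviour.

```python
import mimetypes

# Hypothetical helper: decide the Content-Type on the *server*,
# from the extension of the file on disk (not necessarily the URL).
def content_type_for(filename: str, fallback: str = "application/octet-stream") -> str:
    # guess_type() looks only at the name/extension; it never reads the file.
    media_type, _encoding = mimetypes.guess_type(filename)
    # Assumed policy: unknown extensions get a generic fallback type.
    return media_type or fallback

if __name__ == "__main__":
    for name in ("index.html", "style.css", "notes.unknownext"):
        print(name, "->", content_type_for(name))
```

Whatever the server decides is then sent to the client in the Content-Type header; the client never needs to see the extension at all.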
There isn't any HTTP content type which clearly means "the receiving
OS should guess", so the issue of how a unix-type operating system
might guess is mostly off-topic. If *and only if* (in the words of
RFC2616) the sender has omitted to provide a Content-type header, the
client agent is permitted to guess - but this is already a dubious
situation, since RFC2616 told the sender that they SHOULD provide an
appropriate Content-type header.
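A minimal Python sketch of that rule (RFC2616 section 7.2.1): the client uses the declared Content-Type whenever there is one, and only falls back to inspecting the body when the header is absent. The function names and the tiny magic-number table are hypothetical; real clients sniff far more elaborately.

```python
from typing import Mapping

def sniff(body: bytes) -> str:
    """Hypothetical, deliberately tiny content sniffer (illustration only)."""
    if body.startswith(b"%PDF-"):
        return "application/pdf"
    if body.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    head = body.lstrip()[:64].lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "text/html"
    return "application/octet-stream"

def effective_media_type(headers: Mapping[str, str], body: bytes) -> str:
    # HTTP header names are case-insensitive, so compare them lowercased.
    for name, value in headers.items():
        if name.lower() == "content-type":
            # The declared type wins; drop parameters such as "; charset=utf-8".
            return value.split(";", 1)[0].strip().lower()
    # Only when Content-Type is missing may the recipient guess.
    return sniff(body)
```

Note that a body which looks like HTML but is declared `text/plain` is still treated as `text/plain`: the guess is never consulted when the sender has said what the content is.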
I hadn't seen an HTTP 200 response without a Content-type header for
many years. I *did* see one quite recently - and, guess what, the
server that sent it said that it was IIS. Yet another case of
software from Galactic HQ spitting in the face of the Internet
specifications.
The only remaining situation where it's unclear what RFC2616 says
should happen is application/octet-stream. Some say that this can
only be saved to file, since it can't be unambiguously associated with
any viewer or application at the client side. Others say that the
wording of RFC2616 doesn't actually disallow the client agent trying
to guess what it is. (I'm fairly agnostic on this point, but this
isn't about me.)
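The two readings of application/octet-stream can be expressed as a single policy switch. In this Python sketch, `allow_guessing=False` models the "save to file only" reading and `allow_guessing=True` the more permissive one; the function name, the flag, and the `sniffed` argument (the output of some content sniffer) are all hypothetical.

```python
def octet_stream_action(sniffed: str, allow_guessing: bool) -> str:
    """Decide what to do with a body declared as application/octet-stream.

    `sniffed` is whatever type some (hypothetical) content sniffer reported.
    Returns either "save" or the media type to handle the body as.
    """
    if not allow_guessing:
        # Strict reading: the declared type names no viewer, so only offer to save.
        return "save"
    if sniffed and sniffed != "application/octet-stream":
        # Permissive reading: act on what the content appears to be.
        return sniffed
    # Sniffing found nothing more specific, so saving is still the only option.
    return "save"
```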
> Not necessarily. This can be done in .htaccess. The server is at
> fault though, it is telling the browser that the content is
> "text/html" and the browser is believing it.
RFC2616 leaves the browser only two choices: either treat it as what
it claims to be, or reject it. In practical terms, "reject it" could
mean appealing to the user and getting their informed consent to
proceed on the basis of what the stuff appears to be, rather than what
the sender claims it to be. The majority of WWW users would,
admittedly, be in no position to take a proper decision on that
"informed" consent; but if all client agents (including the operating
system component that thinks it's a web browser) were behaving in
accordance with RFC2616, then this situation simply wouldn't arise,
since everyone providing content would see the problem as soon as they
tried to access it themselves, and would correct it forthwith.
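The two options described above, treating the body as what it claims to be or rejecting it (perhaps with the user's informed consent to proceed on what it appears to be), might be sketched in Python like this. `resolve`, the `ask_user` callback, the `reject_on_mismatch` flag, and the idea of passing in a sniffed type are assumptions for illustration, not the behaviour of any particular browser.

```python
from typing import Callable, Optional

def resolve(declared: str, sniffed: str,
            ask_user: Callable[[str, str], bool],
            reject_on_mismatch: bool = True) -> Optional[str]:
    """Return the media type to handle the body as, or None to reject it."""
    if declared == sniffed or not reject_on_mismatch:
        # Option 1: believe the sender's Content-Type.
        return declared
    # Option 2: reject, unless the user explicitly consents to treating
    # the body as what it appears to be rather than what it claims to be.
    if ask_user(declared, sniffed):
        return sniffed
    return None
```

Passing the consent prompt in as a callback keeps the decision logic separate from any user interface, so the same function can be exercised with a stub that always answers yes or no.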