Posted by Alan J. Flavell on 11/22/93 11:40
On Mon, 20 Feb 2006, Steve Pugh wrote:
> Read the small print. WebTrends use cookies and JavaScript instead
> of/as well as server logs.
Both of which discerning users have been selectively blocking for
many years. What was that program we had back in Win95 days, which
blocked such things from any browser? I've actually forgotten its
name, and my old '95 PC has long since gone to the knacker's yard, but
it was definitely there; and nowadays such functions come built-in to
any decent browser.
However, servers which insist on using such techniques are inhibiting
cacheability, and thus ensuring a less responsive web, and thus are
interfering in a negative way with the results which their users
experience (*all* of their users - not only those discerning users who
block these attempts to peek into their activities).
This is, in effect, the Heisenberg law of web statistics - the harder
you try to get accurate answers, the more you interfere with the way
that the web works (recalling that HTTP was quite deliberately
designed to be "stateless"), and the worse you are able to serve the
requests of your users. And so, you end up getting more-accurate
measurements of something that would be working much better if only
you'd stop trying so hard to measure it.
> They have a number of products and services which offer differing
> levels of accuracy. But at the end of the day they can not be 100%
> accurate.
Worse than that: they aren't just "inaccurate", they are seriously
"biased", but you have no way of estimating the bias.
For example, if you improved your cacheability, your users would get
faster responses, and you might get more users sticking around to read
your site, whereas your server statistics would show fewer hits thanks
to all those folks who were getting the pages out of an intermediate
cache. They would also show gaps, because those users revisit
pages from their *own* browser cache, whereas previously they
had to wait to re-fetch the same page from your server on every
revisit.
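To put toy numbers on that, here is a minimal sketch. The visitors, the
pages and the function name are all invented for illustration, and it
models only a per-visitor browser cache, not an intermediate proxy.

```python
def server_hits(views, cacheable):
    """Count the page views that actually reach the origin server.

    views: list of (visitor, page) pairs in the order they happen.
    cacheable: if True, each visitor's browser keeps a page after the
    first fetch and never asks the server for it again (a deliberate
    simplification: no expiry, no revalidation).
    """
    seen = set()
    hits = 0
    for visitor, page in views:
        if cacheable and (visitor, page) in seen:
            continue  # served from the browser cache; the log sees nothing
        seen.add((visitor, page))
        hits += 1
    return hits


# Hypothetical traffic: six page views by two visitors.
views = [
    ("ann", "/"), ("ann", "/faq"), ("ann", "/"),
    ("bob", "/"), ("ann", "/"), ("bob", "/"),
]

print(server_hits(views, cacheable=False))  # 6
print(server_hits(views, cacheable=True))   # 3
```

Same six page views, same two readers, but the log of the cacheable
site records half as many hits. Nothing in the log tells you which of
the two situations you are in.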
> Think of them as providing information on general trends
Yeah, such as when a certain large ISP deployed a new bank of cache
servers, and the "trends" apparently showed that users had
mysteriously lost interest in the web site in question. Strangely,
each popular page that was hit on the server was being hit exactly
once every 24 hours, after which nothing was heard again from that ISP
for another 24 hours. Yup, that ISP was callously ignoring everything
that the server told it ("this page is uncacheable", "expires in
January 1970", etc.), and was caching each page for 24 hours
without appeal. No, I'm sorry: those "trends" don't really show very
much, unless and until you really know what's happening OUT THERE.
But your server statistics have no way to tell you what's happening
out there. They're selective, and biased, and, often enough, if
interpreted to show what people demand to know - rather than
interpreted in terms of the information they really contain - can
appear to show the opposite of the truth.
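If you suspect a cache like the once-every-24-hours one described
above, the best the log can do is hint at it. A rough sketch, assuming
you have already pulled out the timestamps of one page's hits from one
ISP's address range; the function name, the tolerance and the sample
data are all made up for illustration:

```python
from datetime import datetime, timedelta


def looks_like_daily_refetch(timestamps, tolerance=timedelta(minutes=5)):
    """Return True if successive hits are all about 24 hours apart.

    A crude heuristic, not a proof: it only says the gaps between hits
    are suspiciously regular, which is what a cache that re-fetches once
    a day would produce.
    """
    ordered = sorted(timestamps)
    if len(ordered) < 3:
        return False  # too few hits to call anything a pattern
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    day = timedelta(hours=24)
    return all(abs(gap - day) <= tolerance for gap in gaps)


# Invented sample: one hit per day at about 03:00, a few seconds adrift.
start = datetime(2006, 2, 1, 3, 0, 0)
proxy_hits = [start + timedelta(days=n, seconds=7 * n) for n in range(4)]

# Invented sample: ordinary, irregular visits.
human_hits = [start + timedelta(hours=h) for h in (0, 2, 30, 31)]

print(looks_like_daily_refetch(proxy_hits))
print(looks_like_daily_refetch(human_hits))
```

Even when this fires, it tells you only that something upstream is
re-fetching on a schedule. How many readers sit behind that cache is
exactly what the log cannot say.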
Let us consider for example those misguided folks who notice that >70%
of their users appear (according to the logged user agent) to be using
MSIE, so they "optimise" their site specifically for MSIE, and,
surprise surprise, the proportion of MSIE users rises. So would you
say they acted correctly, when most everyone else reports that the
proportion appearing to use MSIE is falling? For one thing, Opera
users are starting to stand up for themselves - many of them are no
longer willing to hide behind a user agent string which pretends to be
MSIE.
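To see how the logged user agent misleads, consider a naive count. This
is only a sketch: the function names are invented, and the sample string
is an illustration of the general shape Opera sends when set to
identify as MSIE, not a line from anyone's real log.

```python
def naive_browser(user_agent):
    """What a simple-minded log analyser does: look for 'MSIE'."""
    return "MSIE" if "MSIE" in user_agent else "other"


def careful_browser(user_agent):
    """Slightly better: check for the Opera token before trusting 'MSIE'.

    This catches only the case where Opera still appends its own name.
    A browser configured to mimic MSIE completely cannot be told apart
    from the real thing by the string alone.
    """
    if "Opera" in user_agent:
        return "Opera"
    return naive_browser(user_agent)


# Illustrative user agent string (an assumption, not taken from a log).
ua = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.51"

print(naive_browser(ua))    # MSIE
print(careful_browser(ua))  # Opera
```

So part of that ">70%" can be an artefact of how the counting was done,
and no amount of cleverness in the analyser recovers the users who
disguise themselves fully.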
Many other changes are happening "out there", and they turn those
numbers, viewed down the wrong end of the telescope at your server log,
into highly misleading indicators of anything except your server load;
at best they are also a handy way to identify broken links.
> > However, there are a number of articles pointing to the second.
> > Notably, the author of "analog", one of the original web log
> > analysis tools, says that you can't *really* get too much
> > meaningful analysis out of your server logs.
The author of Analog works in statistics, AIUI, and is determined to
tell the truth about web servers, no matter how much some web server
operators insist that they prefer to be fooled by convincing-looking
numbers about the behaviour of their visitors. Good for him.