Posted by Nick Kew on 01/06/51 11:40
Steve Pugh wrote:
> Those things are useful if set up and interpreted with care, but are
> not 100% definitive.
Treat them as you would viewing figures for a TV show.
>>Typically, companies (e.g. Webtrends) that sell analysis software say
>>the first.
>
>
> Read the small print. WebTrends use cookies and JavaScript instead
> of/as well as server logs.
Spammers. 'nuff said (or should be - Alan expanded on some
more technical reasons).
>> However, there are a number of articles pointing to the
>>second. Notably, the author of "analog", one of the original web log
>>analysis tools, says that you can't *really* get too much meaningful
>>analysis out of your server logs.
>
>
> Yes, Analog reads server logs alone. It doesn't try to do anything with
> JavaScript, cookies, etc.
No spam, no snake oil. No surprise.
I rather suspect the author of analog may even understand the subject.
Unlike those outfits where anyone who understands the issues is firmly
ignored and probably laughed at as a nerdy loser behind their backs.
>>What techniques exist to improve Web Server Log analysis?
>>
>>How good are they?
>>
>>What can I do to implement them?
Hire a statistician. And make it someone who understands the
infrastructure of the Web. There are very few people who
qualify on both counts.
Now you need to add *knowledge* of the web's infrastructure.
That's different from the *principles*, and much harder to
collect. In fact it's impossible to collect at the level
that would be required for the likes of webtrends to work -
you have to apply the kind of techniques that broadcasters
use. I haven't worked for a broadcaster myself, but I
strongly suspect *they* rely on some pretty ropey assumptions,
too[1].
[1] I have worked as a statistician, and I've seen how things
happen when there is *no data* to validate some part of the
underlying model used. It goes like this:
- Someone picks a figure effectively at random on a 'seems
reasonable' basis just to have something to work with.
That enables them to derive numbers from the model.
- They also try the model with different figures, to test
the effect of varying the unknown. This leads to a perfectly
valid set of "if [value1] then [result1]" results.
- BUT that's too complex for a soundbite culture, so only the
first figure gets reported as a headline conclusion.
- Now, a future practitioner has NO DATA to validate this part
of the model, but has the first paper as a reference to cite.
The assumption is peripheral to the study, so the 'headline'
figure is simply used without question.
- Over time the figure is much cited, because nobody wants to get involved
in something that cannot be verified. The first researcher's working
hypothesis, still totally untested, becomes common knowledge,
and 'obviously correct' because everyone uses it.
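
To make the second step concrete, here is a rough sketch of that kind of sweep applied to a web log. Everything in it is my own illustration, not anything analog or webtrends does: the function names, the toy model (logged requests scaled up by an unknown share of requests answered from caches, then divided by an assumed number of requests per visit) and all the numbers are made up. The point is only that the honest output is the whole table, not its first row.

```python
# Sensitivity sweep over an unknown model parameter.
# All names and figures here are hypothetical, chosen for illustration.

def estimated_visits(logged_requests, requests_per_visit, cached_fraction):
    """Estimate visits from the requests that reached the server log.

    cached_fraction is the unknown: the assumed share of real requests
    answered by a proxy or browser cache, which the log never sees.
    """
    if not 0 <= cached_fraction < 1:
        raise ValueError("cached_fraction must be in [0, 1)")
    # Scale logged requests up to an estimate of all requests made.
    true_requests = logged_requests / (1 - cached_fraction)
    return true_requests / requests_per_visit


def sweep(logged_requests, requests_per_visit, candidates):
    """Return the full 'if [value] then [result]' table as a list of pairs."""
    return [
        (c, round(estimated_visits(logged_requests, requests_per_visit, c)))
        for c in candidates
    ]


if __name__ == "__main__":
    # Report every assumed value alongside the result it leads to.
    for assumed, visits in sweep(120_000, 8, [0.0, 0.2, 0.4, 0.6]):
        print(f"if cached_fraction = {assumed:.1f} then visits ~ {visits}")
```

With these invented inputs the estimate more than doubles between the lowest and highest assumed value, which is exactly the information that gets lost when only one row is quoted.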
--
Nick Kew