|
Posted by windandwaves on 11/09/12 11:31
Ian B wrote:
> The problem is actually not as difficult as you might think - and you
> can get accurate results
>
> :-)
>
>
> First thing you need to realise is that you need your table in two
> ways:
>
> a) you need to look up id's which have accessed same page as the
> current page
> b) you need to look up the pages which those id's have accessed
>
> also
>
> c) you need to exclude the current page in b)
> d) you need to count the number of times each related page occurs
> e) you need to rank the results in order of popularity
>
>
> This might sound like a programming problem, but it can be done in one
> SQL statement
>
> I'll build it up so you can see how I got there:
>
> SELECT * FROM mytable AS t1
>
> gets the whole table and lets you refer to it as "t1"
>
>
> SELECT * FROM mytable AS t1
> LEFT JOIN mytable AS t2 ON t1.session_id = t2.session_id
>
> this gets you shedloads of results - one line for every combination
> so if id#1 has visited pages a, b & c then you get
>
> a-a
> a-b
> a-c
> b-a
> b-b
> b-c
> c-a
> c-b
> c-c
>
> but we only want the ones for, say, page "a", and we only need
> (from this data) the 'other' page
>
> SELECT t2.page_name FROM mytable AS t1
> LEFT JOIN mytable AS t2 ON t1.session_id = t2.session_id
> WHERE t1.page_name = 'a'
>
> this (for id#1)
>
> would produce
>
> b
> c
>
> running against all the data, would produce a whole list of pages -
> all the pages that anyone who has visited page 'a' has also been to
>
> If we group by page name, we'll get one line per page name
>
>
> SELECT t2.page_name FROM mytable AS t1
> LEFT JOIN mytable AS t2 ON t1.session_id = t2.session_id
> WHERE t1.page_name = 'a'
> AND t2.page_name <> 'a'
> GROUP BY t2.page_name
>
> g
> d
> e
> f
> b
> c
>
> If we add a count, we get the number of times that page turned up
>
>
> SELECT t2.page_name,Count(t2.page_name) AS popularity FROM mytable AS
> t1
> LEFT JOIN mytable AS t2 ON t1.session_id = t2.session_id
> WHERE t1.page_name = 'a'
> AND t2.page_name <> 'a'
> GROUP BY t2.page_name
>
>
> g 10
> d 27
> e 19
> f 41
> b 110
> c 83
>
>
> And if we order by descending popularity, we're almost there
>
>
>
> SELECT t2.page_name,Count(t2.page_name) AS popularity FROM mytable AS
> t1
> LEFT JOIN mytable AS t2 ON t1.session_id = t2.session_id
> WHERE t1.page_name = 'a'
> AND t2.page_name <> 'a'
> GROUP BY t2.page_name
> ORDER BY Count(t2.page_name) DESC
>
> b 110
> c 83
> f 41
> d 27
> e 19
> g 10
>
>
>
> The last thing to do is limit the results to the top 5 results
>
>
> SELECT t2.page_name,Count(t2.page_name) AS popularity FROM mytable AS
> t1
> LEFT JOIN mytable AS t2 ON t1.session_id = t2.session_id
> WHERE t1.page_name = 'a'
> AND t2.page_name <> 'a'
> GROUP BY t2.page_name
> ORDER BY Count(t2.page_name) DESC
> LIMIT 5
>
>
> b 110
> c 83
> f 41
> d 27
> e 19
>
> And there's your list
>
>
> ...
>
> Ian
I never read a more indepth and clearer reply!!!!!
Thank you SO MUCH (a lot more than a million).
Nicolaas
[Back to original message]
|