| 
	
 | 
 Posted by  csimam on 09/04/07 03:06 
Here is a thread I posted elsewhere that I wonder if someone here 
could shed some light on... 
 
The original question I had: 
 
--- BEGIN --- 
Which one is better? 
1. To perform aggregate queries on data, and use that data to home in 
and perform more detailed analysis? This gives result sets that are 
potentially filled with columns that contain the aggregate data 
repeated across many rows. 
I meant here by homing in: within the same query - using 
nested SELECTS. For example, the inner query performs the aggregate 
analysis, and returns whatever data we are interested in, and then 
the 
outer query performs another SELECT on the same data, resulting in 
the 
aggregate data being distributed across 'detailed results' from the 
outer query. 
 
2. To perform the aggregate query once, and then hit the database 
again for a detailed analysis, and return the detailed analysis 
results in an order that is easily cross-correlated with the original 
aggregate data? This means running the aggregate query essentially 
twice, just the second time it is hidden as a subquery. 
--- END --- 
 
I realize the question was vague, but a super DBA understood what I 
was asking and suggested I study up on ROLLUP and CUBE (I am not a DBA 
by the way). So I did... and here is what I found: 
 
--- BEGIN --- 
The data I have, I want to generate totals and subtotals, say of 
sales: total sales per store, and subtotal of sales per store per day 
for example. 
 
The problem is that I ultimately only want the top 25 stores in total 
sales BUT with their subtotals as well... So I used ROLLUP to 
generate 
the totals and subtotals... and in the same query I generated a 
ranking of the data... and there I found my problem: the ranking does 
not distinguish between totals and subtotals. So the results look 
like: 
 
store1, realTotal, 1 
store1, subTotal, 2 
store2, realTotal, 3 
store7, realTotal, 4 
store10, realTotal, 5 
store10, subTotal, 6 
 
Clearly, the rows with subtotals should NOT be ranked... but I am 
struggling with SQL to figure out how to get exactly what I want. 
 
Well... I have a solution. But I'm not sure about it. It involves 
something I learned about only recently: a user-defined aggregate 
function that concatenates values from across rows. Using this 
aggregate function, stragg, I can do another pass through the results 
mentioned above while grouping by store, and produce: 
 
store1, 'realTotal subTotal', 1 
store2, 'realTotal subTotal', 3 
store7, 'realTotal subTotal', 4 
store10, 'realTotal subTotal', 5 
 
And another trick: use min( rank ) in the same aggregation, and order 
by it as well. 
 
Together that gives one a very compact result set that has everything 
ranked and grouped perfectly. 
 
....but... a worry: is this REALLY better than doing a nested query? I 
wonder if the use of the user-defined aggregate function is very 
expensive... 
 
For the benefit of anyone wondering, here's the structure that 
works... in one pass in theory... but again, I am now wondering if it 
really is faster than the nested query equivalent that hits the 
database twice on the table 'stores' (seems to me that the middle 
select below is effectively hitting the table again, just in 
temporary 
form): 
 
select rownum as rank, id, days, totals 
from 
( 
   select id, stragg(day) as days, stragg(total) as totals, 
min( rank ) as innerrank 
   from 
   ( 
      select id, day, sum(sales) as total, rank() over (order by 
sum(sales) desc) rank 
      from stores 
      group by id, rollup(day) 
   ) 
   group by id 
   order by min(rank) 
) 
where rownum <= 25 
 
--- END --- 
 
Any comments? Is this query really better than the following: 
 
-- get totals by day 
select rank, id, day, total, sum(sales) as dailytotal 
from 
( 
   select * from 
   ( 
      -- get grand totals regardless of day 
      select id as storeid, sum( sales ) as total, 
      rank() over (order by sum(sales) desc) rank 
      from stores 
      group by id 
   ) 
   where rank <= 25 
), stores 
-- NOTE join with stores again 
where id = storeid 
-- but does the where condition above clip the query and make it 
efficient? 
group by rank, id, day, total 
order by rank, day
 
[Back to original message] 
 |