|  | Posted by  csimam on 09/04/07 03:06 
Here is a thread I posted elsewhere that I wonder if someone herecould shed some light on...
 
 The original question I had:
 
 --- BEGIN ---
 Which one is better?
 1. To perform aggregate queries on data, and use that data to home in
 and perform more detailed analysis? This gives result sets that are
 potentially filled with columns that contain the aggregate data
 repeated across many rows.
 I meant here by homing in: within the same query - using
 nested SELECTS. For example, the inner query performs the aggregate
 analysis, and returns whatever data we are interested in, and then
 the
 outer query performs another SELECT on the same data, resulting in
 the
 aggregate data being distributed across 'detailed results' from the
 outer query.
 
 2. To perform the aggregate query once, and then hit the database
 again for a detailed analysis, and return the detailed analysis
 results in an order that is easily cross-correlated with the original
 aggregate data? This means running the aggregate query essentially
 twice, just the second time it is hidden as a subquery.
 --- END ---
 
 I realize the question was vague, but a super DBA understood what I
 was asking and suggested I study up on ROLLUP and CUBE (I am not a DBA
 by the way). So I did... and here is what I found:
 
 --- BEGIN ---
 The data I have, I want to generate totals and subtotals, say of
 sales: total sales per store, and subtotal of sales per store per day
 for example.
 
 The problem is that I ultimately only want the top 25 stores in total
 sales BUT with their subtotals as well... So I used ROLLUP to
 generate
 the totals and subtotals... and in the same query I generated a
 ranking of the data... and there I found my problem: the ranking does
 not distinguish between totals and subtotals. So the results look
 like:
 
 store1, realTotal, 1
 store1, subTotal, 2
 store2, realTotal, 3
 store7, realTotal, 4
 store10, realTotal, 5
 store10, subTotal, 6
 
 Clearly, the rows with subtotals should NOT be ranked... but I am
 struggling with SQL to figure out how to get exactly what I want.
 
 Well... I have a solution. But I'm not sure about it. It involves
 something I learned about only recently: a user-defined aggregate
 function that concatenates values from across rows. Using this
 aggregate function, stragg, I can do another pass through the results
 mentioned above while grouping by store, and produce:
 
 store1, 'realTotal subTotal', 1
 store2, 'realTotal subTotal', 3
 store7, 'realTotal subTotal', 4
 store10, 'realTotal subTotal', 5
 
 And another trick: use min( rank ) in the same aggregation, and order
 by it as well.
 
 Together that gives one a very compact result set that has everything
 ranked and grouped perfectly.
 
 ....but... a worry: is this REALLY better than doing a nested query? I
 wonder if the use of the user-defined aggregate function is very
 expensive...
 
 For the benefit of anyone wondering, here's the structure that
 works... in one pass in theory... but again, I am now wondering if it
 really is faster than the nested query equivalent that hits the
 database twice on the table 'stores' (seems to me that the middle
 select below is effectively hitting the table again, just in
 temporary
 form):
 
 select rownum as rank, id, days, totals
 from
 (
 select id, stragg(day) as days, stragg(total) as totals,
 min( rank ) as innerrank
 from
 (
 select id, day, sum(sales) as total, rank() over (order by
 sum(sales) desc) rank
 from stores
 group by id, rollup(day)
 )
 group by id
 order by min(rank)
 )
 where rownum <= 25
 
 --- END ---
 
 Any comments? Is this query really better than the following:
 
 -- get totals by day
 select rank, id, day, total, sum(sales) as dailytotal
 from
 (
 select * from
 (
 -- get grand totals regardless of day
 select id as storeid, sum( sales ) as total,
 rank() over (order by sum(sales) desc) rank
 from stores
 group by id
 )
 where rank <= 25
 ), stores
 -- NOTE join with stores again
 where id = storeid
 -- but does the where condition above clip the query and make it
 efficient?
 group by rank, id, day, total
 order by rank, day
  Navigation: [Reply to this message] |