|
Query fun and games
Date: 02/23/12
(SQL Server) Keywords: xml, sql
I've found in general for SQL that there is more than one way to solve (almost) any problem. I've been playing around with query building today and decided to see how many ways I could solve a problem that recurs fairly frequently in my work, flattening subrecords into a single row.
This is my current standard solution, using the PIVOT function. It's quite fast, but limits you to a specific number of subrecords--it can be a high number, but you still have to decide on a maximum.
WITH cte AS (SELECT Person.contactid AS 'ID' , Person.FullName AS 'Name' , 'Activity' = Activity.a422_rel_activityvalueidname , 'Row' = ROW_NUMBER() OVER (PARTITION BY Person.contactid, Person.FullName ORDER BY Activity.a422_rel_activityvalueidname) FROM Contact AS Person INNER JOIN Task AS Activity ON Person.contactid = Activity.regardingobjectid) SELECT ID, Name , 'Activity1' = [1], 'Activity2' = [2], 'Activity3' = [3], 'Activity4' = [4], 'Activity5' = [5] FROM cte PIVOT (MAX(cte.Activity) FOR cte.[Row] IN ([1], [2], [3], [4], [5])) AS pvt
This is a new solution I found in surfing some SQL Server blogs, using FOR XML PATH to create a CSV list of values. It will include an indefinite number of subrecords, but only includes one field from the subrecords. It's significantly slower than the first example by at least an order of ten.
SELECT DISTINCT p.contactid AS 'ID' , p.FullName AS 'Name' , SUBSTRING((SELECT ', ' + Activity.a422_rel_activityvalueidname FROM task AS Activity WHERE Activity.regardingobjectid = p.contactid FOR XML PATH('')), 2, 4000) AS 'Activities' FROM Contact AS p INNER JOIN Task AS t ON p.contactid = t.regardingobjectid ORDER BY p.contactid
This ugly looking creature is what I used to use before PIVOT came along, using many, many multiple self-joins. I'm pretty sure I had a slightly more elegant (and faster!) version of this, but it's been a long time since I've had to create one of these things (fortunately). The performance is...not as bad as you might expect.
SELECT 'ID' = p.contactid, 'Name' = p.fullname , 'Activity1' = a1.a422_rel_activityvalueidname , 'ActivityDate1' = a1.actualend , 'Activity2' = a2.a422_rel_activityvalueidname , 'ActivityDate2' = a2.actualend , 'Activity3' = a3.a422_rel_activityvalueidname , 'ActivityDate3' = a3.actualend , 'Activity4' = a4.a422_rel_activityvalueidname , 'ActivityDate4' = a4.actualend , 'Activity5' = a5.a422_rel_activityvalueidname , 'ActivityDate5' = a5.actualend FROM Contact AS p INNER JOIN Task AS a1 ON p.contactid = a1.regardingobjectid LEFT JOIN Task AS not1 ON p.contactid = not1.regardingobjectid AND not1.activityid < a1.activityid LEFT JOIN Task AS a2 ON p.contactid = a2.regardingobjectid AND a2.activityid > a1.activityid LEFT JOIN Task AS not2 ON p.contactid = not2.regardingobjectid AND not2.activityid > a1.activityid AND not2.activityid < a2.activityid LEFT JOIN Task AS a3 ON p.contactid = a3.regardingobjectid AND a3.activityid > a2.activityid LEFT JOIN Task AS not3 ON p.contactid = not3.regardingobjectid AND not3.activityid > a2.activityid AND not3.activityid < a3.activityid LEFT JOIN Task AS a4 ON p.contactid = a4.regardingobjectid AND a4.activityid > a3.activityid LEFT JOIN Task AS not4 ON p.contactid = not4.regardingobjectid AND not4.activityid > a3.activityid AND not4.activityid < a4.activityid LEFT JOIN Task AS a5 ON p.contactid = a5.regardingobjectid AND a5.activityid > a4.activityid LEFT JOIN Task AS not5 ON p.contactid = not5.regardingobjectid AND not5.activityid > a4.activityid AND not5.activityid < a5.activityid WHERE not1.regardingobjectid Is Null AND not2.regardingobjectid Is Null AND not3.regardingobjectid Is Null AND not4.regardingobjectid Is Null AND not5.regardingobjectid Is Null
Using a recursive CTE almost works, except that for each main record it gives a row with one subrecord, another row with two subrecords, a row with three subrecords, and so on for as many subrecords as are available for that main record. It seems like there has to be a way around that, so if you have any ideas, let me know. Performance is not good, not horrible.
WITH cte AS (SELECT a1.regardingobjectid, a1.activityid , 'Activities' = CONVERT(nvarchar(1000), a1.createdon, 113) FROM Task AS a1 INNER JOIN Contact AS p ON a1.regardingobjectid = p.contactid LEFT JOIN Task AS not1 ON a1.regardingobjectid = not1.regardingobjectid AND a1.activityid > not1.activityid WHERE not1.activityid Is Null UNION ALL SELECT cte.regardingobjectid, a1.activityid , 'Activities' = CONVERT(nvarchar(1000), (cte.Activities + N', ' + CONVERT(nvarchar, a1.createdon, 113))) FROM cte INNER JOIN Task AS a1 ON cte.regardingobjectid = a1.regardingobjectid AND cte.activityid < a1.activityid WHERE NOT EXISTS (SELECT * FROM Task AS not1 WHERE cte.regardingobjectid = not1.regardingobjectid AND not1.activityid > cte.activityid AND not1.activityid < a1.activityid) ) SELECT 'ID' = p.contactid, 'Name' = p.fullname , cte.Activities FROM cte INNER JOIN Contact AS p ON cte.regardingobjectid = p.contactid ORDER BY p.fullname
Creating a custom aggregate function in CLR is another solution, but playing with that will have to be another day.
Source: http://sqlserver.livejournal.com/76055.html
|