Query fun and games

    Date: 02/23/12 (SQL Server)    Keywords: xml, sql

    I've found in general for SQL that there is more than one way to solve (almost) any problem. I've been playing around with query building today and decided to see how many ways I could solve a problem that recurs fairly frequently in my work, flattening subrecords into a single row.

    This is my current standard solution, using the PIVOT function. It's quite fast, but limits you to a specific number of subrecords--it can be a high number, but you still have to decide on a maximum.

    WITH cte AS (SELECT Person.contactid AS 'ID' , Person.FullName AS 'Name'
    , 'Activity' = Activity.a422_rel_activityvalueidname
    , 'Row' = ROW_NUMBER() OVER (PARTITION BY Person.contactid, Person.FullName ORDER BY Activity.a422_rel_activityvalueidname)
    FROM Contact AS Person
    INNER JOIN Task AS Activity ON Person.contactid = Activity.regardingobjectid)
    SELECT ID, Name
    , 'Activity1' = [1], 'Activity2' = [2], 'Activity3' = [3], 'Activity4' = [4], 'Activity5' = [5]
    FROM cte
    PIVOT (MAX(cte.Activity) FOR cte.[Row] IN ([1], [2], [3], [4], [5])) AS pvt

    This is a new solution I found in surfing some SQL Server blogs, using FOR XML PATH to create a CSV list of values. It will include an indefinite number of subrecords, but only includes one field from the subrecords. It's significantly slower than the first example by at least an order of ten.
    SELECT DISTINCT p.contactid AS 'ID' , p.FullName AS 'Name'
    , SUBSTRING((SELECT ', ' + Activity.a422_rel_activityvalueidname
    FROM task AS Activity
    WHERE Activity.regardingobjectid = p.contactid
    FOR XML PATH('')), 2, 4000) AS 'Activities'
    FROM Contact AS p
    INNER JOIN Task AS t ON p.contactid = t.regardingobjectid
    ORDER BY p.contactid

    This ugly looking creature is what I used to use before PIVOT came along, using many, many multiple self-joins. I'm pretty sure I had a slightly more elegant (and faster!) version of this, but it's been a long time since I've had to create one of these things (fortunately). The performance is...not as bad as you might expect.
    SELECT 'ID' = p.contactid, 'Name' = p.fullname
    , 'Activity1' = a1.a422_rel_activityvalueidname
    , 'ActivityDate1' = a1.actualend
    , 'Activity2' = a2.a422_rel_activityvalueidname
    , 'ActivityDate2' = a2.actualend
    , 'Activity3' = a3.a422_rel_activityvalueidname
    , 'ActivityDate3' = a3.actualend
    , 'Activity4' = a4.a422_rel_activityvalueidname
    , 'ActivityDate4' = a4.actualend
    , 'Activity5' = a5.a422_rel_activityvalueidname
    , 'ActivityDate5' = a5.actualend
    FROM Contact AS p
    INNER JOIN Task AS a1
    ON p.contactid = a1.regardingobjectid
    LEFT JOIN Task AS not1
    ON p.contactid = not1.regardingobjectid
    AND not1.activityid < a1.activityid
    LEFT JOIN Task AS a2
    ON p.contactid = a2.regardingobjectid
    AND a2.activityid > a1.activityid
    LEFT JOIN Task AS not2
    ON p.contactid = not2.regardingobjectid
    AND not2.activityid > a1.activityid
    AND not2.activityid < a2.activityid
    LEFT JOIN Task AS a3
    ON p.contactid = a3.regardingobjectid
    AND a3.activityid > a2.activityid
    LEFT JOIN Task AS not3
    ON p.contactid = not3.regardingobjectid
    AND not3.activityid > a2.activityid
    AND not3.activityid < a3.activityid
    LEFT JOIN Task AS a4
    ON p.contactid = a4.regardingobjectid
    AND a4.activityid > a3.activityid
    LEFT JOIN Task AS not4
    ON p.contactid = not4.regardingobjectid
    AND not4.activityid > a3.activityid
    AND not4.activityid < a4.activityid
    LEFT JOIN Task AS a5
    ON p.contactid = a5.regardingobjectid
    AND a5.activityid > a4.activityid
    LEFT JOIN Task AS not5
    ON p.contactid = not5.regardingobjectid
    AND not5.activityid > a4.activityid
    AND not5.activityid < a5.activityid
    WHERE not1.regardingobjectid Is Null
    AND not2.regardingobjectid Is Null
    AND not3.regardingobjectid Is Null
    AND not4.regardingobjectid Is Null
    AND not5.regardingobjectid Is Null

    Using a recursive CTE almost works, except that for each main record it gives a row with one subrecord, another row with two subrecords, a row with three subrecords, and so on for as many subrecords as are available for that main record. It seems like there has to be a way around that, so if you have any ideas, let me know. Performance is not good, not horrible.
    WITH cte AS (SELECT a1.regardingobjectid, a1.activityid
    , 'Activities' = CONVERT(nvarchar(1000), a1.createdon, 113)
    FROM Task AS a1
    INNER JOIN Contact AS p
    ON a1.regardingobjectid = p.contactid
    LEFT JOIN Task AS not1
    ON a1.regardingobjectid = not1.regardingobjectid
    AND a1.activityid > not1.activityid
    WHERE not1.activityid Is Null
    SELECT cte.regardingobjectid, a1.activityid
    , 'Activities' = CONVERT(nvarchar(1000), (cte.Activities + N', ' + CONVERT(nvarchar, a1.createdon, 113)))
    FROM cte
    INNER JOIN Task AS a1
    ON cte.regardingobjectid = a1.regardingobjectid
    AND cte.activityid < a1.activityid
    FROM Task AS not1
    WHERE cte.regardingobjectid = not1.regardingobjectid
    AND not1.activityid > cte.activityid
    AND not1.activityid < a1.activityid)
    SELECT 'ID' = p.contactid, 'Name' = p.fullname
    , cte.Activities
    FROM cte
    INNER JOIN Contact AS p
    ON cte.regardingobjectid = p.contactid
    ORDER BY p.fullname

    Creating a custom aggregate function in CLR is another solution, but playing with that will have to be another day.

    Source: http://sqlserver.livejournal.com/76055.html

« Data warehouse community || Stripping time out of... »

antivirus | apache | asp | blogging | browser | bugtracking | cms | crm | css | database | ebay | ecommerce | google | hosting | html | java | jsp | linux | microsoft | mysql | offshore | offshoring | oscommerce | php | postgresql | programming | rss | security | seo | shopping | software | spam | spyware | sql | technology | templates | tracker | virus | web | xml | yahoo | home