|
Posted by Dan Guzman on 05/26/06 05:37
> With the addition of the new CLR functionality, I
> suspect that MS may be giving us developers enough rope to comfortably
> hang ourselves if we're not careful.
Very true. Once size doesn't fit all so one needs to be mindful of the best
tool for the job. The general guideline is that Transact-SQL is best for
data access while CLR is best for moderate to complex computations and
string manipulation.
I haven't played with CLR aggregate functions but it stands to reason that
native Transact-SQL would be faster when the most of the query cost is data
access. I suspect the CLR advantage would kick in when more complex
processing was needed.
--
Hope this helps.
Dan Guzman
SQL Server MVP
<billmiami2@netscape.net> wrote in message
news:1148605648.742461.307520@j55g2000cwa.googlegroups.com...
>I was playing around with the new SQL 2005 CLR functionality and
> remembered this discussion that I had with Erland Sommarskog concerning
> performance of scalar UDFs some time ago (See "Calling sp_oa* in
> function" in this newsgroup). In that discussion, Erland made the
> following comment about UDFs in SQL 2005:
>
>>>The good news is that in SQL 2005, Microsoft has addressed several of
> these issues, and the cost of a UDF is not as severe there. In fact for
> a complex expression, a UDF in written a CLR language may be faster
> than
> the corresponding expression using built-in T-SQL functions.<<
>
> I thought the I would put this to the test using some of the same SQL
> as before, but adding a simple scalar CLR UDF into the mix. The test
> involved querying a simple table with about 300,000 rows. The
> scenarios are as follows:
>
> (A) Use a simple CASE function to calculate a column
> (B) Use a simple CASE function to calculate a column and as a criterion
> in the WHERE clause
> (C) Use a scalar UDF to calculate a column
> (D) Use a scalar UDF to calculate a column and as a criterion in the
> WHERE clause
> (E) Use a scalar CLR UDF to calculate a column
> (F) Use a scalar CLR UDF to calculate a column and as a criterion in
> the WHERE clause
>
> A sample of the results is as follows (time in milliseconds):
>
> (295310 row(s) affected)
> A: 1563
>
> (150003 row(s) affected)
> B: 906
>
> (295310 row(s) affected)
> C: 2703
>
> (150003 row(s) affected)
> D: 2533
>
> (295310 row(s) affected)
> E: 2060
>
> (150003 row(s) affected)
> F: 2190
>
> The scalar CLR UDF function was significantly faster than the classic
> scalar UDF, even for this very simple function. Perhaps a more complex
> function would have shown even a greater difference. Based on this, I
> must conclude that Erland was right. Of course, it's still faster to
> stick with basic built-in functions like CASE.
>
> In another test, I decided to run some queries to compare built-in
> aggregates vs. a couple of simple CLR aggregates as follows:
>
> (G) Calculate averages by group using the built-in AVG aggregate
> (H) Calculate averages by group using a CLR aggregate that similates
> the built-in AVG aggregate
> (I) Calculate a "trimmed" average by group (average excluding highest
> and lowest values) using built-in aggregates
> (J) Calculate a "trimmed" average by group using a CLR aggregate
> specially designed for this purpose
>
> A sample of the results is as follows (time in milliseconds):
>
> (59 row(s) affected)
> G: 313
>
> (59 row(s) affected)
> H: 890
>
> (59 row(s) affected)
> I: 216
>
> (59 row(s) affected)
> J: 846
>
> It seems that the CLR aggregates came with a significant performance
> penalty over the built-in aggregates. Perhaps they would pay off if I
> were attempting a very complex type of aggregation. However, at this
> point I'm going to shy away from using these unless I can't find a way
> to do the calculation with standard SQL.
>
> In a way, I'm happy that basic SQL still seems to be the fastest way to
> get things done. With the addition of the new CLR functionality, I
> suspect that MS may be giving us developers enough rope to comfortably
> hang ourselves if we're not careful.
>
> Bill E.
> Hollywood, FL
> -----------------------------------------------------------------------
>
>
> -- table TestAssignment, about 300,000 rows
> CREATE TABLE [dbo].[TestAssignment](
> [TestAssignmentID] [int] NOT NULL,
> [ProductID] [int] NULL,
> [PercentPassed] [int] NULL,
> CONSTRAINT [PK_TestAssignment] PRIMARY KEY CLUSTERED
> (
> [TestAssignmentID] ASC
> )
>
>
> --Scalar UDF in SQL
> CREATE FUNCTION [dbo].[fnIsEven]
> (
> @intValue int
> )
> RETURNS bit
> AS
> BEGIN
> Declare @bitReturnValue bit
>
> If @intValue % 2 = 0
> Set @bitReturnValue=1
> Else
> Set @bitReturnValue=0
> RETURN @bitReturnValue
> END
>
> --Scalar CLR UDF
> /*
> using System;
> using System.Data;
> using System.Data.SqlClient;
> using System.Data.SqlTypes;
> using Microsoft.SqlServer.Server;
>
> public partial class UserDefinedFunctions
> {
>
> [Microsoft.SqlServer.Server.SqlFunction(IsDeterministic=true,IsPrecise=true)]
> public static SqlBoolean IsEven(SqlInt32 value)
> {
> if(value % 2 == 0)
> {
> return true;
> }
> else
> {
> return false;
> }
> }
> };
> */
>
> --Test #1
> --Scenario A - Query with calculated column--
> SELECT TestAssignmentID,
> CASE WHEN TestAssignmentID % 2=0 THEN 1 ELSE 0 END AS
> CalcColumn
> FROM TestAssignment
>
> --Scenario B - Query with calculated column as criterion--
> SELECT TestAssignmentID,
> CASE WHEN TestAssignmentID % 2=0 THEN 1 ELSE 0 END AS
> CalcColumn
> FROM TestAssignment
> WHERE CASE WHEN TestAssignmentID % 2=0 THEN 1 ELSE 0 END=1
>
> --Scenario C - Query using scalar UDF--
> SELECT TestAssignmentID,
> dbo.fnIsEven(TestAssignmentID) AS CalcColumn
> FROM TestAssignment
>
> --Scenario D - Query using scalar UDF as crierion--
> SELECT TestAssignmentID,
> dbo.fnIsEven(TestAssignmentID) AS CalcColumn
> FROM TestAssignment
> WHERE dbo.fnIsEven(TestAssignmentID)=1
>
> --Scenario E - Query using CLR scalar UDF--
> SELECT TestAssignmentID,
> dbo.fnIsEven_CLR(TestAssignmentID) AS CalcColumn
> FROM TestAssignment
>
> --Scenario F - Query using CLR scalar UDF as crierion--
> SELECT TestAssignmentID,
> dbo.fnIsEven_CLR(TestAssignmentID) AS CalcColumn
> FROM TestAssignment
> WHERE dbo.fnIsEven(TestAssignmentID)=1
>
>
> --CLR Aggregate functions
> /*
> using System;
> using System.Data;
> using System.Data.SqlClient;
> using System.Data.SqlTypes;
> using Microsoft.SqlServer.Server;
>
>
> [Serializable]
> [Microsoft.SqlServer.Server.SqlUserDefinedAggregate(Format.Native)]
> public struct Avg
> {
> public void Init()
> {
> this.numValues = 0;
> this.totalValue = 0;
> }
>
> public void Accumulate(SqlDouble Value)
> {
> if (!Value.IsNull)
> {
> this.numValues++;
> this.totalValue += Value;
> }
>
> }
>
> public void Merge(Avg Group)
> {
> if (Group.numValues > 0)
> {
> this.numValues += Group.numValues;
> this.totalValue += Group.totalValue;
> }
>
> }
>
> public SqlDouble Terminate()
> {
> if (numValues == 0)
> {
> return SqlDouble.Null;
> }
> else
> {
> return (this.totalValue / this.numValues);
> }
> }
>
> // private accumulators
> private int numValues;
> private SqlDouble totalValue;
>
> }
>
> [Serializable]
> [Microsoft.SqlServer.Server.SqlUserDefinedAggregate(Format.Native)]
> public struct TrimmedAvg
> {
> public void Init()
> {
> this.numValues = 0;
> this.totalValue = 0;
> this.minValue = SqlDouble.MaxValue;
> this.maxValue = SqlDouble.MinValue;
> }
>
> public void Accumulate(SqlDouble Value)
> {
> if (!Value.IsNull)
> {
> this.numValues++;
> this.totalValue += Value;
> if (Value < this.minValue)
> this.minValue = Value;
> if (Value > this.maxValue)
> this.maxValue = Value;
> }
>
>
> }
>
> public void Merge(TrimmedAvg Group)
> {
> if (Group.numValues > 0)
> {
> this.numValues += Group.numValues;
> this.totalValue += Group.totalValue;
> if (Group.minValue < this.minValue)
> this.minValue = Group.minValue;
> if (Group.maxValue > this.maxValue)
> this.maxValue = Group.maxValue;
> }
>
> }
>
> public SqlDouble Terminate()
> {
> if (this.numValues < 3)
> return SqlDouble.Null;
> else
> {
> this.numValues -= 2;
> this.totalValue -= this.minValue;
> this.totalValue -= this.maxValue;
> return (this.totalValue / this.numValues);
> }
> }
>
> // private accumulators
> private int numValues;
> private SqlDouble totalValue;
> private SqlDouble minValue;
> private SqlDouble maxValue;
>
> }
> */
>
> --Test #2
>
> --Scenario G - Average Query using built-in aggregate--
> SELECT ProductID, Avg(Cast(PercentPassed AS float))
> FROM TestAssignment
> GROUP BY ProductID
> ORDER BY ProductID
>
> --Scenario H - Average Query using CLR aggregate--
> SELECT ProductID, dbo.Avg_CLR(Cast(PercentPassed AS float)) AS Average
> FROM TestAssignment
> GROUP BY ProductID
> ORDER BY ProductID
>
> --Scenario I - Trimmed Average Query using built in aggregates/set
> operations--
> SELECT A.ProductID,
> Case
> When B.CountValues<3 Then Null
> Else Cast(A.Total-B.MaxValue-B.MinValue AS
> float)/Cast(B.CountValues-2 As float)
> End AS Average
> FROM
> (SELECT ProductID, Sum(PercentPassed) AS Total
> FROM TestAssignment
> GROUP BY ProductID) A
>
> LEFT JOIN
>
> (SELECT ProductID,
> Max(PercentPassed) AS MaxValue,
> Min(PercentPassed) AS MinValue,
> Count(*) AS CountValues
> FROM TestAssignment
> WHERE PercentPassed Is Not Null
> GROUP BY ProductID) B
>
> ON A.ProductID=B.ProductID
> ORDER BY A.ProductID
>
> --Scenario J - Trimmed Average Query using CLR aggregate--
> SELECT ProductID, dbo.TrimmedAvg_CLR(Cast(PercentPassed AS real)) AS
> Average
> FROM TestAssignment
> GROUP BY ProductID
> ORDER BY ProductID
>
Navigation:
[Reply to this message]
|