Reply to Challenge: Can you optimize this? (summary) — MsSQL Server

Posted by octangle on 06/07/06 17:54

Below is an aggregate script that includes everyone's suggested queries
so far...

Based upon feedback I have beefed up the test records to more
accurately reflect all of the potential scenarios that need to be
handled by this query.

The original query (Query attempt #1 (Octangle)) generates the desired
result set and therefore is the benchmark of correctness for my
purposes.

MWMWMWMWMWMWMWMWMWMWMWMWMWMWMWMWMW

/* Set up */

CREATE TABLE t1(RegJrnID INTEGER, InvoiceDate VARCHAR(8), InvoiceNumber
VARCHAR(20), TaxableAmount DECIMAL(32,8))

INSERT INTO t1 VALUES (0, '20060120', '0000033', 0.00)
INSERT INTO t1 VALUES (1, '20060101', '2321323', 100.00)
INSERT INTO t1 VALUES (9, '20060213', '2130009', 40.01)
INSERT INTO t1 VALUES (11, '20060324', '3321110', -1200.16)
INSERT INTO t1 VALUES (3, '20060101', '9402293', 512.44)
INSERT INTO t1 VALUES (1, '20060104', '2321323', -100.00)
INSERT INTO t1 VALUES (13, '20051127', '1034501', -77.50)
INSERT INTO t1 VALUES (4, '20060105', '9302221', 612.12)
INSERT INTO t1 VALUES (5, '20060105', '0003235', 18.11)
INSERT INTO t1 VALUES (10, '20060421', '0000033', 0.00)
INSERT INTO t1 VALUES (6, '20060111', '5953432', 2101.21)
INSERT INTO t1 VALUES (3, '20060111', '9402293', -512.44)
INSERT INTO t1 VALUES (12, '20060606', '0000001', 4431.55)
INSERT INTO t1 VALUES (7, '20060115', '4234444', 44.52)
INSERT INTO t1 VALUES (8, '20060115', '0342222', 95.21)
INSERT INTO t1 VALUES (6, '20060119', '5953432', -2101.21)
INSERT INTO t1 VALUES (2, '20060101', '5440033', 231.01)
INSERT INTO t1 VALUES (10, '20060517', '0000033', 0.00)
INSERT INTO t1 VALUES (11, '20060324', '3321110', 1200.16)
INSERT INTO t1 VALUES (12, '20060606', '0000001', -4431.55)

/* Show what's in the table */

SELECT * FROM t1 ORDER BY RegJrnID, InvoiceDate

/* Query for records to reverse */

/* Query attempt #1 (Octangle) */

/* Pros: correct */
/* Cons: slow */

SELECT *
FROM t1 a

/* Ignore records that have already been reversed */

WHERE a.RegJrnID != ALL

/* This subselect finds reversed records (i.e. those that have a
duplicate RegJrnID) */

(
SELECT b.RegJrnID
FROM t1 b
GROUP BY b.RegJrnID
HAVING COUNT(*) > 1
)

/* User selection criteria are appended here */

/* AND InvoiceNumber >= '5000000' AND InvoiceNumber <= '7500000' */

/* ORDER BY RegJrnID */ /* Make the results look pretty (optional) */

/* Query attempt #2 (Alexander) */

/* Pros: faster */
/* Cons: misses 0 TaxableAmounts */

SELECT pos.*
FROM
(
SELECT * FROM t1 WHERE TaxableAmount > 0
) pos
LEFT OUTER JOIN
(
SELECT * FROM t1 WHERE TaxableAmount < 0
) neg
ON pos.RegJrnID = neg.RegJrnID
WHERE neg.RegJrnID IS NULL
/* ORDER BY pos.RegJrnID */ /* Make the results look pretty (optional) */

/* Query attempt #3 (Alexander - tweaked by Octangle) */

/* Pros: faster */
/* Cons: finds too many 0 TaxableAmounts */

SELECT pos.*
FROM
(
SELECT * FROM t1 WHERE TaxableAmount >= 0
) pos
LEFT OUTER JOIN
(
SELECT * FROM t1 WHERE TaxableAmount < 0
) neg
ON pos.RegJrnID = neg.RegJrnID
WHERE neg.RegJrnID IS NULL
/* ORDER BY pos.RegJrnID */ /* Make the results look pretty (optional) */

/* Query attempt #4 (Hugo) */

/* Pros: correct, fastest, returns results in RegJrnID order with
ORDER BY clause */

SELECT RegJrnID, MAX(InvoiceDate) as "InvoiceDate",
MAX(InvoiceNumber) as "InvoiceNumber", MAX(TaxableAmount) as
"TaxableAmount"
FROM t1
GROUP BY RegJrnID
HAVING COUNT(*) = 1

/* Query attempt #5 (Hugo) */

/* Pros: fast */
/* Cons: not correct */

SELECT RegJrnID, InvoiceDate, InvoiceNumber, TaxableAmount
FROM t1 AS a
WHERE NOT EXISTS
(
SELECT *
FROM t1 AS b
WHERE a.RegJrnID = b.RegJrnID
AND a.InvoiceDate <> b.InvoiceDate
)
/* ORDER BY RegJrnID */ /* Make the results look pretty (optional) */

/* Housekeeping */

DROP TABLE t1

MWMWMWMWMWMWMWMWMWMWMWMWMWMWMWMWMW
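
Since Query #1 is the benchmark of correctness, one way to check a
candidate against it is to diff the two result sets in both directions
with EXCEPT. The sketch below is only an illustration and was not posted
in the thread. It assumes SQL Server 2005 or later (EXCEPT is not
available in SQL Server 2000), and it uses a hypothetical scratch table
#t1_check with a few representative rows so that it runs on its own. To
check against the real data, substitute t1.

```sql
/* Hypothetical scratch table with the same shape as t1 */
CREATE TABLE #t1_check(RegJrnID INTEGER, InvoiceDate VARCHAR(8),
    InvoiceNumber VARCHAR(20), TaxableAmount DECIMAL(32,8))

INSERT INTO #t1_check VALUES (0, '20060120', '0000033', 0.00)      /* zero, not reversed */
INSERT INTO #t1_check VALUES (1, '20060101', '2321323', 100.00)    /* reversed on a later date */
INSERT INTO #t1_check VALUES (1, '20060104', '2321323', -100.00)
INSERT INTO #t1_check VALUES (10, '20060421', '0000033', 0.00)     /* zero, reversed */
INSERT INTO #t1_check VALUES (10, '20060517', '0000033', 0.00)
INSERT INTO #t1_check VALUES (11, '20060324', '3321110', -1200.16) /* reversed on the same date */
INSERT INTO #t1_check VALUES (11, '20060324', '3321110', 1200.16)
INSERT INTO #t1_check VALUES (13, '20051127', '1034501', -77.50)   /* negative, not reversed */

/* Rows the benchmark (Query #1) returns that the candidate (Query #4) misses */
SELECT a.RegJrnID, a.InvoiceDate, a.InvoiceNumber, a.TaxableAmount
FROM #t1_check a
WHERE a.RegJrnID != ALL
    (SELECT b.RegJrnID FROM #t1_check b GROUP BY b.RegJrnID HAVING COUNT(*) > 1)
EXCEPT
SELECT RegJrnID, MAX(InvoiceDate), MAX(InvoiceNumber), MAX(TaxableAmount)
FROM #t1_check
GROUP BY RegJrnID
HAVING COUNT(*) = 1

/* Rows the candidate (Query #4) returns that the benchmark (Query #1) does not */
SELECT RegJrnID, MAX(InvoiceDate), MAX(InvoiceNumber), MAX(TaxableAmount)
FROM #t1_check
GROUP BY RegJrnID
HAVING COUNT(*) = 1
EXCEPT
SELECT a.RegJrnID, a.InvoiceDate, a.InvoiceNumber, a.TaxableAmount
FROM #t1_check a
WHERE a.RegJrnID != ALL
    (SELECT b.RegJrnID FROM #t1_check b GROUP BY b.RegJrnID HAVING COUNT(*) > 1)

DROP TABLE #t1_check
```

If both statements come back empty, the candidate and the benchmark
agree on that data. Note that EXCEPT compares distinct rows, so it would
not notice a candidate that returns the same row twice.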

Queries as percent of batch (when just executing the queries in the
above script)

Query #1: 22.66% - Correct
Query #2: 19.77%
Query #3: 20.27%
Query #4: 12.63% - Correct
Query #5: 20.67%
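
Query #5 is not marked correct because it only treats a record as
reversed when the other row with the same RegJrnID has a different
InvoiceDate. A reversal posted on the same date (RegJrnID 11 and 12 in
the test data) slips through, and both rows are returned. A minimal
sketch that shows this, using a hypothetical scratch table #t1_q5 so it
runs on its own:

```sql
/* Hypothetical scratch table with the same shape as t1 */
CREATE TABLE #t1_q5(RegJrnID INTEGER, InvoiceDate VARCHAR(8),
    InvoiceNumber VARCHAR(20), TaxableAmount DECIMAL(32,8))

INSERT INTO #t1_q5 VALUES (1, '20060101', '2321323', 100.00)    /* reversed on a later date */
INSERT INTO #t1_q5 VALUES (1, '20060104', '2321323', -100.00)
INSERT INTO #t1_q5 VALUES (11, '20060324', '3321110', -1200.16) /* reversed on the same date */
INSERT INTO #t1_q5 VALUES (11, '20060324', '3321110', 1200.16)
INSERT INTO #t1_q5 VALUES (2, '20060101', '5440033', 231.01)    /* never reversed */

/* Query #5 as posted, pointed at the scratch table */
SELECT RegJrnID, InvoiceDate, InvoiceNumber, TaxableAmount
FROM #t1_q5 AS a
WHERE NOT EXISTS
(
SELECT *
FROM #t1_q5 AS b
WHERE a.RegJrnID = b.RegJrnID
AND a.InvoiceDate <> b.InvoiceDate
)
ORDER BY RegJrnID

DROP TABLE #t1_q5
```

Only RegJrnID 2 should come back, but the query also returns both rows
for RegJrnID 11. RegJrnID 1 is excluded as intended because its two rows
have different dates.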

Queries as percent when compared to only the original query (Query #1)

Query #1: 50.00% - Correct
Query #2: 42.58%
Query #3: 43.19%
Query #4: 32.14% - Correct
Query #5: 43.67%

At this point it looks like the clear winner is Query #4 by Hugo!
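
Query #1 has a slot where the user selection criteria are appended, but
Query #4 has no WHERE clause to append them to. One possible way to
carry the criteria over, offered as a sketch rather than as what Hugo
proposed, is to test them in the HAVING clause against the aggregated
columns. Every group that survives COUNT(*) = 1 has exactly one row, so
MAX(InvoiceNumber) is simply that row's InvoiceNumber. Putting the
criteria in a WHERE clause instead would filter rows before they are
counted, and a criterion that matches only one row of a reversed pair
(a date range, for example) could make a reversed record look
unreversed. The scratch table #t1_demo below is hypothetical and only
there so the example runs on its own.

```sql
/* Hypothetical scratch table with the same shape as t1 */
CREATE TABLE #t1_demo(RegJrnID INTEGER, InvoiceDate VARCHAR(8),
    InvoiceNumber VARCHAR(20), TaxableAmount DECIMAL(32,8))

INSERT INTO #t1_demo VALUES (5, '20060105', '0003235', 18.11)    /* below the range */
INSERT INTO #t1_demo VALUES (6, '20060111', '5953432', 2101.21)  /* in range, but reversed */
INSERT INTO #t1_demo VALUES (6, '20060119', '5953432', -2101.21)
INSERT INTO #t1_demo VALUES (2, '20060101', '5440033', 231.01)   /* in range, not reversed */
INSERT INTO #t1_demo VALUES (4, '20060105', '9302221', 612.12)   /* above the range */

/* Query #4 with the sample user criteria moved into HAVING */
SELECT RegJrnID, MAX(InvoiceDate) AS "InvoiceDate",
    MAX(InvoiceNumber) AS "InvoiceNumber",
    MAX(TaxableAmount) AS "TaxableAmount"
FROM #t1_demo
GROUP BY RegJrnID
HAVING COUNT(*) = 1
    AND MAX(InvoiceNumber) >= '5000000'
    AND MAX(InvoiceNumber) <= '7500000'

DROP TABLE #t1_demo
```

On these rows the query returns only RegJrnID 2. I have not measured
whether the extra HAVING conditions change the timings above.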

MWMWMWMWMWMWMWMWMWMWMWMWMWMWMWMWMW

To address some of the observations/comments:

1. Negative transactions are possible - I augmented the test data to
include this case.
2. This is for a commercial product that has numerous existing
customers, and I inherited the data model that this table is based upon.
My coding constraints are:
- I cannot add any columns (because of how we version, a column change
would force this release to be considered a major release and not a
minor release as desired)
- I should not add any indexes/primary keys/uniqueness constraints for
performance reasons (see below)

The purpose of this table is to store processed transaction results. It
needs to be as efficient as possible for insertions, so as to not slow
down the transaction processing engine. Reporting (and reversing groups
of transactions) is a secondary concern, and it is acceptable for these
functions to be slower.

I sincerely want to thank everyone who chipped in a comment or
suggestion on this...
