|
Posted by Merennulli on 02/09/07 15:36
On Feb 8, 7:27 pm, "Steve" <morrisz...@hotmail.com> wrote:
> On Feb 8, 4:05 pm, "Merennulli" <mar...@sdf.lonestar.org> wrote:
>
>
>
>
>
> > To start with, I'll give a simplified overview of my data.
>
> > BaseRecord (4mil rows, 25k in each Region)
> > ID | Name | Region | etc
>
> > OtherData (7.5mil rows, 1 or 2 per ID)
> > ID | Type(1/2) | Data
>
> > ProblemTable (4mil rows)
> > ID | ConcatenatedHistory
>
> > The concatenated history field is a nvarchar with up to 20 different
> > pipe delimited date/code combinations, eg. '01/01/2007X|11/28/2006Q|
> > 11/12/2004Q|'
>
> > Using left outer joins (all from base, the rest optional) I've got a
> > view something like:
>
> > View (4mil rows)
> > ID | Name | Region | etc | Data | Data2 | ConcatenatedHistory
>
> > Querying it, it takes about 15-20 seconds to do this:
> > Select ID, Name, Region, etc, Data, Data2, ConcatenatedHistory>From View
>
> > Where Region = 58
> > and ConcatenatedHistory like '%11/28/2006%' or ConcatenatedHistory
> > like '%2007X%' ;
>
> > Or to do this:
> > Select ID, Name, Region, etc, Data, Data2, ConcatenatedHistory>From View
>
> > Where Region = 58;
>
> > But this takes over a minute:
> > Select ID, Name, Region, etc, Data, Data2, ConcatenatedHistory>From View
>
> > Where Region = 58
> > and ConcatenatedHistory like '%test%' ;
>
> > What puzzles me most is that it's taking longer to return nothing.
>
> > I've tried normalizing this concatenated field into it's own table, or
> > into 20 and 40 denormalized fields. The denormalized fields were
> > nightmarishly long queries for a web interface at 5-6 minutes.
>
> > The normalized table should have roughly 25mil records, but cutting it
> > down to just the most relevant years let me play with it at 9.5mil
> > records. This shifted the results to where it took 35-40 seconds to do
> > ANY query against that table.
>
> > Select View.ID, View.Name, View.Region, View.etc, View.Data,
> > View.Data2, History.Date, History.Code>From View inner join History on History.ID = View.ID
>
> > Where View.Region = 58
> > and History.Date = '11/28/2006' or History.Code = '2007X';
>
> > I also tried reducing this table down to a linking table between ID
> > and Code, and pushing the date off to another table, but that only
> > made things worse.
>
> > ~~~
> > Going back to what worked best (the intial View), the Execution Plan
> > shows 93% on a Clustered Index Scan on the ID field's index in the
> > ConcatenatedHistory table for the problem query, but spreads out the
> > load fairly evenly among indexes on the successful query. I'm trying
> > to figure out a way to improve performance, and more importantly, make
> > "0 records found" responses be a bit more forthcoming.
>
> > If it's relevant, I'm on SQL Server 2005 Standard, and I've already
> > taken care of the memory, CPU and drive optimization.
>
> do you query the concatenated data, "normalized", in a way that you
> could use FULL TEXT ?- Hide quoted text -
>
> - Show quoted text -
Steve,
The "normalized" data broke the code and date into a datetime field
and a 5 character char (eg. '2006X'). With that change, I was doing
exact matching instead of "like" comparisons. As far as my knowledge
extends, that should have more than compensated for the table size
difference (4mil vs 25mil and later vs 9mil). My first assumption was
that the indexes were wrong, but a non-clustered index on the ID and
Code fields should have been correct for this. I can try a clustered
on the pair, but I don't see where that would improve performance.
[Back to original message]
|