|
Posted by Peter Nolan on 05/16/07 17:00
Hi Araron,
"What can it do that you can't do in PL/SQL? Well, some nice things
are
we can parallelise the processing of large numbers of fact records
and
we can put the dimension tables in memory mapped IO and access them
in
a shared fashion using binary search......this is 10x faster than
doing the same in PL/SQL at runtime.... "
The most cpu intensive task in building a dimensional model is the
tranlsation of real keys into integer keys...the attribution process.
Doing this work in PL/SQL or inside the database as ELT type tools do
consumes much more (10x or more) processing power than is required if
you do it more efficiently.....and since this is the most expensive
bit, it's worth speeding up in large accounts...
Yes, of course, you can run multiple pl/sql statements at the same
time to do this work if you would like.
However, one thing we have been doing for about 7 years how is putting
a unique key on the front of fact table records. And when processing a
single set of transactions that have been split into different files
to enable parallelism you have to somehow allow for the allocation of
keys for these unqiue ids on the front of the fact records...this
means a semaphor at the file level, locks etc, so the pl/sql to handle
that becomes more difficult...especially when you consider you cannot
just allocate one key at a time and do the update because it slows
things down too much. You must be able to allocate blocks of
keys.......the ETL tools have these features in place because they are
so necessary.
Many people say to me, well, just let the database allocate a sequence
number....but in most databases these are slow....and they also give
you trouble if you want to delete and reload data because it is often
not so easy to reset the sequence number so that the delete/reloaded
records get the same sequence number they used to have......
We also do things like...when we need to update records it is often
faster to delete them and then to use the loader to load the record
rather than do an update. In these cases the sequence numbers must be
respected so using the database to allocate them does not work if you
want to do a delete/load rather than an update.
And yes, PL/SQL can do a good part of all this if you want to write
it...but it is far from trivial......and it will run 10x or more
slower than doing it properly....and that is well worth the effort in
most sizable accounts.....plenty of DW projects fail because the ETL
subsystem is too slow, too expensive, and too difficult to change.
Best Regards
Peter
On May 4, 10:13 pm, "Aaron Kempf" <ake...@dol.wa.gov> wrote:
> what you can't run multiple PL/SQL statements at the same time in Oracle?
>
> ROFL
>
> "Peter Nolan" <p...@peternolan.com> wrote in message
>
> news:1177319493.688345.26010@o5g2000hsb.googlegroups.com...
>
>
>
> > Hi DBA...
> > your append is exactly what I have been talking about since the mid
> > 90s as well....making ETL easier......
>
> > We have invented the future and the future of ETL is 'generated ETL
> > from the data mapping workbook'. (www.instantbi.com)
>
> > You have to do your data mapping somehow, and excel is how most people
> > do it, the laggards are still using word......
>
> > Since you already have to do your data mapping, and if you are
> > sensible you do it in excel, it makes the most sense to generate the
> > etl subsystem directly from the workbook as well as publish the
> > workbook via the web so that authorised people can see any and all
> > details of the ETL subsystem.
>
> > No ETL subsystem will ever be any easier to develop and deploy than
> > what we have invented because no ETL subsystem will ever be easier to
> > build than a direct generate from the mapping workbook.....this is the
> > 'end game' for development of ETL subsystems.
>
> > Why use such a tool rather than PL/SQL.....well, because it is
> > generated directly from the workbook we have 'done away with' the ETL
> > programmer.....and that is a good thing. I have done far too much ETL
> > programming over the years and I want to get rid of that complete
> > waste of time....
>
> > What can it do that you can't do in PL/SQL? Well, some nice things are
> > we can parallelise the processing of large numbers of fact records and
> > we can put the dimension tables in memory mapped IO and access them in
> > a shared fashion using binary search......this is 10x faster than
> > doing the same in PL/SQL at runtime....
>
> > Also, we have intelligence built into it that means you can do things
> > like add new summaries without any code changes, you can add new keys
> > to fact tables without any code changes, you can make lookups into
> > dimension tables to get new keys FAR more complex than possible than
> > via normal sql statements.
>
> > In short, we have eliminated all the 'coding' effort that is required
> > when writing you ETL subsystem no matter what the tool.....and we have
> > done it in such a way that it is as scalable as the operating system
> > underneath....
>
> > Another BIG feature is that the ETL subsystem is portable across
> > databases and operating systems....something that PL/SQL and SSIS are
> > obvioulsy not.....this means that if some better/faster database comes
> > out we can move to it......not something that MSFT would like to her
> > and this is their newsgroup.....but it has always been a belief of
> > mine that the ETL subsystem should be fully portable across OS and
> > database.....and surprisingly, this is NOT the case with any of the
> > ETL tools that I have seen....they all require quite some effort to
> > move them.....thereby creating a cost to adopt a faster/cheaper/better
> > database.
>
> > If you are keenly interested, feel free to check my personal site
> >www.peternolan.comwhere I have published vast amounts of code and
> > documentation on ETL subsystems.
>
> > Best Regards
>
> > Peter
> >www.peternolan.com- Hide quoted text -
>
> - Show quoted text -
Navigation:
[Reply to this message]
|