|
Posted by Erland Sommarskog on 04/28/07 09:04
B D Jensen (bjorn.d.jensen@gmail.com) writes:
> I have an big table with 50Gb data, written som functions
> that cleanup data and want to do something like this
>
> insert into newtable
> select id, func1(col1), func2(col2)
> from oldtable;
>
> I'm also plan to make newtable partioned (before insert).
>
> But how could i get the insert as fast as possible?
Exactly what is in those functions? Do they perform data access? Are
they written in T-SQL or in the CLR? I ask, because they could have a
great impact on performance.
Apart from that, there are a couple of possible strategies for this
situation. One is SELECT INTO, but since you plan to make the new
table partitioned, I don't think SELECT INTO is good for this. (SELECT
INTO creates a new table.)
Another is to use BCP to first unload the table to a file. You would
then use queryout or a view with your functions, so what get on file
is the cleaned-up version. Then you use BCP to load the data into the
new table. The key here is that there should be no indexes on the table
and it should be empty. In this case the bulk-load is minimally logged.
Of course, you also need to account for the time it takes to create
the indexes.
And the final option is to use a plain INSERT. But a single INSERT
statement will not be good for your transaction log. It's better to
batch and insert, say, 100000 rows at a time, preferrably with the database
set to simple recovery. You should batch on the clustered index of
the old table:
SELECT @start = 1
WHILE EXISTS (SELECT * FROM oldtable WHERE clustercol >= @start)
BEGIN
INSERT newtable (...)
SELECT ...
FROM oldtable
WHERE clustercol >= @start AND clustercol < @start - 100000
SELECT @start = @start + 100000
END
Here the actual increment would depend on the nature of your clustered
key. If it's a date, maybe taking one month at a time is a good idea.
If new the table will have the same clustered index as the old table,
have the clustered index in place when run the above, but wait with
adding non-clustered indexes until you have the data in place.
--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
[Back to original message]
|