Posted by E.T. Grey on 02/06/06 15:46
noone wrote:
> NC wrote:
>
>> E.T. Grey wrote:
>>
>>>>> I have a (LARGE) set of historical data that I want to keep
>>>>> on a central server, as several separate files.
>>>>
>>>>
>>>>
>>>> How large exactly?
>>>
>>>
>>> At last count, there are about 65,000 distinct files (and increasing)
>>
>>
>> ...
>>
>>> Each file has the equivalent of approx 1M rows (yes - that's 1 million)
>>
>>
>> ...
>>
>>> If you multiply the number of rows (on avg) by the number of files -
>>> you can quickly see why using a db as a repository would be a
>>> poor design choice.
>>
>>
>>
>> Sorry, I can't. 65 million records is a manageable database.
>
>
> I agree... I have designed and deployed binary and ASCII data loads in
> excess of 250 million records/day. Searching the data was a piece of
> cake - if you know how to actually design the database correctly.
>
> 65M records is peanuts to a database - even MySQL. With proper indexing
> you can do a direct-row lookup in < 4-8 I/Os - not so with the path
> you are currently trying to traverse... you are looking at up to 65M
> reads - and reads are very expensive!!
>
> Use the proper tools/mechanisms for the job at hand...
>
> Michael Austin
> DBA
> (stuff snipped)
>
Please do not patronise me. Like NC, you completely overlooked the
obvious fact that the number of records we are talking about (if a
database design is used) runs into billions - not millions. Furthermore,
the datasets are time series data and therefore order is of paramount
importance. Instead of trying to impose a design on me (without fully
understanding the problem), it would have been infinitely preferable if
you had simply answered the question I had asked in the first place. But
judging by the way you have overlooked basic facts - whilst being
hell-bent that a db solution is *definitely* the way forward - you have
instantly lost any credibility you may have had - and consequently, I
will ignore any "advice" you care to offer in the future.
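
For the record, the arithmetic behind "billions - not millions" can be
checked in a couple of lines. This is only a rough sketch using the
approximate figures quoted above (about 65,000 files, roughly 1M rows
each); the variable names are illustrative and the real counts vary per
file and keep growing.

```python
# Approximate figures quoted earlier in this thread (not exact counts).
files = 65_000             # "about 65,000 distinct files (and increasing)"
rows_per_file = 1_000_000  # "approx 1M rows" per file, on average

# Total rows if every file were loaded into a single database.
total_rows = files * rows_per_file
print(f"{total_rows:,} rows")
```

That comes to 65 billion rows, not the 65 million assumed in the replies
quoted above.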