Posted by E.T. Grey on 02/04/06 06:29
NC wrote:
> E.T. Grey wrote:
>
>>I have a (LARGE) set of historical data that I want to keep
>>on a central server, as several separate files.
>
>
> How large exactly?
At last count, there are about 65,000 distinct files (and increasing).
>
>
>>I want a client process to be able to request the data in a
>>specific file by specifying the file name, start date/time and
>>end date/time.
>
>
> The start/end date/time bit actually is a rather fat hint that you
> should consider using a database... Searching through large files will
> eat up enormous amounts of disk and processor time.
>
Not necessarily true. Each file holds the equivalent of approx 1M rows
(yes - that's 1 million), yet the binary files (which use compression
algorithms) are only approx 10-15 KB each. Multiply the average row
count by the number of files and you get on the order of 65 billion
rows - which is why using a db as the repository would be a poor
design choice.
>
>>New data will be appended to these files each day, by a
>>(PHP) script.
>
>
> Yet another reason to consider a database...
>
>
See above
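For what it's worth, the daily append itself does not need a database
either. Below is a minimal sketch of the appending step, assuming a
fixed-width packed record of two big-endian 32-bit integers - that
layout is just a placeholder, not my real format.

<?php
// Placeholder record layout: 32-bit timestamp + value scaled to an
// integer, both big-endian so the files stay portable across hosts.
function append_record($path, $timestamp, $value)
{
    $record = pack('NN', $timestamp, (int) round($value * 100));

    $fh = fopen($path, 'ab');     // append, binary-safe
    if ($fh === false) {
        return false;
    }
    flock($fh, LOCK_EX);          // avoid interleaved writes from
    fwrite($fh, $record);         // concurrent runs of the script
    flock($fh, LOCK_UN);
    fclose($fh);
    return true;
}
?>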
>>What is the best (i.e. the most efficient and fastest) way to transfer
>>data from the server to clients?
>
>
> Assuming you are using HTTP, compressed (gzip) CSV will probably be the
> fastest.
>
>
This involves converting the read data to a string first, before
(possibly) zipping it and sending it. This incurs overhead (that I
would like to avoid) on both server and client.
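That said, if I were to go the CSV route NC suggests, the server side
would look roughly like this (extract_rows() is a placeholder for the
actual file read and date filtering; input validation is omitted here
for brevity):

<?php
// extract_rows() is hypothetical: it would read the requested file
// and return array(timestamp, value) rows for the given date range.
$rows = extract_rows($_GET['file'], $_GET['start'], $_GET['end']);

// Build the CSV in memory - this is the string conversion step.
$lines = array();
foreach ($rows as $row) {
    $lines[] = implode(',', $row);
}
$csv = implode("\n", $lines) . "\n";

// Compress and send; a client that sent Accept-Encoding: gzip
// will decompress this transparently.
header('Content-Type: text/csv');
header('Content-Encoding: gzip');
echo gzencode($csv, 9);
?>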
>>How can I ensure that the (binary?) data sent from the Unix server
>>can be correctly interpreted at the client side?
>
>
> Why should the data be binary? Compressed CSV is likely to be at least
> as compact as binary data, plus CSV will be human-readable, which
> should help during debugging.
>
>
See above
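If I do end up keeping the binary format, the interpretation question
mostly comes down to byte order and field widths. A rough sketch of the
client-side decode, assuming the same placeholder layout as above (two
big-endian 32-bit integers per record - again, not my real format):

<?php
// URL and query parameters are purely illustrative.
$url  = 'http://example.com/getdata.php?file=foo&start=20060101&end=20060201';
$data = file_get_contents($url);

// Decode fixed-width 8-byte records; 'N' reads an unsigned 32-bit
// integer in big-endian order regardless of the client's platform.
$records = array();
for ($off = 0; $off + 8 <= strlen($data); $off += 8) {
    $f = unpack('Nts/Nval', substr($data, $off, 8));
    $records[] = array($f['ts'], $f['val'] / 100);
}
?>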
>>How can I prevent clients from directly accessing the files
>>(to prevent malicious or accidental corruption of the data files)?
>
>
> Import them into a database and lock the originals in a safe place.
>
> Cheers,
> NC
>
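On the last point, one alternative to moving everything into a database
is to keep the data files outside the web root and let a single PHP
script act as the gatekeeper, so clients can only ever read through it.
A rough sketch (the directory and the file-naming rule are my own
assumptions):

<?php
// Data lives outside the document root, so it can never be fetched
// directly over HTTP; this path is an assumption.
$datadir = '/var/data/history';

// Only allow plain file names - no directory traversal.
$name = isset($_GET['file']) ? basename($_GET['file']) : '';
if (!preg_match('/^[A-Za-z0-9_\-]+\.dat$/', $name)) {
    header('HTTP/1.0 400 Bad Request');
    exit;
}

$path = $datadir . '/' . $name;
if (!is_file($path)) {
    header('HTTP/1.0 404 Not Found');
    exit;
}

// Read-only hand-off; the extraction/compression step sketched
// earlier would slot in here instead of a raw readfile().
header('Content-Type: application/octet-stream');
readfile($path);
?>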