Posted by Rory Browne on 08/25/05 21:05
At the risk of making a complete and utter ass of myself, I'm going to
disagree with Richard.
I'm going to justify this by the fact that the file_get_contents() function
is written in C and performs the function you need, which is currently
performed by wget.
On 8/25/05, Michelle Konzack <linux4michelle@freenet.de> wrote:
> Hello,
>
> Currently I do it with wget and by hand using a bash script,
> but I would like to integrate it into my php4 web interface.
>
> What I need is:
>
> 1) INPUT-Form where I can type the URL of
> a html/php (or something like this) page.
I assume you know the html to create a web form, and how to use the
$_GET and $_POST variables. If not, go learn php, and then read the
rest of my reply.
>
> when submitted,
>
> 2) the php script downloads the page and creates an md5sum
Assuming that allow_url_fopen is enabled, you can do:
$content = file_get_contents($url);
$md5hash = md5($content);
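
A slightly more defensive version of those two lines might look like the sketch below. The function name fetch_and_hash() is made up for illustration; the only real calls are file_get_contents() and md5(). It returns null instead of hashing garbage when the download fails.

```php
<?php
// Hypothetical helper (name is an assumption): fetch a page and
// return its body together with the md5 hash of that body.
function fetch_and_hash($url)
{
    // file_get_contents() returns false on failure, e.g. an unreachable
    // host, or a remote URL while allow_url_fopen is disabled.
    $content = @file_get_contents($url);
    if ($content === false) {
        return null;
    }
    return array('content' => $content, 'md5' => md5($content));
}
```

You would call it as $page = fetch_and_hash($url); and check for null before touching the database.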
> 3) look in a database to check whether the whole URL
> is already there, and if
> YES check the md5sum
What DB are you using?
> 3.1) if equal drop the URL and stop here
> 3.2) if different calculate original md5sum
> and insert it into database
> NO calculate original md5sum and insert it into database
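
Whatever DB it turns out to be, the decision in step 3 can be kept separate from the query itself. A minimal sketch, assuming you have already looked up the stored hash for the URL (null if the URL is not in the table). The function name and the 'skip' / 'update' / 'insert' labels are my own invention, and I am treating "different md5sum" as an update of the existing row, which is an assumption:

```php
<?php
// Hypothetical helper: decide what to do with a URL given the hash
// stored in the database (null if the URL is unknown) and the hash
// of the page just downloaded.
function decide_action($storedMd5, $newMd5)
{
    if ($storedMd5 === null) {
        return 'insert';   // URL not in the database yet
    }
    if ($storedMd5 === $newMd5) {
        return 'skip';     // 3.1: page unchanged, drop the URL and stop
    }
    return 'update';       // 3.2: page changed, store the new md5sum
}
```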
>
> up to here it is working fine.
>
> 4) now get all FULL URIs from the page requisites
>
> *PAFF*
>
> How can this be done ?
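
One possible approach, sketched below, is to parse the page and collect the src/href attributes of the tags that pull in requisites. Note the assumptions: this uses DOMDocument, which needs PHP 5 with the DOM extension and so is not available on php4; the function names resolve_uri() and extract_requisites() are made up; and the resolver is deliberately simplified (no "../" normalisation, no <base href> handling, and the base URL must include scheme and host).

```php
<?php
// Hypothetical helper: turn a possibly relative reference into a full URI.
// Simplified on purpose: "../" segments are not normalised.
function resolve_uri($base, $ref)
{
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $ref)) {
        return $ref;                          // already a full URI
    }
    $p = parse_url($base);
    $origin = $p['scheme'] . '://' . $p['host']
            . (isset($p['port']) ? ':' . $p['port'] : '');
    if (substr($ref, 0, 2) === '//') {
        return $p['scheme'] . ':' . $ref;     // scheme-relative
    }
    if (substr($ref, 0, 1) === '/') {
        return $origin . $ref;                // relative to the site root
    }
    $path = isset($p['path']) ? $p['path'] : '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);  // directory of the page
    return $origin . $dir . $ref;
}

// Hypothetical helper: list the full URIs of images, scripts and
// <link> targets (stylesheets, icons, ...) referenced by a page.
function extract_requisites($html, $baseUrl)
{
    $doc = new DOMDocument();
    @$doc->loadHTML($html);                   // silence warnings from sloppy markup
    $wanted = array('img' => 'src', 'script' => 'src', 'link' => 'href');
    $uris = array();
    foreach ($wanted as $tag => $attr) {
        foreach ($doc->getElementsByTagName($tag) as $node) {
            $ref = trim($node->getAttribute($attr));
            if ($ref !== '') {
                $uris[resolve_uri($baseUrl, $ref)] = true;  // keys de-duplicate
            }
        }
    }
    return array_keys($uris);
}
```

Requisites referenced from inside CSS (background images and the like) would not be found this way; that would need a separate pass over the stylesheets.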
>
> Please note that the files should be renamed to md5-hashes and
> reinserted into the original page. Then all files are saved into ONE
> directory with names as md5-hashes.
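
A rough sketch of that renaming step, under a few assumptions: file_put_contents() needs PHP 5 (on php4 you would use fopen/fwrite instead), both function names are made up, and the rewrite is a plain string replacement keyed on each reference exactly as it appears in the page, so it could also hit the same string outside of a tag attribute.

```php
<?php
// Hypothetical helper: save content under its md5 hash inside $dir
// and return the file name (null if the write fails).
function store_by_hash($content, $dir)
{
    $name = md5($content);
    $path = rtrim($dir, '/') . '/' . $name;
    if (!file_exists($path)) {               // identical content is stored only once
        if (file_put_contents($path, $content) === false) {
            return null;
        }
    }
    return $name;
}

// Hypothetical helper: replace each original reference in the page
// with the md5 name it was stored under. $map is
// array(original reference => md5 name).
function rewrite_references($html, $map)
{
    return strtr($html, $map);               // longest keys are matched first
}
```

The rewritten page itself can then be passed through store_by_hash() as well, so that it ends up in the same directory as its requisites.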
>
> Note: I am talking about (currently) 127.000.000 files.
> It is currently on a Raid-5 with 7 x 147 GByte, but because of
> a major hardware upgrade to 15 x 300 GByte the number
> of files will increase.
>
> Currently I do not know whether I should use ONE Raid with
> 15 HDDs, TWO with 7 HDDs, three with 5 HDDs or 5 with 3 HDDs.
>
> Maybe I will run into performance problems with the inodes,
> which I already have... (I think)
>
> Greetings
> Michelle
>