Posted by Jonathan on 08/11/06 07:55
smartestdesign@gmail.com wrote:
> I am developing a program to crawl a site (it looks like craigslist).
> Since it has more than 20,000 entries, I have to go to each category
> page, parse it with regular expressions, and extract the data into a
> database. The data will be updated every two days.
>
> The problem I am analyzing now is that I have a number of client sites
> running on the same machine, and if my program takes up too much CPU
> (more than 80%), the web server might hang and stop accepting outside
> connections until I reboot the server.
>
> I came up with an idea to reduce the processing overhead:
> 1. Go to the site and download all pages without parsing.
> 2. Once all pages have been downloaded locally, start parsing.
> 3. Save all the data in a database.
>
> If anyone has a better idea, let me know.
>
> SK
If you can get access to the database, you are better off replicating it
directly... but I guess that is not an option...
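
If replication is off the table, your download-first plan should work.
Here is a minimal sketch in Python, just to illustrate the idea: the
URLs, the regex, and the table layout are placeholders you would swap
for the real site's, and it is the sleep between requests that keeps
CPU and bandwidth usage low so your web server stays responsive.

import re
import sqlite3
import time
import urllib.request

CATEGORY_URLS = [
    "http://example.com/category/1",  # placeholder category pages
    "http://example.com/category/2",
]

# Step 1: download everything first, pausing between requests so
# the crawler never monopolizes the machine.
pages = []
for url in CATEGORY_URLS:
    with urllib.request.urlopen(url) as resp:
        pages.append(resp.read().decode("utf-8", errors="replace"))
    time.sleep(2)  # throttle: trades speed for low resource usage

# Step 2: parse the downloaded pages locally.
# This pattern is a stand-in; replace it with one matching the
# real markup of the site you are crawling.
ENTRY_RE = re.compile(r'<a href="(?P<link>[^"]+)">(?P<title>[^<]+)</a>')
entries = []
for html in pages:
    for m in ENTRY_RE.finditer(html):
        entries.append((m.group("title"), m.group("link")))

# Step 3: save all extracted data to a database.
conn = sqlite3.connect("crawl.db")
conn.execute("CREATE TABLE IF NOT EXISTS entries (title TEXT, link TEXT)")
conn.executemany("INSERT INTO entries VALUES (?, ?)", entries)
conn.commit()
conn.close()

Since this only runs every two days, you could also nice the process
(or lengthen the sleep) rather than splitting the work further.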
Jonathan