Posted by Richard Levasseur on 08/11/06 09:54
smartestdesign@gmail.com wrote:
> I am developing a program to crawl a site (it looks like craigslist).
> Since it has more than 20,000 entries, I have to go to each category
> page, parse it with regular expressions, and extract the data to a
> database. This data will be updated every two days.
>
> The problem I am analyzing now is that I have a number of client
> sites running on the same machine, and if my program occupies too
> much CPU (more than 80%), the web server might hang and won't accept
> any connection from outside until I reboot the server.
>
> I came up with an idea to reduce the processing overhead:
> 1. Go to the site and download all pages without parsing.
> 2. Once all pages have been downloaded locally, start parsing.
> 3. Save all the data in a database.
>
> If anyone has a better idea, let me know.
>
> SK
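
The download-then-parse split in your plan might look something like
the sketch below (Python, standard library only; the category URLs and
the regex are placeholders, and the two-second sleep is just an example
throttle to keep CPU and bandwidth use low, not a tuned value):

    # Phase 1 downloads pages to disk with a polite delay; phase 2
    # parses the cached files offline and stores results in SQLite.
    import re
    import sqlite3
    import time
    import urllib.request
    from pathlib import Path

    CATEGORY_URLS = [                      # hypothetical category pages
        "http://example.com/category/1",
        "http://example.com/category/2",
    ]
    CACHE_DIR = Path("cache")
    # Placeholder pattern; adjust to the actual markup of the site.
    ENTRY_RE = re.compile(r'<a href="(?P<url>[^"]+)">(?P<title>[^<]+)</a>')

    def download_all():
        """Phase 1: fetch every category page to disk, no parsing yet."""
        CACHE_DIR.mkdir(exist_ok=True)
        for i, url in enumerate(CATEGORY_URLS):
            html = urllib.request.urlopen(url).read()
            (CACHE_DIR / f"page_{i}.html").write_bytes(html)
            time.sleep(2)  # throttle so the crawl never pegs the box

    def parse_and_store():
        """Phase 2: parse the cached files and save entries to SQLite."""
        db = sqlite3.connect("entries.db")
        db.execute("CREATE TABLE IF NOT EXISTS entries (url TEXT, title TEXT)")
        for path in CACHE_DIR.glob("page_*.html"):
            text = path.read_text(errors="replace")
            for m in ENTRY_RE.finditer(text):
                db.execute("INSERT INTO entries VALUES (?, ?)",
                           (m.group("url"), m.group("title")))
        db.commit()
        db.close()

    if __name__ == "__main__":
        download_all()
        parse_and_store()

Splitting the work this way means the regex pass can run whenever the
box is idle, since it no longer depends on the network.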
Try using the RSS feeds.
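
If the site exposes per-category feeds, the fetch step gets much
cheaper, since the data is already structured and there's no HTML to
scrape. A rough stdlib-only sketch, assuming a plain RSS 2.0 feed at a
hypothetical URL (an RSS 1.0/RDF feed would need namespace-qualified
tag names instead):

    import urllib.request
    import xml.etree.ElementTree as ET

    FEED_URL = "http://example.com/category/index.rss"  # placeholder

    def read_feed(url):
        """Fetch an RSS 2.0 feed and yield (title, link) per item."""
        xml_data = urllib.request.urlopen(url).read()
        root = ET.fromstring(xml_data)
        for item in root.iter("item"):
            yield (item.findtext("title", default=""),
                   item.findtext("link", default=""))

    if __name__ == "__main__":
        for title, link in read_feed(FEED_URL):
            print(title, link)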