Posted by smartestdesign on 08/11/06 07:42
I am developing a program to crawl a site (it looks like craigslist). Since
it has more than 20,000 entries, I have to visit each category page, parse
it with regular expressions, and extract the data into a database. The data
will be refreshed every two days.
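For the parse-and-store step, here is a minimal sketch in Python. The
regex, the URL layout, and the listings table are all assumptions for
illustration, not the real site's markup:

    import re
    import sqlite3
    import urllib.request

    # Hypothetical pattern -- the real site's markup will differ.
    ENTRY_RE = re.compile(r'<a href="(/item/\d+)">([^<]+)</a>')

    def scrape_category(url, db_path="listings.db"):
        html = urllib.request.urlopen(url).read().decode("utf-8", "replace")
        conn = sqlite3.connect(db_path)
        conn.execute(
            "CREATE TABLE IF NOT EXISTS listings (link TEXT PRIMARY KEY, title TEXT)"
        )
        # INSERT OR REPLACE keeps rows current across the two-day refresh runs.
        for link, title in ENTRY_RE.findall(html):
            conn.execute("INSERT OR REPLACE INTO listings VALUES (?, ?)", (link, title))
        conn.commit()
        conn.close()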
The problem I am looking at now is that I have a number of client sites
running on the same machine, and if my program hogs the CPU (more than 80%
usage), the web server might hang and stop accepting outside connections
until I reboot the server.
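One way to keep the crawler from starving the web server is to throttle it
instead of letting it run flat out. A sketch, assuming a Unix host and an
arbitrary two-second delay between requests:

    import os
    import time
    import urllib.request

    # Drop this process to the lowest scheduling priority (Unix only),
    # so the web server always wins the CPU when both are busy.
    os.nice(19)

    def fetch_all(urls, delay=2.0):
        """Fetch each URL with a pause in between to cap the request rate."""
        pages = {}
        for url in urls:
            pages[url] = urllib.request.urlopen(url).read()
            time.sleep(delay)  # keeps CPU and bandwidth bursts short
        return pages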
I came up with an idea to reduce the processing overhead (rough sketch
below):
1. Go to the site and download all the pages without parsing.
2. Once all the pages have been downloaded locally, start parsing.
3. Save all the data in a database.
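A rough sketch of that split, assuming one file per page in a local spool
directory (the directory name and file naming are illustrative):

    import os
    import urllib.request

    SPOOL_DIR = "spool"  # hypothetical local directory for raw pages

    def download_phase(urls):
        """Step 1: fetch everything to disk, no parsing yet."""
        os.makedirs(SPOOL_DIR, exist_ok=True)
        for i, url in enumerate(urls):
            data = urllib.request.urlopen(url).read()
            with open(os.path.join(SPOOL_DIR, "page-%05d.html" % i), "wb") as f:
                f.write(data)

    def parse_phase(parse_one):
        """Step 2: parse the saved files offline, at whatever pace suits the box."""
        for name in sorted(os.listdir(SPOOL_DIR)):
            with open(os.path.join(SPOOL_DIR, name), "rb") as f:
                parse_one(f.read())  # step 3 (the DB insert) happens inside parse_one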
If anyone has a better idea, let me know.
SK