Spider/crawl large sites

    Date: 02/23/08 (WebDesign)    Keywords: web, google

    Hi, I need to create a compliant sitemap for a very large website and continually run into timeout issues. I assume I need to break the job into several sub-sitemaps and then merge them into one index later. Google's sitemap_gen.py doesn't seem to let me select a set of files to index, so I would have to manually exclude hundreds of filenames, which would take forever.
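
    For what it's worth, the merge step itself is small: the sitemap protocol's index format is just an XML file listing each sub-sitemap by URL, and the index is what gets submitted. Below is a minimal sketch in Python of writing that index; the base URL and the sub-sitemap file names are placeholder assumptions, not details from this site.

        # Minimal sketch of a sitemap index that references already-generated
        # sub-sitemaps. BASE_URL and SUB_SITEMAPS are illustrative assumptions.
        from xml.sax.saxutils import escape

        BASE_URL = "http://www.example.com"        # assumed site root
        SUB_SITEMAPS = ["sitemap-001.xml",         # assumed sub-sitemap file names
                        "sitemap-002.xml",
                        "sitemap-003.xml"]

        def write_sitemap_index(path, base_url, sitemap_files):
            """Write a sitemap index that points at each sub-sitemap."""
            with open(path, "w") as f:
                f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
                f.write('<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
                for name in sitemap_files:
                    f.write('  <sitemap>\n')
                    f.write('    <loc>%s</loc>\n' % escape("%s/%s" % (base_url, name)))
                    f.write('  </sitemap>\n')
                f.write('</sitemapindex>\n')

        if __name__ == "__main__":
            write_sitemap_index("sitemap_index.xml", BASE_URL, SUB_SITEMAPS)

    Each sub-sitemap then only needs to be reachable at the location listed in the index.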

    Are there other sitemapping tools that let you choose which specific pages are indexed (one page handles the content for thousands of sub-pages on this site)?
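
    One possible workaround: if the sub-page URLs can be enumerated (say, from the database behind that one handler page), the sitemap can be built from an explicit URL list instead of a crawl, which sidesteps the timeout problem. The sketch below assumes a hypothetical ids.txt file and a viewer.php?id= URL pattern purely for illustration, and splits the output at the protocol's 50,000-URLs-per-file limit.

        # Minimal sketch: build sub-sitemaps from an explicit list of page IDs.
        # "ids.txt", URL_PATTERN, and the file naming scheme are assumptions.
        from xml.sax.saxutils import escape

        BASE_URL = "http://www.example.com"
        URL_PATTERN = BASE_URL + "/viewer.php?id=%s"   # the single handler page
        CHUNK_SIZE = 50000                             # max URLs per sitemap file

        def write_urlset(path, urls):
            """Write one sub-sitemap containing the given URLs."""
            with open(path, "w") as f:
                f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
                f.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
                for url in urls:
                    f.write('  <url><loc>%s</loc></url>\n' % escape(url))
                f.write('</urlset>\n')

        if __name__ == "__main__":
            with open("ids.txt") as f:
                ids = [line.strip() for line in f if line.strip()]
            urls = [URL_PATTERN % page_id for page_id in ids]
            for i in range(0, len(urls), CHUNK_SIZE):
                write_urlset("sitemap-%03d.xml" % (i // CHUNK_SIZE + 1),
                             urls[i:i + CHUNK_SIZE])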

    Source: http://community.livejournal.com/webdesign/1363529.html
