Indexing Documents

    Date: 05/04/06 (IT Professionals)    Keywords: web

    OK, here is the scenario...

    We are looking to create a searchable file repository on our internal web server (intranet) to replace an out-of-control network share.

    File types are all over the grid, but the predominant ones are pdf, doc, xls and ppt. There are also some others, but they are not as important.

    Is there a reliable way to extract the text so that it can be searched? I have found various products that accomplish this, but they can get quite pricey or are platform-specific. What I am looking for is a methodology that can be used regardless of the platform.

    I don't want to rely on system utilities like htdig or the IIS Index tool. I can open doc and xls files and see the text, so I could potentially search them directly, but there is no discernible text in the pdf file I tried.
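
    For illustration, here is a minimal Python sketch of the raw-byte inspection described above, using only the standard library so it is not tied to one platform. It rests on two assumptions: that the doc and xls files are the legacy binary formats, which keep text as 8-bit or UTF-16LE runs, and that the pdf hides its text inside Flate-compressed streams, which is common but not universal. The names printable_runs and inflate_pdf_streams are made up for this example, and this is a rough probe, not a real parser for any of these formats.

```python
import re
import zlib

# Runs of 4+ printable ASCII bytes, similar to what the Unix `strings` tool reports.
ASCII_RUN = re.compile(rb"[\x20-\x7e]{4,}")
# The same characters stored as UTF-16LE (each one followed by a zero byte).
UTF16_RUN = re.compile(rb"(?:[\x20-\x7e]\x00){4,}")
# Raw body of a PDF stream object, between the `stream` and `endstream` keywords.
PDF_STREAM = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def printable_runs(data: bytes) -> list[str]:
    """Return readable text runs found directly in the raw bytes."""
    runs = [m.group().decode("ascii") for m in ASCII_RUN.finditer(data)]
    runs += [m.group().decode("utf-16-le") for m in UTF16_RUN.finditer(data)]
    return runs


def inflate_pdf_streams(data: bytes) -> list[bytes]:
    """Try to zlib-inflate every stream body; skip the ones that are not Flate data."""
    inflated = []
    for match in PDF_STREAM.finditer(data):
        try:
            # decompressobj tolerates the end-of-line bytes left before `endstream`.
            inflated.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            continue  # assumption: some other filter (image data, etc.), ignored here
    return inflated


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        raw = handle.read()
    for run in printable_runs(raw):
        print(run)
    for body in inflate_pdf_streams(raw):
        for run in printable_runs(body):
            print(run)
```

    Run against a pdf, the inflated streams usually show the page text inside drawing operators such as (some text) Tj. Files that use custom font encodings, other stream filters or encryption will still come out unreadable, so a full solution needs a real parser for each format.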


