Posted by Ned Baldessin on 09/19/05 17:11
Hi,
I need to perform full text searches on a batch of PDF and Word files.
What is the best way to go about this?
After some research, I'm thinking of extracting the plain text from the
files with "pdftotext" and "catdoc", harmonizing the various possible
encodings to UTF-8, storing the text in a MySQL database, and then
using the full text search capabilities of MySQL.
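To make the plan concrete, here is a rough Python sketch of the pipeline, not a finished implementation. The table and column names (documents, filename, body), the list of fallback encodings, and the helper function names are all assumptions made up for illustration. The pdftotext and catdoc flags are as I understand them from the man pages and may differ between versions. The SQL is kept as plain strings so that any MySQL driver could run it.

```python
import subprocess
from pathlib import Path

# Encodings to try in order (an assumption; adjust to the real files).
# latin-1 accepts any byte sequence, so it acts as the last resort.
FALLBACK_ENCODINGS = ("utf-8", "cp1252", "latin-1")


def extraction_command(path):
    """Return the command line that writes a file's plain text to stdout."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        # "-" as the output file asks pdftotext to write to stdout.
        return ["pdftotext", "-enc", "UTF-8", str(path), "-"]
    if suffix == ".doc":
        # -d sets the destination charset of catdoc's output.
        return ["catdoc", "-d", "utf-8", str(path)]
    raise ValueError(f"unsupported file type: {path}")


def to_utf8_text(raw):
    """Decode extractor output, trying each fallback encoding in turn."""
    for encoding in FALLBACK_ENCODINGS:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Only reachable if latin-1 is removed from FALLBACK_ENCODINGS.
    raise ValueError("none of the fallback encodings matched")


def extract_text(path):
    """Run the matching extractor and return the text as a str."""
    result = subprocess.run(
        extraction_command(path), check=True, capture_output=True
    )
    return to_utf8_text(result.stdout)


# Hypothetical schema: FULLTEXT indexes need a MyISAM table here.
CREATE_TABLE_SQL = """
CREATE TABLE documents (
    id INT AUTO_INCREMENT PRIMARY KEY,
    filename VARCHAR(255) NOT NULL,
    body MEDIUMTEXT NOT NULL,
    FULLTEXT (body)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
"""

# Parameterized statements; %s is the placeholder style of common drivers.
INSERT_SQL = "INSERT INTO documents (filename, body) VALUES (%s, %s)"
SEARCH_SQL = "SELECT filename FROM documents WHERE MATCH (body) AGAINST (%s)"
```

Each file would go through extract_text() once, be stored with INSERT_SQL, and then be queried with SEARCH_SQL. One thing to keep in mind is that MySQL's natural-language full text search ignores words shorter than its configured minimum length and words that occur in a large share of the rows, so results on a very small test batch can look odd.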
Do you think that would work well? I am told that the files are mostly
text and won't be longer than 30 pages.
Thanks.
--
My email address doesn't ride a horse.