Posted by Peter Frost on 01/04/08 16:14
petejones@jeepstone.co.uk wrote:
> I have a directory of PDF files which contain Headings/Sub Headings
> and Page Numbers. I wish to write a script to open the PDF, read the
> Headings and any sub headings and write them out to a file. I want to
> do this to create some meta files (.pdf.desc). Most libraries that
> I've seen give the methods to write the headings but not read them.
> How can I do this?
>
> Thanks
>
> Pete
Good luck...
I tried to do something similar last year (I wanted to pull out just the
main body of the text, without headings, images, page numbers etc.). I'm
afraid that even though I searched for a long time I was unable to find
any libraries that would do this sort of thing. In the end, I downloaded
the PDF spec and rolled my own code. The spec is quite large but it's
fairly well-written so you may be able to pick out just the bits you
need to implement. It took me about a week to read through the document
and write my code, but if you're an experienced developer (I'm not!)
then no doubt you'll be able to do it quicker than that.
Peter
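
A minimal sketch of the "roll your own" route described above, written in Python (the thread does not name a language, so that choice is an assumption). It assumes the headings are stored as bookmark (outline) entries, whose text the PDF format keeps in /Title strings, and that the file's objects are stored uncompressed so those strings are visible in the raw bytes. The function names and the regular expression are hypothetical, not taken from any library. It will also pick up the document title from the Info dictionary, and it does not handle hex strings, UTF-16 titles, nested unescaped parentheses, or compressed object streams.

```python
import re

# Matches "/Title (...)" literal strings, allowing backslash-escaped
# characters such as \( and \) inside the parentheses.
TITLE_RE = re.compile(rb"/Title\s*\(((?:\\.|[^\\()])*)\)")


def extract_titles(pdf_bytes):
    """Return the text of every /Title literal string found in the raw bytes."""
    titles = []
    for match in TITLE_RE.finditer(pdf_bytes):
        raw = match.group(1)
        # Undo the simple escapes \( \) and \\ ; other escapes are left as-is.
        raw = re.sub(rb"\\([()\\])", rb"\1", raw)
        titles.append(raw.decode("latin-1"))
    return titles


def write_desc(pdf_path):
    """Write the titles found in pdf_path to pdf_path + '.desc', one per line."""
    with open(pdf_path, "rb") as f:
        titles = extract_titles(f.read())
    with open(pdf_path + ".desc", "w", encoding="utf-8") as out:
        out.write("\n".join(titles))
    return titles
```

Calling write_desc("manual.pdf") (a hypothetical file name) would produce manual.pdf.desc next to the PDF. If the output is empty, the headings are probably not stored as bookmarks or the objects are compressed, and a fuller parser based on the spec would be needed.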