Reply to Re: Reading PDF Headings and Page Numbers using PHP — PHP Programming Language

Posted by Peter Frost on 01/04/08 16:14

petejones@jeepstone.co.uk wrote:
> I have a directory of PDF files which contain Headings/Sub Headings
> and Page Numbers. I wish to write a script to open the PDF, read the
> Headings and any sub headings and write them out to a file. I want to
> do this to create some meta files (.pdf.desc). Most libraries that
> I've seen give the methods to write the headings but not read them.
> How can I do this?
>
> Thanks
>
> Pete

Good luck...

I tried to do something similar last year (I wanted to pull out just the
main body of the text, without headings, images, page numbers etc.). I'm
afraid that even though I searched for a long time I was unable to find
any libraries that would do this sort of thing. In the end, I downloaded
the PDF spec and rolled my own code. The spec is quite large but it's
fairly well-written so you may be able to pick out just the bits you
need to implement. It took me about a week to read through the document
and write my code, but if you're an experienced developer (I'm not!)
then no doubt you'll be able to do it more quickly than that.
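
To give a rough idea of what "picking out just the bits you need" might
look like for headings, here is a minimal sketch. It is not the code
described above and it is not a real PDF parser. It assumes the headings
are stored as outline (bookmark) entries whose /Title values are literal
strings in parentheses, and that those objects are not compressed inside
object streams (which many PDF 1.5+ files use). The function name
extractPdfTitles is made up for this example.

```php
<?php
/**
 * Hypothetical helper: collect literal-string /Title values from raw PDF bytes.
 *
 * Assumptions (not guaranteed for any given file):
 *  - the dictionaries holding /Title are stored uncompressed;
 *  - titles are literal strings "(...)", not hex strings "<...>";
 *  - only backslash-escaped "(", ")" and "\" matter; other escapes such as
 *    \n or octal codes are not decoded (the escaped character is kept as-is).
 * Note: the document Info dictionary may also contain a /Title key, so the
 * document title can show up in the result alongside the bookmarks.
 */
function extractPdfTitles(string $pdfBytes): array
{
    $titles = [];
    $offset = 0;
    $length = strlen($pdfBytes);

    while (($pos = strpos($pdfBytes, '/Title', $offset)) !== false) {
        $i = $pos + 6;                                  // skip past "/Title"
        $i += strspn($pdfBytes, " \t\r\n\f\0", $i);     // skip PDF whitespace
        $offset = $i;

        if ($i >= $length || $pdfBytes[$i] !== '(') {
            continue;   // hex string, indirect reference, etc.: not handled
        }

        $i++;           // step over the opening parenthesis
        $depth = 1;     // literal strings may contain balanced parentheses
        $title = '';

        while ($i < $length) {
            $ch = $pdfBytes[$i];
            if ($ch === '\\' && $i + 1 < $length) {
                $title .= $pdfBytes[$i + 1];            // keep escaped char
                $i += 2;
                continue;
            }
            if ($ch === '(') {
                $depth++;
            } elseif ($ch === ')') {
                $depth--;
                if ($depth === 0) {
                    break;                              // end of this string
                }
            }
            $title .= $ch;
            $i++;
        }

        $titles[] = $title;
        $offset = $i;
    }

    return $titles;
}
```

For the .pdf.desc files in the question, one could then loop over the
directory and call something like
file_put_contents($file . '.desc', implode("\n", extractPdfTitles(file_get_contents($file))));
for each PDF. If that returns nothing for your files, the outline is
probably compressed or encoded differently, and you are back to reading
the relevant parts of the spec.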

Peter
