Re: [PHP] reading PDF's

Posted by "Richard Lynch" on 07/01/05 04:11

On Fri, June 24, 2005 12:10 pm, Jon said:
> Is it possible to read text from a PDF file with PHP? How?

At the crudest level, you can fopen/fread a PDF and dump it out, and pick
out the plain-text readable bits with your eyes. :-)
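
A minimal sketch of that crude approach, assuming PHP 7 or later. The helper names (readPdfBytes, printableRuns) and the four-character minimum are illustrative choices, not part of any library. It reads the file with fopen/fread and keeps only runs of printable ASCII, so your eyes have less junk to wade through. Many PDFs compress their content streams, so this will often turn up metadata and structure rather than the body text.

```php
<?php
// Read a whole file in binary mode with fopen/fread.
function readPdfBytes(string $path): string
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $bytes = '';
    while (!feof($handle)) {
        $chunk = fread($handle, 8192);
        if ($chunk === false) {
            break;
        }
        $bytes .= $chunk;
    }
    fclose($handle);
    return $bytes;
}

// Crude "strings"-style scan: return runs of printable ASCII
// (space through tilde) that are at least $minLength bytes long.
// $minLength = 4 is an arbitrary illustrative threshold.
function printableRuns(string $bytes, int $minLength = 4): array
{
    $pattern = '/[\x20-\x7E]{' . $minLength . ',}/';
    preg_match_all($pattern, $bytes, $matches);
    return $matches[0];
}

// Usage (the file name is hypothetical):
// foreach (printableRuns(readPdfBytes('sample.pdf')) as $run) {
//     echo $run, "\n";
// }
```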

After that, there are definitely some commercial command-line tools to
convert PDF to text (or HTML or whatever) that you can Google for.
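
Calling such a tool from PHP could look roughly like the sketch below. It assumes a HYPOTHETICAL converter that is invoked as "converter input.pdf output.txt"; real tools differ in their argument order and flags, so check the documentation of whichever one you end up with. The function names are made up for illustration.

```php
<?php
// Build the shell command for a hypothetical converter invoked as:
//   <converter> <input.pdf> <output.txt>
// Arguments are escaped so paths with spaces or quotes are passed safely.
function buildConvertCommand(string $converter, string $pdfPath, string $textPath): string
{
    return escapeshellcmd($converter) . ' '
        . escapeshellarg($pdfPath) . ' '
        . escapeshellarg($textPath);
}

// Run the converter and return the extracted text.
// Throws if the tool exits non-zero or the output file cannot be read.
function convertPdfToText(string $converter, string $pdfPath, string $textPath): string
{
    $output = [];
    $exitCode = 0;
    // "2>&1" folds error messages into $output (Unix-style shells assumed).
    exec(buildConvertCommand($converter, $pdfPath, $textPath) . ' 2>&1', $output, $exitCode);
    if ($exitCode !== 0) {
        throw new RuntimeException("Converter failed ($exitCode): " . implode("\n", $output));
    }
    $text = file_get_contents($textPath);
    if ($text === false) {
        throw new RuntimeException("Converter produced no readable output at $textPath");
    }
    return $text;
}
```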

There may be a free one, or even an open-source one, but I've never heard
of it, possibly because they'd have to pay a license to Adobe (Macromedia
this week?) to be legal...

Note that a PDF can have its text encrypted, the whole file can be
password-protected, or the text could have been rendered into an image
which was embedded in the PDF (ugh!).
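
A rough way to flag those cases before you try to extract anything, as a sketch only: an encrypted PDF carries an /Encrypt entry in its trailer, and embedded images show up as objects with /Subtype /Image. Both checks below are plain pattern searches over the raw bytes (the function names are illustrative), so treat a hit as a hint, not proof. In particular, a PDF with an image in it may still have perfectly good text as well.

```php
<?php
// Heuristic: does the raw file mention an /Encrypt entry?
// A substring search can produce false positives, so this is only a hint.
function looksEncrypted(string $bytes): bool
{
    return strpos($bytes, '/Encrypt') !== false;
}

// Heuristic: does the file contain at least one image object?
// Matches "/Subtype /Image" with or without whitespace between the names.
function hasEmbeddedImage(string $bytes): bool
{
    return preg_match('~/Subtype\s*/Image\b~', $bytes) === 1;
}

// Usage (the file name is hypothetical):
// $bytes = file_get_contents('sample.pdf');
// if (looksEncrypted($bytes)) { echo "Probably encrypted\n"; }
// if (hasEmbeddedImage($bytes)) { echo "Contains images, may need OCR\n"; }
```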

At that point, you can maybe get the image out and use some kind of OCR
software like OmniPage to "read" it.

Over the years and versions, the PDF format has changed a lot, so be sure
to have a representative sample of PDFs to throw at your testing.

You don't want to get to launch and find out 90% of the real PDFs simply
don't work. :-(
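
One cheap way to see how representative your sample is: every PDF starts with a header like "%PDF-1.4", so you can tally the declared versions across your test files. This is a sketch under that assumption; the function names and the list of paths are hypothetical, and the header is only what the file declares, not a guarantee of which features it actually uses.

```php
<?php
// Return the version declared in the header ("%PDF-1.4" gives "1.4"),
// or null if no header is found in the first 1024 bytes.
function pdfHeaderVersion(string $bytes): ?string
{
    if (preg_match('/%PDF-(\d+\.\d+)/', substr($bytes, 0, 1024), $m) === 1) {
        return $m[1];
    }
    return null;
}

// Count files per declared version. $paths is a hypothetical list of
// sample PDFs; unreadable or headerless files are counted as "unknown".
function tallyVersions(array $paths): array
{
    $tally = [];
    foreach ($paths as $path) {
        // Only the first 1024 bytes are needed to find the header.
        $head = file_get_contents($path, false, null, 0, 1024);
        $version = ($head === false) ? null : pdfHeaderVersion($head);
        $key = $version ?? 'unknown';
        $tally[$key] = ($tally[$key] ?? 0) + 1;
    }
    ksort($tally);
    return $tally;
}

// Usage (the directory name is hypothetical):
// print_r(tallyVersions(glob('samples/*.pdf')));
```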

--
Like Music?
http://l-i-e.com/artists.htm

 
