Posted by Shelly on 09/19/07 20:12
"Good Man" <heyho@letsgo.com> wrote in message
news:Xns99B0A095D3484sonicyouth@216.196.97.131...
> "Shelly" <sheldonlg.news@asap-consult.com> wrote in
> news:13f2ro925ga7teb@corp.supernews.com:
>
>> Any suggestions?
>>
>> "Shelly" <sheldonlg.news@asap-consult.com> wrote in message
>> news:13f2f8uqm3eck19@corp.supernews.com...
>>> I had to do my first investigation regarding PDF files. Surprisingly,
>>> I found that the only functions in PHP were for creating PDF files.
>>>
>>> The potential customer receives order forms from the corporate
>>> headquarters and they are PDF forms. What we want to do is to
>>> extract information from these forms and process the data into a
>>> database. To do this we need to read certain predefined fields.
>>> Nowhere did I find a function that can read PDF files, let alone
>>> extract information from them.
>>>
>>> My thoughts, in the absence of this function, would be if there were
>>> a way to open the file, strip the formatting, and then work on the
>>> text stream. The key unknown for me in this is how to strip the
>>> formatting.
>>>
>>> So, does anyone have suggestions for either of these?
>>> (1) How to read predetermined field entries from a PDF file or
>>> (2) How to convert a PDF into an unformatted text stream
>>>
>>> Shelly
>>>
>
> Yikes, found this expensive option via the folks at PDFlib:
>
> http://www.pdflib.com/products/tet/
"Yikes" is an understatement.
>
>
> ... also found a link that suggests PDF files are just gzipped XML, so
> maybe you could write your own extractor:
>
> http://www.thescripts.com/forum/thread631837.html
hmm.
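
For anyone who wants to try the "write your own extractor" route for option (2), here is a minimal sketch in Python rather than PHP. It rests on assumptions that are not from the thread: that the text lives in page content streams compressed with the common Flate (zlib) filter rather than as gzipped XML, and that the strings are shown with plain "(...) Tj" operators. The names inflate_streams and extract_text are made up for illustration. Values typed into form fields are generally stored in the form's field dictionaries, not in page content, so this probably does not cover option (1).

```python
import re
import zlib

# Raw bytes between the "stream" and "endstream" keywords of a PDF object.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)

# Literal strings shown with the Tj operator, e.g. "(Hello) Tj".
# Assumption: no nested or escaped parentheses inside the string.
TEXT_RE = re.compile(rb"\((.*?)\)\s*Tj")


def inflate_streams(pdf_bytes):
    """Return the decompressed body of every stream that zlib can inflate."""
    bodies = []
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj stops at the end of the zlib data, so the
            # end-of-line bytes before "endstream" are left unread.
            bodies.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            # Not Flate-compressed (or uses another filter): skip it.
            continue
    return bodies


def extract_text(pdf_bytes):
    """Collect the literal strings found in all inflatable streams."""
    chunks = []
    for body in inflate_streams(pdf_bytes):
        chunks.extend(TEXT_RE.findall(body))
    return b" ".join(chunks).decode("latin-1")


if __name__ == "__main__":
    # Hand-built fragment that only imitates one PDF stream object.
    content = b"BT /F1 12 Tf (Order No 1234) Tj (Qty 5) Tj ET"
    packed = zlib.compress(content)
    sample = (
        b"%PDF-1.4\n1 0 obj\n"
        + b"<< /Length %d /Filter /FlateDecode >>\n" % len(packed)
        + b"stream\n" + packed + b"\nendstream\nendobj\n"
    )
    print(extract_text(sample))
```

Real files often use other filters, custom font encodings, or text split across many operators, so treat this as a way to peek inside a file rather than a dependable parser.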