Posted by Good Man on 09/19/07 19:46
"Shelly" <sheldonlg.news@asap-consult.com> wrote in
news:13f2ro925ga7teb@corp.supernews.com:
> Any suggestions?
>
> "Shelly" <sheldonlg.news@asap-consult.com> wrote in message
> news:13f2f8uqm3eck19@corp.supernews.com...
>> I had to do my first investigation regarding PDF files. Surprisingly,
>> I found that the only functions in PHP were for creating PDF files.
>>
>> The potential customer receives order forms from the corporate
>> headquarters and they are PDF forms. What we want to do is to
>> extract information from these forms and process the data into a
>> database. To do this we need to read certain set fields. Nowhere
>> did I find a function to be able to read PDF files, let alone extract
>> information from them.
>>
>> My thoughts, in the absence of this function, would be if there were
>> a way to open the file, strip the formatting, and then work on the
>> text stream. The key unknown for me in this is how to strip the
>> formatting.
>>
>> So, do I hear any suggestions for either of these?
>> (1) How to read predetermined field entries from a PDF file or
>> (2) How to convert a PDF into an unformatted text stream
>>
>> Shelly
>>
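yikes, found this expensive option via the folks at pdflib:
http://www.pdflib.com/products/tet/
... also found a link that suggests PDF files are just gzipped XML. As far
as I know that's not quite right (PDF is its own object format, though its
content streams are usually Flate/zlib-compressed), so maybe you could
write your own extractor:
http://www.thescripts.com/forum/thread631837.html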
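
For what it's worth, here is a rough sketch of what "write your own
extractor" might look like. It is in Python only to keep it short (in PHP,
gzuncompress() should do the same zlib job). It assumes the streams use
plain FlateDecode and that text is drawn with simple "(...) Tj" operators;
the names inflate_streams and extract_text are made up for illustration.
Real PDFs can use other filters, hex strings, TJ arrays and custom font
encodings, and form field values usually sit in the /V entries of the field
dictionaries rather than in the page content, so treat this as a starting
point and not a working parser.

```python
import re
import zlib

# Raw bytes between the "stream" and "endstream" keywords of a PDF object.
# The capture may include the end-of-line that precedes "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)

# Literal strings shown with the Tj operator, e.g. "(Hello) Tj".
TJ_RE = re.compile(rb"\((.*?)(?<!\\)\)\s*Tj", re.DOTALL)


def inflate_streams(pdf_bytes):
    """Yield the decompressed bytes of every stream that zlib can inflate."""
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj stops at the end of the zlib data and ignores
            # the trailing end-of-line bytes captured by the regex.
            yield zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            # Not Flate-compressed (image, other filter, encrypted): skip.
            continue


def extract_text(pdf_bytes):
    """Return the literal strings drawn by Tj operators, in file order."""
    pieces = []
    for content in inflate_streams(pdf_bytes):
        for raw in TJ_RE.findall(content):
            # Assumption: single-byte encoding; real fonts may remap bytes.
            pieces.append(raw.decode("latin-1"))
    return pieces
```

You would call it as extract_text(open("order.pdf", "rb").read()), where
order.pdf is a placeholder file name, and then pick the fields you need out
of the returned list.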