Posted by C. on 09/23/07 02:59
On 22 Sep, 23:39, Jerry Stuckle <jstuck...@attglobal.net> wrote:
> Shelly wrote:
> > Here is a situation that I have to think out for a potential customer.
> > Currently he receives about 150 emails a day with pdf attachments for
> > orders. The format of the pdfs are all the same. Now he has to:
>
>
> Why is it coming in in a PDF? I would think this would be the place to
> make the change - into something that's easily machine readable.
>
> I suppose you could extract info from a pdf - I've never tried it, but
> don't see why it wouldn't be possible. But it will be much harder.
>
One good reason it will be harder is that a PDF page is described by
drawing operators derived from PostScript rather than by a data
structure. Unless all the PDFs come out of the same bit of software,
there's no guarantee that what appears at a particular place on screen
will always be in the same place in the code. Even if it is in the same
place, it might be encoded as glyph IDs or font table references rather
than as recognizable characters.
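
To illustrate the ordering problem, here is a minimal Python sketch. The two content-stream fragments are hand-written and hypothetical (not taken from any real order PDF); they use the standard PDF text operators BT, Tf, Tm, Tj and ET to place the same two labels at the same coordinates, but in opposite order in the file. The regex is an assumption that only works for this toy input: real PDFs usually compress their streams and may use hex strings, TJ arrays, or custom font encodings, so this is not a general extractor.

```python
import re

# Hypothetical, uncompressed content-stream fragments. Both render the
# same page: "Order: 1234" at (72, 700) and "Qty: 5" at (72, 680).
STREAM_A = (
    b"BT /F1 12 Tf 1 0 0 1 72 700 Tm (Order: 1234) Tj ET "
    b"BT /F1 12 Tf 1 0 0 1 72 680 Tm (Qty: 5) Tj ET"
)
STREAM_B = (
    b"BT /F1 12 Tf 1 0 0 1 72 680 Tm (Qty: 5) Tj ET "
    b"BT /F1 12 Tf 1 0 0 1 72 700 Tm (Order: 1234) Tj ET"
)

# Matches only the simple "1 0 0 1 x y Tm (text) Tj" pattern used above.
TEXT_RUN = re.compile(rb"1 0 0 1 (\d+) (\d+) Tm \(([^)]*)\) Tj")


def raw_order(stream):
    """Return the text runs in the order they appear in the file."""
    return [text.decode("ascii") for _, _, text in TEXT_RUN.findall(stream)]


def text_by_position(stream):
    """Return (x, y, text) runs sorted top-to-bottom, then left-to-right."""
    runs = [
        (int(x), int(y), text.decode("ascii"))
        for x, y, text in TEXT_RUN.findall(stream)
    ]
    # The PDF y axis points up the page, so a larger y is nearer the top.
    return sorted(runs, key=lambda run: (-run[1], run[0]))


if __name__ == "__main__":
    print(raw_order(STREAM_A))
    print(raw_order(STREAM_B))
    print(text_by_position(STREAM_A) == text_by_position(STREAM_B))
```

Reading the runs in file order gives different results for the two fragments, while sorting by coordinates gives the same result. That is why any extraction has to key on position (or on a known producer's habits), not on where the text sits in the stream.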
I'd suggest that the OP look to see if the data can be captured in a
machine-readable form (even if that is embedded within a human-readable
format) and if not - walk away.
C.