Posted by Jerry Stuckle on 09/23/07 12:17
Shelly wrote:
> "C." <colin.mckinnon@gmail.com> wrote in message
> news:1190516388.977848.29540@y42g2000hsy.googlegroups.com...
>> On 22 Sep, 23:39, Jerry Stuckle <jstuck...@attglobal.net> wrote:
>>> Shelly wrote:
>>>> Here is a situation that I have to think out for a potential customer.
>>>> Currently he receives about 150 emails a day with pdf attachments for
>>>> orders. The format of the pdfs is all the same. Now he has to:
>>>
>>> Why is it coming in in a PDF? I would think this would be the place to
>>> make the change - into something that's easily machine readable.
>>>
>>> I suppose you could extract info from a pdf - I've never tried it, but
>>> don't see why it wouldn't be possible. But it will be much harder.
>>>
>> One good reason is that PDF is derived from PostScript - a page-description
>> language that says how to draw the page rather than how the data is
>> structured. Unless all the PDFs come
>> out of the same bit of software, there's no guarantee that what
>> appears at a particular place on screen will always be in the same
>> place in the code. Even if it is in the same place, it might be
>> encoded directly as a glyph or a font table reference rather than as
>> recognizable characters.
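To make that point concrete, here is a minimal Python sketch. It is an illustration only and not a real PDF parser. It handles just the `BT`, `Td` and `Tj` operators of an already-decompressed content stream, and it ignores `TJ`, `Tm` and everything else. The sample stream and the names `positioned_text` and `reading_order` are hypothetical. The strings come out in the order they are drawn, which need not be the order you read them. The bytes inside the parentheses are character codes whose meaning depends on the font's encoding, so they are not guaranteed to be readable text.

```python
import re

# One token at a time, left to right:
#   (string) Tj   -> show a literal string (escapes allowed, no nested parens)
#   tx ty Td      -> start a new line, offset from the start of the current line
#   BT            -> begin a text object (resets the text position)
TOKEN = re.compile(
    rb"\((?P<text>(?:\\.|[^\\()])*)\)\s*Tj"
    rb"|(?P<tx>-?\d+(?:\.\d+)?)\s+(?P<ty>-?\d+(?:\.\d+)?)\s+Td"
    rb"|(?P<bt>\bBT\b)"
)


def positioned_text(content: bytes) -> list[tuple[float, float, bytes]]:
    """Return (x, y, raw_string) for each Tj, in the order it is drawn."""
    line_x = line_y = 0.0
    items = []
    for m in TOKEN.finditer(content):
        if m.group("bt") is not None:
            line_x = line_y = 0.0           # new text object
        elif m.group("tx") is not None:
            line_x += float(m.group("tx"))  # Td is relative to the line start
            line_y += float(m.group("ty"))
        else:
            # Raw character codes: their meaning depends on the font encoding.
            items.append((line_x, line_y, m.group("text")))
    return items


def reading_order(items):
    """Sort top-to-bottom (PDF y grows upward), then left-to-right."""
    return [text for x, y, text in sorted(items, key=lambda t: (-t[1], t[0]))]


if __name__ == "__main__":
    stream = b"BT /F1 12 Tf 300 700 Td (Qty: 5) Tj -250 0 Td (Order 1234) Tj ET"
    items = positioned_text(stream)
    print([t for _, _, t in items])  # [b'Qty: 5', b'Order 1234']
    print(reading_order(items))      # [b'Order 1234', b'Qty: 5']
```

Real content streams are usually compressed (FlateDecode) and use more operators than this. Even when every order comes from the same software, the extraction has to be checked against sample files.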
>>
>> I'd suggest that the OP look to see if the data can be captured in a
>> machine readable form (even if that is embedded within a human readable
>> format) and if not - walk away.
>>
>> C.
>>
>
>
> The pdfs are ALL the same. They come from the corporate website and the
> customer has NO control over the fact that it is a pdf. The customer does not
> currently have a website of his own. I could also do this job as a
> standalone application on his computer, but was wondering if he could do
> double duty by having his own website.
>
Has he tried talking to the people at corporate? I would think it would
be in their best interest to provide the information in a machine-readable
form, also.

I've done some looking around but haven't found anything to read PDFs.
One more thing to be concerned about: is the text stored as text, or as
an image? I've seen both. Text wouldn't be too hard, but an image would
require OCR software.
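One rough way to tell the two cases apart before deciding whether OCR is needed is the minimal sketch below. It uses only Python's standard library and assumes the file is not encrypted. It also assumes compressed streams use FlateDecode; anything else is skipped. `pdf_chunks`, `classify_pdf` and the regexes are hypothetical helpers. The sketch looks for font dictionaries plus text-showing operators, or for image XObjects, in the raw file and in every stream that inflates. A "text" result can still hide characters in an odd font encoding, so treat it as a hint rather than an answer.

```python
import re
import zlib

# Stream bodies sit between the "stream" and "endstream" keywords.
STREAM = re.compile(rb"\bstream\r?\n(.*?)\r?\nendstream", re.DOTALL)
FONT = re.compile(rb"/Type\s*/Font\b")         # font dictionaries
IMAGE = re.compile(rb"/Subtype\s*/Image\b")    # image XObjects
TEXT_OP = re.compile(rb"[)>\]]\s*T[jJ]\b")     # Tj / TJ text-showing operators


def pdf_chunks(pdf: bytes) -> list[bytes]:
    """The raw file plus every stream that inflates as FlateDecode data."""
    chunks = [pdf]
    for m in STREAM.finditer(pdf):
        try:
            # decompressobj tolerates trailing bytes after the zlib data
            chunks.append(zlib.decompressobj().decompress(m.group(1)))
        except zlib.error:
            pass  # not Flate-compressed (e.g. a JPEG image or plain text)
    return chunks


def classify_pdf(pdf: bytes) -> str:
    chunks = pdf_chunks(pdf)
    has_font = any(FONT.search(c) for c in chunks)
    has_text = any(TEXT_OP.search(c) for c in chunks)
    has_image = any(IMAGE.search(c) for c in chunks)
    if has_font and has_text:
        return "text"    # real text: extraction is possible
    if has_image:
        return "image"   # probably scanned pages: needs OCR
    return "unknown"
```

Usage would be something like `classify_pdf(open("order.pdf", "rb").read())`, where the file name is only a placeholder.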
> Shelly
>
> P.S. It's nice to see we're still talking, Jerry.
>
>
Oh, shit. I thought this was a different Shelly :-)
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================