On Tue, Sep 11, 2018 at 11:15 AM (WalzCraft) Jerry Forss
<JForss@xxxxxxxxxxxxx> wrote:
> Working on that approach [receiving data in something other than PDF] as
> well.
> Working with PDF's as the starting point vs the result is a new thing
> for me.
I'm with Brad: Virtually anything you can get other than PDF is going
to be better than PDF.
You *shouldn't* have experience with PDFs as the "starting point"
(i.e. data source) because it's a horrible, horrible format for that
purpose. It's designed entirely for producing output.
How easy and how accurate it is to extract data from a PDF depends a
lot on how the PDF was made. This is regardless of programming
language or operating system or third-party tools or whatever. The
worst-case scenario is that the PDF is little more than a graphic
image of an optically scanned (or even photographed!) document. Then
the parsing will involve OCR, if possible at all.
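If you're stuck with the PDFs anyway, a quick triage can at least tell you which of those two cases you're in before you write any parsing code. Below is a rough sketch, assuming Python with only the standard library. `classify_pdf` is a hypothetical helper, not part of any tool mentioned in this thread. It is a heuristic rather than a PDF parser. It looks for font resources, which indicate real character data, and for image XObjects, which indicate scanned or photographed pages. It also inflates Flate-compressed streams, because newer PDFs often hide their dictionaries inside compressed object streams.

It does not handle encrypted files. A "text" result also says nothing about how cleanly the text will come out. A scan that already carries an OCR text layer will report "text" too.

```python
import re
import zlib

# Matches "stream ... endstream" bodies; \b keeps the "stream" inside
# "endstream" from being mistaken for the start of a new stream.
STREAM_RE = re.compile(rb"\bstream\r?\n(.*?)endstream", re.DOTALL)
IMAGE_RE = re.compile(rb"/Subtype\s*/Image\b")


def _searchable_bytes(data: bytes) -> bytes:
    """Raw file bytes plus every stream body that zlib can inflate.

    PDF 1.5+ files often pack dictionaries into Flate-compressed object
    streams, so searching only the raw bytes can miss them.
    """
    parts = [data]
    for match in STREAM_RE.finditer(data):
        try:
            # decompressobj ignores the trailing EOL before "endstream"
            parts.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            pass  # not Flate-encoded (e.g. JPEG image data); skip it
    return b"\n".join(parts)


def classify_pdf(data: bytes) -> str:
    """Hypothetical triage helper: returns 'text', 'image-only' or 'unknown'."""
    haystack = _searchable_bytes(data)
    if b"/Font" in haystack:
        # Font resources mean real character data: extraction has a chance.
        return "text"
    if IMAGE_RE.search(haystack):
        # Pictures but no fonts: probably a scan or photo, so expect OCR.
        return "image-only"
    return "unknown"
```

You would call it on the whole file's bytes, e.g. `classify_pdf(open(path, "rb").read())`. Anything other than "text" is a strong hint to push back and ask for the data in another format.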
If the PDF was made programmatically, with actual character data
rather than image data, then you have a fighting chance; but by the
same token, whatever programmatic process created that PDF should also
be able to create something else, ANYTHING else, from whatever it was
using as input.
John Y.
--
This is the Midrange Systems Technical Discussion (MIDRANGE-L) mailing list
To post a message email: MIDRANGE-L@xxxxxxxxxxxx
To subscribe, unsubscribe, or change list options,
visit: https://lists.midrange.com/mailman/listinfo/midrange-l
or email: MIDRANGE-L-request@xxxxxxxxxxxx
Before posting, please take a moment to review the archives
at https://archive.midrange.com/midrange-l.
Please contact support@xxxxxxxxxxxx for any subscription related
questions.