Hi Daniel,
   While I have this working, I would like to see your blog article.
   With PDFbox I did have to include the -sort switch to get the text back in
   a more meaningful order.
   Have you noticed any difference between Ghostscript and PDFbox (or any
   other tools) in the accuracy/completeness of the text returned?
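   One rough way to compare completeness between tools is to count the
   words each tool returns for the same PDF. A minimal sketch, assuming
   both tools have already written their output to text files; word_count
   and compare_extracts are made-up helper names, and a word count says
   nothing about whether the reading order is right:

```shell
# word_count FILE: number of whitespace-separated words in FILE.
word_count() {
    # Some wc implementations pad the number with blanks, so strip them.
    wc -w < "$1" | tr -d '[:space:]'
}

# compare_extracts A.txt B.txt: print both word counts and their difference.
compare_extracts() {
    a=$(word_count "$1")
    b=$(word_count "$2")
    echo "$1: $a words, $2: $b words, difference: $((a - b))"
}
```

   Usage would be something like: compare_extracts gs.txt pdfbox.txt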
   Thanks
   Don
    
   Don Brown
   Senior Consultant
    
   [1]OneTeam IT Pty Ltd
   P: 1300 088 400
   -----Original Message-----
   From: MIDRANGE-L <midrange-l-bounces@xxxxxxxxxxxxxxxxxx> On Behalf Of
   Daniel Gross
   Sent: Monday, 16 June 2025 7:00 PM
   To: midrange-l@xxxxxxxxxxxxxxxxxx
   Subject: Re: Convert PDF to text
   Hi Don,
   it depends.
   From the "command line" you can use Ghostscript. The latest PASE version
   from the IBM repository should be OK. With
      gs -sDEVICE=txtwrite -o output.txt input.pdf
   you should get output - but you may have to experiment with the
   encoding, as this is not fixed in PDF documents.
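   As a sketch, the Ghostscript call could be wrapped in a small shell
   function. pdf_to_text_gs and the DRY_RUN variable are hypothetical names
   used only for illustration; -q merely suppresses the startup banner:

```shell
# pdf_to_text_gs INPUT.pdf OUTPUT.txt
# Extract text with Ghostscript's txtwrite device.
# Set DRY_RUN=1 to print the command instead of running it.
pdf_to_text_gs() {
    if [ "$#" -ne 2 ]; then
        echo "usage: pdf_to_text_gs input.pdf output.txt" >&2
        return 2
    fi
    # Rebuild the argument list as the full gs command line.
    set -- gs -q -sDEVICE=txtwrite -o "$2" "$1"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

   Calling it with DRY_RUN=1 first shows exactly what would be run before
   any file is touched.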
   From RPG I would always use PDFbox ([2] https://pdfbox.apache.org/) -
   with this, you have complete control over the PDF processing.
   But you can also use PDFbox from the command line using
      java -jar pdfbox-app-3.y.z.jar export:text [OPTIONS] -i=<infile>
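   A hedged sketch of the same call as a shell function, including the
   -sort switch Don mentioned. PDFBOX_JAR, DRY_RUN and pdfbox_export_text
   are assumed names for illustration; point PDFBOX_JAR at whichever
   pdfbox-app jar you actually have installed:

```shell
# pdfbox_export_text INPUT.pdf OUTPUT.txt
# Run the PDFBox command-line text export with position sorting enabled.
# PDFBOX_JAR must hold the path to the pdfbox-app jar (assumption).
# Set DRY_RUN=1 to print the command instead of running it.
pdfbox_export_text() {
    if [ "$#" -ne 2 ] || [ -z "${PDFBOX_JAR:-}" ]; then
        echo "usage: PDFBOX_JAR=/path/to/jar pdfbox_export_text input.pdf output.txt" >&2
        return 2
    fi
    # Rebuild the argument list as the full java command line.
    set -- java -jar "$PDFBOX_JAR" export:text -sort -i="$1" -o="$2"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

   Note that every call made this way starts a fresh JVM, so the startup
   cost is paid once per file.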
   But make sure to use a reasonably new 64-bit JVM - I'm using Java 21
   64-bit, and it's quite fast - in fact, after the initial JVM loading, Java
   is near native performance.
   I had the task of splitting PDF files - up to 5 or 6 pages, Ghostscript
   (PASE) was faster - but with 10 or more pages, PDFbox (Java 21 64-bit) was
   always faster. And it got even better if more than one file had to be
   split in the same job/session - PDFbox was always faster, as the JVM
   stayed in memory and even the JAR file was kept loaded.
   So as I said - it really depends on what you want to do exactly - and how.
   For example, if this text should go into a database table, I would
   recommend going the RPG/Java/PDFbox way.
   I'm in the process of writing a bit about RPG, Java and PDFbox on my blog
   in the next few weeks. If you like, I can give you a sneak peek of it.
   It's a bit overwhelming at the beginning with JNI, JVM initialization and
   RPG to Java prototypes - but once you've got it, pack everything you need
   into a service program, and be happy.
   HTH and kind regards,
   Daniel
   > On 16.06.2025 at 10:37, Patrik Schindler <poc@xxxxxxxxxx> wrote:
   > Hello Don,
   >
   > On 16.06.2025 at 09:12, Don Brown via MIDRANGE-L
   > <midrange-l@xxxxxxxxxxxxxxxxxx> wrote:
   >
   >> 1. Does anyone have a recommended solution for converting a PDF to
   >> text? I am after a PHP or native RPG-ish solution. Not Python please.
   >
   > I'd use the pdftotext command from the poppler-utils package in PASE. I
   > assume the poppler-utils package is available for installation via yum.
   >
   > [3] https://en.wikipedia.org/wiki/Poppler_(software)
   >
   > :wq! PoC
   >
   > --
   > This is the Midrange Systems Technical Discussion (MIDRANGE-L) mailing
   > list. To post a message email: MIDRANGE-L@xxxxxxxxxxxxxxxxxx
   > To subscribe, unsubscribe, or change list options,
   > visit: [4] https://lists.midrange.com/mailman/listinfo/midrange-l
   > or email: MIDRANGE-L-request@xxxxxxxxxxxxxxxxxx
   > Before posting, please take a moment to review the archives at
   > [5] https://archive.midrange.com/midrange-l.
   >
   > Please contact support@xxxxxxxxxxxxxxxxxxxx for any subscription related
   > questions.
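   For the pdftotext route suggested in the quoted message above, a minimal
   sketch could look like this. pdf_to_text_poppler and DRY_RUN are
   hypothetical names; -layout asks pdftotext to keep the physical page
   layout and -enc UTF-8 sets the output encoding, and whether those suit
   your documents is something to try out:

```shell
# pdf_to_text_poppler INPUT.pdf OUTPUT.txt
# Extract text with pdftotext from poppler-utils.
# Set DRY_RUN=1 to print the command instead of running it.
pdf_to_text_poppler() {
    if [ "$#" -ne 2 ]; then
        echo "usage: pdf_to_text_poppler input.pdf output.txt" >&2
        return 2
    fi
    # Rebuild the argument list as the full pdftotext command line.
    set -- pdftotext -layout -enc UTF-8 "$1" "$2"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}
```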
References
   Visible links
   1. https://www.oneteamit.com.au/
   2. https://pdfbox.apache.org/
   3. https://en.wikipedia.org/wiki/Poppler_(software)
   4. https://lists.midrange.com/mailman/listinfo/midrange-l
   5. https://archive.midrange.com/midrange-l
   6. https://lists.midrange.com/mailman/listinfo/midrange-l
   7. https://archive.midrange.com/midrange-l
   8. https://www.mailguard.com.au/