Thanks Daniel
   Cheers
   Don
    
   Don Brown
   Senior Consultant
    
   OneTeam IT Pty Ltd (https://www.oneteamit.com.au/)
   P: 1300 088 400
   -----Original Message-----
   From: MIDRANGE-L <midrange-l-bounces@xxxxxxxxxxxxxxxxxx> On Behalf Of
   Daniel Gross
   Sent: Monday, 23 June 2025 2:51 PM
   To: midrange-l@xxxxxxxxxxxxxxxxxx
   Subject: Re: Convert PDF to text
   Hi Don,
   no - both extract the same text parts, as both use the same data
   structures inside the PDF.
   The only difference is that when you work with the PDFbox procedures
   directly from RPG (using *JAVA objects and prototypes), you have complete
   control over the whole process.
   If anything fails, you get a Java exception that you can handle and
   recover from. This is much easier than checking a sub-process for success
   or failure.
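
   For comparison, a minimal Java sketch of the sub-process variant could
   look like the following. It assumes that the PDFbox command-line app
   reports failure through a non-zero exit code. The class PdfTextCli and
   its methods are made-up names for illustration; only the export:text,
   -sort and -i= arguments are taken from this thread.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFbox: runs the PDFbox command-line app
// as a sub-process and learns about failure only through the exit code.
class PdfTextCli {

    // Builds the argument list for "export:text". The jar name and the
    // -sort / -i= arguments are the ones quoted in this thread.
    static List<String> buildCommand(String jar, String inFile, boolean sort) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-jar");
        cmd.add(jar);
        cmd.add("export:text");
        if (sort) {
            cmd.add("-sort");
        }
        cmd.add("-i=" + inFile);
        return cmd;
    }

    // Starts the sub-process, waits for it to end, and returns its exit code.
    static int run(List<String> cmd) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd)
                .inheritIO()   // pass the tool's messages through to our stdout/stderr
                .start();
        return p.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: PdfTextCli <pdfbox-app jar> <input pdf>");
            return;
        }
        int rc = run(buildCommand(args[0], args[1], true));
        if (rc != 0) {
            // Assumption: non-zero means failure; the reason is only in the tool's output.
            System.err.println("PDFbox sub-process failed, exit code " + rc);
        }
    }
}
```

   With the in-process variant, the same failure would arrive as a Java
   exception that the caller can catch, instead of an exit code that has to
   be interpreted.
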
   And you have complete control over the JVM start, which means that you
   can decide when the JVM is loaded. In my case, it's a background job that
   waits for work and loads the JVM right at the start, so there is no
   delay when it does its work.
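
   The "background job waiting for work" idea can be sketched in plain Java
   roughly as follows. This is only an illustration under assumptions:
   PdfWorker, its STOP sentinel and the extractor function are hypothetical
   names, the extractor stands in for the real PDF-to-text call, and how the
   actual job receives its work is not described in this thread.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

// Hypothetical long-running worker: the JVM is started once and then waits
// for requests, so later requests do not pay the JVM start-up cost again.
class PdfWorker implements Runnable {
    static final String STOP = "*STOP";   // made-up sentinel that ends the loop

    private final BlockingQueue<String> requests = new LinkedBlockingQueue<>();
    private final BlockingQueue<String> results = new LinkedBlockingQueue<>();
    private final Function<String, String> extractor;

    // The extractor is a stand-in for the real text extraction.
    PdfWorker(Function<String, String> extractor) {
        this.extractor = extractor;
    }

    void submit(String pdfPath) {
        requests.add(pdfPath);   // unbounded queue, so add() does not block
    }

    String nextResult() {
        try {
            return results.take();   // blocks until the worker has produced a result
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    @Override
    public void run() {
        try {
            while (true) {
                String path = requests.take();   // wait for work
                if (STOP.equals(path)) {
                    return;
                }
                try {
                    results.add(extractor.apply(path));
                } catch (RuntimeException e) {
                    // A failure is an ordinary exception: report it and keep running.
                    results.add("ERROR " + path + ": " + e.getMessage());
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        PdfWorker worker = new PdfWorker(path -> "text of " + path);
        new Thread(worker).start();
        worker.submit("a.pdf");
        worker.submit("b.pdf");
        System.out.println(worker.nextResult());
        System.out.println(worker.nextResult());
        worker.submit(STOP);
    }
}
```

   The worker handles requests one at a time in the order they arrive, and
   a failing request does not end the job.
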
   When using PDFbox as a command line utility, there is practically no
   difference.
   I think the blog post will be out in July. I will announce it on
   LinkedIn, and hopefully I'll remember to send you a mail when it's out.
   Regards,
   Daniel
   > On 23.06.2025 at 01:11, Don Brown via MIDRANGE-L
   > <midrange-l@xxxxxxxxxxxxxxxxxx> wrote:
   >
   >  Hi Daniel,
   >
   > While I have this working, I would like to see your blog article.
   >
   > With PDFbox I did have to include the -sort switch to get the text
   > back in a more meaningful order.
   >
   > Have you noticed any difference between Ghostscript and PDFbox (or
   > any other tools) in the accuracy/completeness of the text returned?
   >
   > Thanks
   > Don
   >
   >
   >
   > Don Brown
   >
   > Senior Consultant
   >
   > OneTeam IT Pty Ltd
   > P: 1300 088 400
   >
   > -----Original Message-----
   > From: MIDRANGE-L <midrange-l-bounces@xxxxxxxxxxxxxxxxxx> On Behalf Of
   > Daniel Gross
   > Sent: Monday, 16 June 2025 7:00 PM
   > To: midrange-l@xxxxxxxxxxxxxxxxxx
   > Subject: Re: Convert PDF to text
   >
   > Hi Don,
   >
   > it depends.
   >
   > From the "command line" you can use Ghostscript. The latest PASE
   > version from the IBM repository should be OK. With
   >
   > gs -sDEVICE=txtwrite -o output.txt input.pdf
   >
   > you should get an output, but you may have to experiment with the
   > encoding, as this is not fixed in PDF documents.
   >
   > From RPG I would always use PDFbox (https://pdfbox.apache.org/).
   > With this, you have complete control over the PDF processing.
   >
   > But you can also use PDFbox from the command line using
   >
   > java -jar pdfbox-app-3.y.z.jar export:text [OPTIONS] -i=<infile>
   >
   > But make sure to use a reasonably new 64-bit JVM. I'm using Java 21
   > 64-bit, and it's quite fast; in fact, after the initial JVM loading,
   > Java gets near native performance.
   >
   > I had the task of splitting PDF files. With up to 5 or 6 pages,
   > Ghostscript (PASE) was faster, but with 10 or more pages, PDFbox
   > (Java 21 64-bit) was always faster. And it got even better if more
   > than one file had to be split in the same job/session: PDFbox was
   > always faster, as the JVM stayed in memory and even the JAR file was
   > kept loaded.
   >
   > So as I said, it really depends on what exactly you want to do, and
   > how. For example, if this text should go into a database table, I
   > would recommend going the RPG/Java/PDFbox way.
   >
   > I'm in the process of writing a bit about RPG, Java and PDFbox on my
   > blog in the next weeks. If you like, I can give you a sneak peek of it.
   > It's a bit overwhelming at the beginning with JNI, JVM initialization
   > and RPG to Java prototypes, but once you've got it, pack everything you
   > need into a service program, and be happy.
   >
   > HTH and kind regards,
   > Daniel
   >
   >> On 16.06.2025 at 10:37, Patrik Schindler <poc@xxxxxxxxxx> wrote:
   >> Hello Don,
   >>> On 16.06.2025 at 09:12, Don Brown via MIDRANGE-L
   >>> <midrange-l@xxxxxxxxxxxxxxxxxx> wrote:
   >>> 1. Does anyone have a recommended solution for converting a PDF to
   >>> text? I am after a PHP or native RPG-ish solution. Not Python please.
   >> I'd use the pdftotext command from the poppler-utils package in PASE.
   >> I assume the poppler-utils package is available for installation via
   >> yum.
   >> https://en.wikipedia.org/wiki/Poppler_(software)
   >> :wq! PoC
   >> --
   >> This is the Midrange Systems Technical Discussion (MIDRANGE-L)
   >> mailing list
   >> To post a message email: MIDRANGE-L@xxxxxxxxxxxxxxxxxx
   >> To subscribe, unsubscribe, or change list options,
   >> visit: https://lists.midrange.com/mailman/listinfo/midrange-l
   >> or email: MIDRANGE-L-request@xxxxxxxxxxxxxxxxxx
   >> Before posting, please take a moment to review the archives at
   >> https://archive.midrange.com/midrange-l.
   >> Please contact support@xxxxxxxxxxxxxxxxxxxx for any subscription
   >> related questions.
   > --
   > Message protected by MailGuard: e-mail anti-virus, anti-spam and
   > content filtering.
   > https://www.mailguard.com.au