I did, Scott - thx for clarifying, I was working totally from distant memory. We definitely had issues with emojis - I would have to go back and check which characters caused errors, and when, while using XML-SAX, IIRC.

Cheers
Vern

On 2/25/2021 5:59 PM, Scott Klement wrote:
Vern,

I suspect you meant "UCS-2" rather than "UTC2", correct?

Your experience doesn't match mine.  I had no trouble with UCS-2 working with non-EBCDIC characters, going much further back than 7.1.  Though, perhaps it was the specific character you were dealing with.  Hard to say...

-SK


On 2/24/2021 8:41 PM, Vern Hamberg wrote:
Hi Scott

This poster said they are at 7.1 - so were we in 2017 when we were having trouble with this. I don't think UTF-8 was available in RPG in that OS, so we were maybe stuck with UTC2 - and XML-INTO got errors when it ran into things that it couldn't convert. Some stuff, like an ellipsis, now worked - but emojis like a mouse face - no way - BOOM!

I should say what I forgot - we were actually using XML-SAX, because XML-INTO was hopeless - and the position reported for the error that corresponds to one of the status codes - 351 or 2 or ... - wasn't by character, it was by byte - so multiple misunderstood "characters" just got more and more out of alignment.
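To illustrate the byte-versus-character drift described above, here is a minimal sketch in Python (not RPG, and not what we ran at the time). It assumes the document is UTF-8 encoded, which may not match the original setup; the function name `byte_to_char_offset` is hypothetical. It shows how a byte position reported by a parser can be translated back into a character position.

```python
def byte_to_char_offset(data: bytes, byte_offset: int, encoding: str = "utf-8") -> int:
    """Return how many characters precede the given byte offset.

    Assumption: `data` is encoded in `encoding` (UTF-8 here). If the offset
    falls in the middle of a multi-byte character, the incomplete tail is
    ignored, so the result counts only whole characters before the offset.
    """
    prefix = data[:byte_offset]
    return len(prefix.decode(encoding, errors="ignore"))


# An ellipsis takes 3 bytes in UTF-8 and a mouse-face emoji takes 4,
# so byte positions run ahead of character positions after each of them.
text = "a\u2026b\U0001F42Dc"
data = text.encode("utf-8")
byte_pos = data.index(b"c")
char_pos = byte_to_char_offset(data, byte_pos)
print(byte_pos, char_pos)
```

Each multi-byte character widens the gap between the two positions, which is why an error position counted in bytes points further and further past the character that actually caused the problem.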

Now maybe this is all better in later releases - we ended up using the XMLTABLE SQL function on the content of an XML file that was pulled in with one of the other functions.

SQL gracefully handled unconvertible characters, so that became our go-to. For the most part, although field associates could enter ellipses or mouse faces, mostly the loss of those was unimportant - of course, a pest elimination service might like to use a mouse face, right?  :)

Hope this is making sense - things HAVE changed since 7.1, and for the better.

Cheers
Vern

On 2/24/2021 6:16 PM, Scott Klement wrote:
Consider reading the data into Unicode fields instead of EBCDIC fields. This will allow the existing characters (assuming they really are characters and are properly encoded -- which is probable) to be read properly into RPG fields.

Once in RPG, you can convert the ones you don't want to accept into something else.  For example, replace curly quotes with straight quotes, etc.  Then when you do convert them to EBCDIC, they'll be in the best format they can be in.

I don't think it's a good idea to use something like 'tr' or SQL to change the characters while the document is still in raw XML format because you run the risk of messing up the document format or character escaping.  Wait until the document has been interpreted.  The longer you can keep the document in pure Unicode format before having to convert it to the much more limited character set of EBCDIC, the better.
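The order of operations described above (parse first, clean up second, convert to EBCDIC last) can be sketched as follows. This is an illustration in Python rather than RPG, and everything in it is an assumption for demonstration: the replacement table, the helper names `to_ebcdic_friendly` and `element_text_as_ebcdic`, the sample element names, and the choice of code page 037.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: Unicode characters with no code page 037 equivalent,
# mapped to the closest plain substitutes.
REPLACEMENTS = {
    "\u2018": "'",    # left single curly quote
    "\u2019": "'",    # right single curly quote
    "\u201c": '"',    # left double curly quote
    "\u201d": '"',    # right double curly quote
    "\u2026": "...",  # horizontal ellipsis
}
TABLE = str.maketrans(REPLACEMENTS)


def to_ebcdic_friendly(text: str) -> str:
    """Swap known problem characters while the text is still Unicode."""
    return text.translate(TABLE)


def element_text_as_ebcdic(xml_bytes: bytes, tag: str, codepage: str = "cp037") -> bytes:
    """Parse the XML, clean one child element's text, then convert it.

    The document is interpreted by the parser before any character is
    touched, so markup and escaping are never altered. Anything still
    unconvertible after cleaning becomes a substitute character.
    """
    root = ET.fromstring(xml_bytes)
    text = root.findtext(tag, default="")
    return to_ebcdic_friendly(text).encode(codepage, errors="replace")


sample = "<order><note>\u201cRush\u201d order\u2026</note></order>".encode("utf-8")
ebcdic = element_text_as_ebcdic(sample, "note")
print(ebcdic.decode("cp037"))
```

Because the substitutions happen on the parsed element text, the curly quotes and the ellipsis survive as readable straight quotes and dots instead of being lost in the conversion.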

On 2/24/2021 2:09 PM, Carel wrote:
Running on V7.1

We receive from a SAP environment xml documents which are stored as a CLOB in a PF on our system.

To process those xml documents we extract them from that file, store them on the IFS and then read each IFS file in with XML-INTO.

Some of those xml documents crash on the XML-INTO opcode. It appears those xml documents contain non-printable characters (less than x'40').

So, we have to remove those non-printable characters.

I looked at the QSH tr command, but could not get it to work.

Another option is using SQL when retrieving the CLOB from the PF before writing to the IFS.
I tried the following syntax:

exec sql
select REPLACE(CLOB_in_PF, x'0102 ... 3E3F', '') into :SQLCLOB_field from Our_PF ;  (thus all hex values with the exception of NULL).

This does not work.

Question:
How can we remove those non-printable characters from an xml document stored in our PF?

TIA

Kind regards,
Carel Teijgeler






