RE: Character set problem with data being sent through SQL(?) -- MIDRANGE-L

Without knowing exactly how the data is transferred and received, it is like looking into a crystal ball ... we can only guess.

It depends on how the data is sent (with which CCSID) and how it is interpreted. If the job or connection CCSID is 65535, nothing is converted. Otherwise, the data is converted into the job CCSID.
If the incoming data is tagged with the UTF-8 CCSID, it is converted correctly ... but there may be some characters that do not exist in the job's CCSID, and these are converted into substitution characters.
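
A rough way to picture that last point, as a sketch only: the Python snippet below assumes the job CCSID is 37 (approximated by Python's "cp037" codec) and assumes x'3F' as the EBCDIC substitution byte. The helper name to_job_ccsid is made up for this illustration; it is not what the system itself runs.

```python
# Sketch: convert correctly tagged UTF-8 data into an EBCDIC job CCSID.
# Assumptions: job CCSID 37 (Python codec "cp037"), substitution byte x'3F'.
EBCDIC_SUB = b"\x3f"

def to_job_ccsid(utf8_bytes: bytes, codec: str = "cp037") -> bytes:
    """Convert UTF-8 bytes to the target codec, one character at a time."""
    out = bytearray()
    for ch in utf8_bytes.decode("utf-8"):
        try:
            out += ch.encode(codec)
        except UnicodeEncodeError:
            # Character does not exist in the target code page.
            out += EBCDIC_SUB
    return bytes(out)

incoming = "caf\u00e9 \u20ac5".encode("utf-8")  # "café €5" as UTF-8 bytes
converted = to_job_ccsid(incoming)
print(converted.hex())
```

The e-acute converts cleanly because CCSID 37 contains it; the euro sign does not exist there, so it comes out as the substitution byte.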

... but I assume the incoming data has an ASCII (and not UTF-8) CCSID, so the conversion occurs from ASCII and not from UTF-8.

Best regards

Birgitta Hauser
Modernization – Education – Consulting on IBM i
Database and Software Architect
IBM Champion since 2020

-----Original Message-----
From: MIDRANGE-L <midrange-l-bounces@xxxxxxxxxxxxxxxxxx> On Behalf Of John Yeung
Sent: Friday, 23 May 2025 10:07
To: Midrange Systems Technical Discussion <midrange-l@xxxxxxxxxxxxxxxxxx>
Subject: Re: Character set problem with data being sent through SQL(?)

On Fri, May 23, 2025 at 11:56 AM James H. H. Lampert via MIDRANGE-L <midrange-l@xxxxxxxxxxxxxxxxxx> wrote:

> But it's acting as if their connection THINKS they're sending 8859
> Latin 1. With the result that "lowercase e with acute accent," which
> is hex "C3 A9" in UTF-8, ends up in the file as hex "23 62 B4," which
> is <highlight reverse>, "Capital A with tilde," "copyright symbol."

OK. First of all, x'C3A9' is indeed small-e-with-acute in UTF-8. So those are the two raw bytes they would see on their end, looking at their own data.

Those two bytes, when interpreted as Latin-1, are capital-A-with-tilde and copyright-sign. Just two characters. So presumably, if their connection thinks their source data is Latin-1, then it would have just sent over those two characters.

Now, I'm guessing you've got a typo, and the bytes you are seeing on your end should be x'2366B4'. (x'62' is capital-A-with-circumflex in CCSID 37; capital-A-with-tilde is x'66'.)
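
For anyone who wants to check the byte values, here is a small Python sketch of the suspected chain. It assumes the target is CCSID 37 (Python's "cp037" codec) and that the connection treats the sender's UTF-8 bytes as ISO 8859-1. The function name simulate_misread is only for illustration, and this does not explain the leading x'23'.

```python
# Sketch: UTF-8 bytes wrongly read as Latin-1, then converted to EBCDIC.
# Assumptions: source misread as ISO 8859-1, target CCSID 37 ("cp037").

def simulate_misread(text: str,
                     assumed_source: str = "latin-1",
                     target: str = "cp037") -> bytes:
    raw = text.encode("utf-8")            # bytes the sender actually holds
    misread = raw.decode(assumed_source)  # each byte taken as one Latin-1 character
    return misread.encode(target)         # converted into the EBCDIC code page

stored = simulate_misread("\u00e9")       # small e with acute
print(stored.hex())                       # 66b4

# With the source correctly tagged as UTF-8, the result is a single byte.
print("\u00e9".encode("cp037").hex())     # 51
```

The x'66B4' result matches the last two bytes of the corrected value, which fits the idea that the source data is being treated as Latin-1 somewhere on the sending side.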

I'm a bit lost on how the x'23' got there.

Can you get any more specifics on what they're doing on their end to set up the connection? Most if not all of the issue, given your description, seems to be on their end.

John Y.