Because a "Unicode string literal" is defined as having a UTF-16 CCSID
(1200), UX'00C2' is the UTF-16 encoding of U+00C2, and that code point is
encoded as X'C382' in UTF-8.
0xC2 is not a "regular ASCII character"; ASCII only covers 0x00-0x7F.
Any code point above that takes two or more bytes in UTF-8.
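
To see the same conversion outside of DB2, here is a minimal Python sketch. It is only an illustration of the byte arithmetic, not what the database does internally; the variable names are made up for the example, and Python's "utf-16-be" and "utf-8" codecs stand in for CCSID 1200 and CCSID 1208.

```python
# UX'00C2' is a UTF-16 literal: the two bytes 00 C2 are the code point U+00C2.
utf16_bytes = bytes.fromhex("00C2")
char = utf16_bytes.decode("utf-16-be")

# Converting that one character to UTF-8 yields two bytes, C3 82.
utf8_bytes = char.encode("utf-8")
print(utf8_bytes.hex().upper())  # C382

# ASCII stops at 0x7F; the first code point past it already needs two bytes.
ascii_len = len("\u007f".encode("utf-8"))
beyond_len = len("\u0080".encode("utf-8"))
print(ascii_len, beyond_len)  # 1 2

# The literal from the earlier insert, Ux'003100320033', is UTF-16 for "123".
digits = bytes.fromhex("003100320033").decode("utf-16-be")
print(digits)  # 123
```

So the hex digits in a UX literal name UTF-16 code units, and the bytes that land in the CCSID 1208 column are whatever UTF-8 needs for those code points.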
----- Original message -----
From: Steve Richter <stephenrichter@xxxxxxxxx>
Sent by: "MIDRANGE-L" <midrange-l-bounces@xxxxxxxxxxxxxxxxxx>
To: Midrange Systems Technical Discussion
<midrange-l@xxxxxxxxxxxxxxxxxx>
Cc:
Subject: Re: how to insert hex data into UTF-8 column?
Date: Sun, Mar 10, 2019 5:50 PM
very cool. did not see that part.
what I do not follow now is why does UX'00C2' get stored in UTF-8 as
x'C382' ? I guess because of the variable number of bytes nature of
UTF-8. All the regular ASCII characters are represented with a single
byte in UTF-8.
been having to spend a lot of time tracking down problems when my web
apps push bad data into the DB2 database. Somehow an invalid UTF-8
character is being stored in the CCSID 1208 column. Once that happens,
the PHP json_encode function returns an empty string. What I am planning
to do is write an SQL function that scans the UTF-8 column for bad
characters and returns the byte location of that character. Then use
SUBSTRING with the OCTETS option to remove it.
[1] https://www.ibm.com/support/knowledgecenter/en/SSEPEK_11.0.0/sqlref/src/tpc/db2z_bif_substring.html
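
The scan described above can be sketched in Python to show the logic. This is not the SQL function itself, and the names first_invalid_utf8_offset and strip_invalid_utf8 are hypothetical; the sketch leans on Python's built-in UTF-8 decoder to report where decoding first fails.

```python
def first_invalid_utf8_offset(data: bytes) -> int:
    """Return the 1-based byte position of the first invalid UTF-8
    sequence in data, or 0 if the whole value decodes cleanly."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the 0-based index of the first offending byte.
        return err.start + 1
    return 0


def strip_invalid_utf8(data: bytes) -> bytes:
    """Drop every byte the decoder cannot interpret as UTF-8."""
    return data.decode("utf-8", errors="ignore").encode("utf-8")


good = "caf\u00e9".encode("utf-8")   # valid: the last character is C3 A9
bad = b"ab\xffcd"                     # 0xFF can never appear in UTF-8

print(first_invalid_utf8_offset(good))  # 0
print(first_invalid_utf8_offset(bad))   # 3
print(strip_invalid_utf8(bad))          # b'abcd'
```

The position is returned 1-based on the assumption that it would be handed to a byte-oriented SUBSTRING call, which counts from 1. An SQL version would have to walk the bytes and check lead and continuation bytes itself.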
On Sun, Mar 10, 2019 at 6:10 PM Jon Paris <jon.paris@xxxxxxxxxxxxxx>
wrote:
> I'm not sure what solution you drew from that thread Steve - but it seems
> to me that the only thing needed is to add U to the string to indicate that
> it is a Unicode hex string and not a character hex string. So this:
>
> insert into repcat52 ( catnum, catname, desc, desc_utf)
> values( 1, 'WAVE', x'818283F1F2F3', Ux'003100320033');
>
> Works just fine.
>
>
> Jon Paris
>
> www.partner400.com
> www.SystemiDeveloper.com
>
> > On Mar 10, 2019, at 6:02 PM, Steve Richter <stephenrichter@xxxxxxxxx>
> > wrote:
> >
> > this post has a lot of good info.
> >
> > [2] https://www.ibm.com/developerworks/community/forums/html/topic?id=4fddf9b9-b8d3-4ba3-9824-476d6a2efa48
> >
> > do not follow why DB2 would want to translate hex data. The point of
> > hex is to bypass any character translation.
> >
> >
> > On Sun, Mar 10, 2019 at 1:29 PM Steve Richter <stephenrichter@xxxxxxxxx>
> > wrote:
> >
> >> On Sun, Mar 10, 2019 at 12:43 PM Jon Paris <jon.paris@xxxxxxxxxxxxxx>
> >> wrote:
> >>>
> >>> I'm guessing that by default it is expecting the hex values to be
> >>> EBCDIC. This works:
> >>>
> >>> insert into repcat52 ( catnum, catname, desc, desc_utf)
> >>> values( 1, 'WAVE', x'818283F1F2F3',
> >>> x'F1F2F3');
> >>>
> >>
> >> I am looking to insert some multibyte UTF-8 characters, like x'C383'.
> >>
> >> [3] https://www.fileformat.info/info/charset/UTF-8/encode.htm
> >>
> > --
> > This is the Midrange Systems Technical Discussion (MIDRANGE-L) mailing list
> > To post a message email: MIDRANGE-L@xxxxxxxxxxxxxxxxxx
> > To subscribe, unsubscribe, or change list options,
> > visit: [4] https://lists.midrange.com/mailman/listinfo/midrange-l
> > or email: MIDRANGE-L-request@xxxxxxxxxxxxxxxxxx
> > Before posting, please take a moment to review the archives
> > at [5] https://archive.midrange.com/midrange-l.
> >
> > Please contact support@xxxxxxxxxxxx for any subscription related questions.
> >
> > Help support midrange.com by shopping at amazon.com with our affiliate
> > link: [6] https://amazon.midrange.com
References
Visible links
1. https://www.ibm.com/support/knowledgecenter/en/SSEPEK_11.0.0/sqlref/src/tpc/db2z_bif_substring.html
2. https://www.ibm.com/developerworks/community/forums/html/topic?id=4fddf9b9-b8d3-4ba3-9824-476d6a2efa48
3. https://www.fileformat.info/info/charset/UTF-8/encode.htm
4. https://lists.midrange.com/mailman/listinfo/midrange-l
5. https://archive.midrange.com/midrange-l
6. https://amazon.midrange.com/
7. https://lists.midrange.com/mailman/listinfo/midrange-l
8. https://archive.midrange.com/midrange-l
9. https://amazon.midrange.com/
10. https://lists.midrange.com/mailman/listinfo/midrange-l
11. https://archive.midrange.com/midrange-l
12. https://amazon.midrange.com/