I was wondering why I didn't see the REGEXP_* functions here, and the reason is that we don't have the International Components for Unicode (ICU) APIs loaded. So now it makes sense why x'41' matches 'A': x'41' is the Unicode code point for 'A', and REGEXP_LIKE operates on Unicode values. Using the REGEXP_* functions, which appear to treat Field1 as Unicode rather than EBCDIC, could have other unintended consequences, since some Unicode characters take two or more bytes to encode. Or maybe your CCSID is 65535. And yet that leaves me with even more questions, since the documentation says both the string to be searched and the pattern are converted to UTF-16, and FOR BIT DATA strings are not allowed. It seems that the REGEXP_* functions could be extremely confusing if the data is invalid, and are therefore not really a good way to go for correcting that data.
When you look at Field1, does it have an 'A' in there instead of a blank? Or is it really an x'41'? Maybe the conversion from your job CCSID to UTF-16 converts x'41' to x'0041'. Or maybe it assumes that a hex constant is already UTF-16 (even if it is only a single byte wide).
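To make the two possibilities concrete, here is a rough sketch in Python rather than SQL. It assumes the job CCSID is 37 and uses Python's cp037 codec as a stand-in for the CCSID-to-Unicode conversion; the variable and function names are made up for illustration, and none of this has been run against DB2 for i.

```python
import re

# Assumption: the job CCSID is 37 (US EBCDIC); Python's "cp037" codec stands in for it.
EBCDIC = "cp037"


def to_unicode(raw: bytes) -> str:
    """Decode EBCDIC bytes the way a CCSID 37 to Unicode conversion would map them."""
    return raw.decode(EBCDIC)


# 'A' is x'C1' in EBCDIC but U+0041 in Unicode.
letter_a = to_unicode(b"\xC1")
# x'40' is the EBCDIC blank; x'41' is the "required space", which maps to U+00A0.
blank = to_unicode(b"\x40")
bad_blank = to_unicode(b"\x41")

# A regex engine that works on Unicode reads \x41 as U+0041, so it finds 'A'.
matches_a = re.search(r"\x41", letter_a) is not None
# To find what used to be byte x'41', the pattern has to name U+00A0 instead.
matches_bad = re.search(r"\xA0", bad_blank) is not None
# The pattern \x41 does not see the converted bad blank at all.
misses_bad = re.search(r"\x41", bad_blank) is None
```

If the conversion on the system behaves the same way, the pattern to try in REGEXP_LIKE would name U+00A0 rather than x'41', but that is an assumption to check, not something confirmed here.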
Mark Murphy
STAR BASE Consulting, Inc.
mmurphy@xxxxxxxxxxxxxxx
-----mprice@xxxxxxxxx wrote: -----
To: Midrange Systems Technical Discussion <midrange-l@xxxxxxxxxxxx>
From: mprice@xxxxxxxxx
Date: 02/15/2016 11:23AM
Subject: Re: SQL and Regular expression
When we import data from an external source, we sometimes get a 'bad'
character.
In this particular case I was looking for X'41' ( EBCDIC ) that gets sent
for a ' ' (X'40').
In other words REGEXP_LIKE(Field1,'\x41') was matching 'A'.
Michael
From: John Yeung <gallium.arsenide@xxxxxxxxx>
Sent by: "MIDRANGE-L" <midrange-l-bounces@xxxxxxxxxxxx>
Date: 02/15/2016 10:30 AM
Please respond to: Midrange Systems Technical Discussion <midrange-l@xxxxxxxxxxxx>
To: Midrange Systems Technical Discussion <midrange-l@xxxxxxxxxxxx>
Subject: Re: SQL and Regular expression
On Mon, Feb 15, 2016 at 8:34 AM, <mprice@xxxxxxxxx> wrote:
I was running an SQL statement on a field in a physical file using
REGEXP_LIKE(Field1,'\xC1') expecting to find 'A' .
After this failed to return the desired results, I discovered that I have
to use the ASCII equivalent, i.e. REGEXP_LIKE(Field1,'\x41').
Expected? Or strange?
To me, SQL is behaving in a way that is expected (or at least not
particularly strange), but what you are trying to do is very strange.
Why are you searching for hex codes?
John Y.