On 31-Aug-2016 13:48 -0600, John Yeung wrote:
On Wed, 31-Aug-2016 at 14:12, (WalzCraft) Jerry Forss wrote:
On 31-Aug-2016 12:54 -0600, MichaelQuigley wrote:
On Wed 31 Aug 2016 10:41:44 -0400 John Yeung wrote:
Why can't he use the "big character string" approach with
POSSTR or LOCATE as Glenn suggested?
(And please note that I'm not asking why it's not the *best*
way; I'm merely asking why it's not *a* way.)
It could be "a" way, but you would have to be careful how your
"big character string" is formatted. It might not work because
POSSTR and LOCATE will find the position of any occurrence of a
string, not necessarily the key value for which you're looking.
e.g. If you're searching the following string of three character
values
'ABCDEFGHI' i.e., values of 'ABC','DEF', and 'GHI'
If you test for 'CDE', POSSTR and/or LOCATE will say it's there
and it is--in position 3. But it isn't really one of the values
you intended. Now if your "big character string" is delimited
(using any delimiter -- it doesn't really matter what) it would
work. e.g. 'ABC,DEF|GHI'.
This would be an issue for what I am doing.
Some of the values will be MLMD, MLDM and MLD.
So scanning for a value of MLD may give a false positive.
A set of sufficiently dissimilar fixed-width elements could still be
processed without problems; for which an exhaustive test could be
performed against a given set. With the very simple example shown, the
issue of a false-positive is easily overcome by coding the selection to
the actual size; i.e. as 4-byte elements, a 3-byte value would be
searched as the blank-padded value. So if in the above example the
elements are four-characters each, then when the element 'MLD␢' ["␢"
represents the blank-pad] is desired, then actually code the scalar to
locate that value *with the blank included\mandatory*; e.g. as coded
here, using the CHAR casting scalar, to avoid any issues with visibility
of a blank:
stringAsArrayOf4byteElems='MLMDMLDMMLD MORE'
; -- align under 2nd arg/* ...4...8..12..16 */
exec sql set intVariable =
locate( CHAR('MLD', 4), :stringAsArrayOf4byteElems)
; -- result: intVariable=9 /* The desired effect */
exec sql set intVariable =
locate( 'MLD' ), :stringAsArrayOf4byteElems)
; -- result: intVariable=5 /* Undesirable effect */
It wouldn't, if you took the appropriate precautions for the format
of the data.
The example you started with was minimal, so I couldn't tell if it
was space-delimited or fixed-width.
If fixed-width, then you would want to divide the result of your
scan by the element size, and you only have a match if the remainder
is 1.
While that logic could be programmed, that is not something that can
be done so directly\easily within the query itself; i.e. neither LOCATE
nor POSSTR have a related LOCATE_ON_BOUNDARY or POSSTR_ON_BOUDARY scalar
or an additional argument to specify a restriction on what boundary a
searched token must be aligned. Not even the newer LOCATE_IN_STRING
[aka InStr] offers that capability; instead, expecting\supporting the
user knowing which occurrence is sought, but that would allow coding
multiple requests within a CASE expression to accommodate elimination of
that number of potential false-positives.
What I am suggesting, is, that the following predicate would
incorrectly conclude that the searched value is _missing_ due to the
searched value being located off-bounds\alignment at position-4, when in
fact, the 4-byte value *is present* at the properly aligned position-9;
lacking an ability to code [loop] logic in the predicate or scalar, to
keep looking for the correct occurrence, although such work could be
offloaded to a User Defined Function (UDF) [table or scalar]:
WHERE MOD(LOCATE('MLDX', 'MLDMLDXDMLDX'), 4) = 1
-- align under 2nd arg /* ...4...8..12 */
If delimited (by anything, including spaces), then you would first of
all want to ensure you have a delimiter at the beginning of every
field (so one extra one at the beginning) or at the end (so one extra
one at the end). Then include the delimiter in your scan.
So, scanning is still doable, but how convenient it is depends on how
the data in the big string is formatted.
Easily the better choice, for enabling the use of the LOCATE or
POSSTR [etc.], and also for use with a LIKE predicate. But if the data
is not already stored delimited, and is already stored as contiguous
array-like elements, then the ease with which that data can be
generated\rewritten into a true-temporary table makes that an easy way
to locate, whilst allowing for the IN predicate from the original query
to remain, but with that temporary table referenced in a subquery [see
my next reply to this topic thread].