On 20-Oct-2015 12:59 -0500, Vinay Gavankar wrote:
I have a requirement where I need to check for number times a value
occurs in a field in the file.
The file has a field K1 (non-unique) and another field F1
For the records with same value of K1, F1 can have different values.
I need to consider only the value of last record (highest RRN). If my
file has 3 records for K1=A (in RRN sequence)
A XXX
A YYY
A ZZZ
then I want to consider only ZZZ for processing.
drop table kf
;
create table kf (k1 char, f1 char(3) not null default '')
;
insert into kf values
('A', 'XXX'), ('A', 'YYY'), ('A', 'ZZZ')
;
The following query returns the above "only ZZZ" result set for the
'A' values shown above, and the result is of course 'ZZZ':
select P.f1
from kf as P
where P.k1='A'
and RRN(P) = ( select MAX(RRN(S))
from kf as S
where S.k1 = P.k1
)
I need to get a count of unique, non-blank values of F1 in the file
(considering only the last record for K1).
To give an example, if my full file looks like this:
Asterisks delimiting row-data indicates that row has a value of F1
for which the maximum RRN() is found within the K1 grouping.
A XXX
A YYY
*A ZZZ*
*B XXX*
C MMM
*C ZZZ*
*D XXX*
*E UUU*
*F ZZZ*
G (blanks)
H (blanks)
then my counts should be:
ZZZ 3
XXX 2
UUU 1
drop table kf
;
create table kf (k1 char, f1 char(3) not null default '')
;
insert into kf values
('A', 'XXX'), ('A', 'YYY'), ('A', 'ZZZ')
, ('B', 'XXX'), ('C', 'MMM'), ('C', 'ZZZ')
, ('D', 'XXX'), ('E', 'UUU'), ('F', 'ZZZ')
, ('G', ' '), ('H', ' ')
;
The above described "counts" result-set can be obtained with the
following query:
select P.f1, count(*) as tot
from kf as P
where RRN(P) = ( select MAX(RRN(S))
from kf as S
where S.k1 = P.k1 )
and P.f1<>''
group by P.f1
If the highest count is greater than a certain number (threshold), I
have to take a particular action.
create global variable threshold bigint default 2
;
The following query returns a result-set identifying 'ZZZ' as having
three occurrences because that is the only value for which the count
exceeded the threshold, i.e. 2, as declared above.
select P.f1, count(*) as tot
from kf as P
where RRN(P) = ( select MAX(RRN(S))
from kf as S
where S.k1 = P.k1 )
and P.f1<>''
group by P.f1
having count(*) > threshold
fetch first row only
So basically, I just need the highest count. The value of F1 for
which highest count occurred is not important.
The following obtains as a result-set just the count, but ignores the
requirement for a threshold; the threshold of course, can be compared
against the fetched result.
select count(*) as tot
from kf as P
where RRN(P) = ( select MAX(RRN(S))
from kf as S
where S.k1 = P.k1 )
and P.f1<>''
group by P.f1
order by tot desc
fetch first row only
Or if the actual count is only relevant given the threshold is
exceeded, then the following could be used to return an empty result-set
when the threshold is not exceeded.
select A.tot
from (
select count(*) as tot
from kf as P
where RRN(P) = ( select MAX(RRN(S))
from kf as S
where S.k1 = P.k1 )
and P.f1<>''
group by P.f1
order by tot desc
fetch first row only
) as A
where A.tot > threshold
Is this possible to do with sql (embedded in RPGLE).
Sure. See above [tested on v5r3] for what might function as desired.
Or what would be the most efficient (I/O and CPU wise) way of doing
this?
The following keyed Access Path from the derived INDEX might be
useful if implementing via SQL similar to what I offered above; that is
not available until v6r1:
create index kfx on kf (k1, rrn(kf) desc)
The file actually has 10 fields (F1 thru f10) and the above needs to
be done for all the 10 fields (could be a different threshold value
for all fields).
Not too sure what exactly that means without example data nor DDL to
clarify. So guessing what is meant, perhaps the following; but as shown
extending to only one additional field, yet what can be done for one
additional field, can be repeated for each additional after that one:
drop table kf
;
create table kf
( k1 char
, f1 char(3) not null default ''
, f2 char(3) not null default ''
)
;
insert into kf values
('A', 'XXX', 'LLL')
, ('A', 'YYY', 'KKK')
, ('A', 'ZZZ', 'JJJ')
, ('B', 'XXX', 'KKK')
, ('C', 'MMM', 'PPP')
, ('C', 'ZZZ', 'NNN')
, ('D', 'XXX', ' ')
, ('E', 'UUU', 'JJJ')
, ('F', 'ZZZ', 'NNN')
, ('G', ' ', 'KKK')
, ('H', ' ', 'KKK')
;
Similarly described again, as the previous setup:
• Asterisks delimited row-data indicates that row has a value of F1
for which the maximum RRN() is found within the K1 grouping.
• Squiggly bracket delimited row-data indicates that row has a value
of F2 for which the maximum RRN() is found within the K1 grouping.
A XXX LLL
A YYY KKK
{*A ZZZ* JJJ}
{*B XXX* KKK}
C MMM PPP
{*C ZZZ* NNN}
*D XXX* blank
{*E UUU* JJJ}
{*F ZZZ* NNN}
{ G blank KKK}
{ H blank KKK}
then the counts should be for each of the columns:
F1 F2
--- ---
ZZZ 3 JJJ 2
XXX 2 NNN 2
UUU 1 KKK 3
So to include only the maximum count from each, that would be:
ZZZ 3 KKK 3
And to obtain only the count because the F# value is not relevant,
the following is the result set:
3 3
The following is a way to obtain the above result set [using
something valid on v5r3], but as row data instead of columns. There may
be a way to use the VALUES row clause and scalar fullselect instead of
what is shown here using effectively the same Nested Table Expressions
(NTE) [aka derived tables] but with an extra column as literal value
expression to designate whence the count value was derived, and the row
data collected in a UNION ALL. As noted, extending the query just
requires repeating the same patter for each additional column.
select F1.* from
( select count(*) as maxcnt
, 'F1' as c
from kf as P
where RRN(P) = ( select MAX(RRN(S)) from kf as S
where S.k1 = P.k1 )
and P.f1<>''
group by P.f1
order by maxcnt desc
fetch first row only ) as F1
union all
select F2.* from
( select count(*) as maxcnt
, 'F2' as c
from kf as P
where RRN(P) = ( select MAX(RRN(S)) from kf as S
where S.k1 = P.k1 )
and P.f2<>''
group by P.f2
order by maxcnt desc
fetch first row only ) as F2
/* additional union all queries precede the order by */
order by c /* or as desired */
Any help will be greatly appreciated.
There may be simpler and\or more succinct ways, but seems the above
might be functional for the given scenario, if I inferred correctly what
is the extended variation... and if row data is acceptable vs columns --
else revisions would be required, but probably none I could test,
limited to v5r3 capabilities.
As an Amazon Associate we earn from qualifying purchases.