Jim Essinger wrote:
I have a file that contains 1 field called LINE that is 186
characters long, and contains spool data. In about 900 cases,
(of 25,000+ lines) one particular line gets
split into 2 lines like:
2 1234.56-
0 798.00
I need to have them combined into one line like:
2 1234.56- 798.00
Rather than updating flat file data originating from a spooled
report, why not just get the original data from the same place [most
likely from a described database file] whence whatever reporting
tool had generated that spooled output? That is, why not just
perform the necessary SQL against the original data, rather than
running some SQL against a spooled report created from that data?
Or if absolutely required to parse the spooled [flat file] data
[e.g. such that the aim is] to correct spooled data to rid of
apparent line wrap, would it not be more appropriate to just change
the report generation so as not to wrap the lines in the first place?
The data is all numbers, edited. The 2nd line is always a
zero in position 4 of the line, and one RRN later in the
file. What I want to do is to update the first line with
both parts of the data, and then I will remove the "Zero"
record.
The position four was noted as the location of the zero, but the
LIKE test in the later quoted SELECT will find the rows with the
zero in position two. An error?
I have created an SQL statement that will correctly combine
the two pieces into one line. Now I need to use that
statement to update the correct records.
That is where my mind has failed me.
Given that the data is not relational, perhaps consider not even
using the SQL UPDATE. FWiW since ~1/28 of the rows will be updated
in a fairly small file, IMO it would seem much more appropriate to
just perform the changes with RLA\native I/O with arrival sequence
[or less desirable using SQL cursor(s) with the less efficient
retrieval & order by RRN() to effect the same] by reading every row
for update; i.e. delete the current row & make the change to the
previous row with the data obtained from what was the current row.
The query is going to do a full table scan anyhow, actually two, so
there is really just overhead to use a query versus RLA.
The select statement that puts together the 2 lines is:
select rrn(a)
, substr(a.LINE, 1, String_str - 1) concat
substr(c.LINE, string_str)
from MyFile a
join (
select
rrn(b) - 1 as Join_rrn
,locate('Z', substr(b.LINE, 1, 4) concat
translate(substr(b.line,5 , length(b.LINE)-1),
'Z','.0123456789-','Z')) as String_Str
,b.LINE
from MyFile b
where line like ' 0%'
) c
on rrn(a) = join_rrn
The subselect translates all editing and numbers to a "Z" and
then locates the first occurrence of "Z" to find the starting of
the string on the second line. The file is joined to itself based
on RRN and the line containing a zero in position 4.
1st - Is there an easier way I could code this statement?
Locating a 'Z' in the first four bytes may be problematic if
there is a 'Z' in the third or fourth byte.? Perhaps the LOCATE
should have a /start/ position of five; but that is only valid if
always non-zero for the result, else a CASE could determine when to
add four.
Probably the statement can be further improved, but the sample
set of data is too small; i.e. unclear what special cases might
exist that would break a simpler query. The given query strips any
data between the zero and the first numeric digit, period, or minus
symbol. Given the scenario is a spooled report, maybe stripping all
of that data is unnecessary, such that the complex expression could
be replaced with something much simpler like one substring. But if
there is a 'Z', period, or a dash\minus somewhere in that text, then
that will locate a 'Z' somewhere other than the desired numeric
value position. If the data between the zero and the numeric value
is always blank, the statement could be greatly simplified.
I would expect that the more likely scenario is that all of the
data after the zero is always blanks until the desired numeric
value, and that the data is in fixed positions. If so, then just
selecting something like the substr(b.line, 5) to concatenate with
substr(a.line, 1, 22) would suffice? Perhaps the substring should
be from position three instead of from position five? Anyhow using
that VarChar result in the NTE reduces the amount of data for that
which is concatenated in the outer SELECT, plus removes any
requirement for position\locate [i.e. the string_str is no longer
required]. As truly valid string-form numeric values, [implicit]
cast to decimal makes alignment and calculations\summaries a
non-issue, if that may be desirable.
2nd - How can I use this statement in an UPDATE to change the
correct line using RRN as the key?
I would encapsulate the RRN() in a VIEW; probably two separate
VIEWs where the first is update capable for the LINE column and
which selects only the primary lines, with the second VIEW selecting
only the secondary lines and including the start position as a
column. After the UPDATE, issue the DELETE against the second VIEW
without any selection.
Consider the following [may be some editing errors] scenario:
<code>
create table daft (line char 186)
;
insert into daft values
/* ....+....1....+....2....+....3....+....4....+....5....
(' 2 1111.11- ')
,(' 0 1.00')
,(' 2 222222.22- 22.00')
,(' 2 1234.56- ')
,(' 0 798.00')
,(' 2 3.33- 333.00')
,(' 2 44.44- ')
,(' 0 4444.00')
,(' 2 555.55- ')
,(' 0 55555.00')
,(' 2 3.33- 5399.10')
,(' 0 5432.10 <-----------')
,(' 2 3.33- 1234.56')
,(' 0 6666.66 <-----------')
,(' 2 77777.77- ')
,(' 0 this text is dropped 77.77')
;
create view daftline1 ( rrn1 , line) as
(select rrn(a),line from daft a
where line like ' 2%')
;
create view daftline2 ( rrn2 , line, tokloc) as
(select rrn(a)-1, line
, locate('Z'
, substr(line, 1, 4) concat
translate(substr(line, 5, length(line)-4)
, 'Z', '.0123456789-', 'Z')
/* , 5 */
)
from daft a where line like ' 0%')
;
select substr(a.LINE, 1, tokloc -1)
concat substr(b.LINE,tokloc)
from daftline1 a
join daftline2 b
on rrn1=rrn2
;
update daftline1 u set u.line = (
select substr(a.line, 1, tokloc - 1) concat
substr(b.line, tokloc)
from daftline1 a
join daftline2 b
on a.rrn1=b.rrn2
and a.rrn1=u.rrn1 )
where u.rrn1 in (select rrn2 from daftline2)
;
delete from daftline2 /* has: where line like ' 0%' */
;
select * from daft
; /* output: */
LINE
2 1111.11- 1.00
2 222222.22- 22.00
2 1234.56- 798.00
2 3.33- 333.00
2 44.44- 4444.00
2 555.55- 55555.00
2 3.33- 5432.10 <-----------
2 3.33- 6666.66 <-----------
2 77777.77- 77.77
******** End of data ********
/* The alignment issue is side effect of given sample */
/* Unclear what five bytes may be ignored\truncated */
</code>
The above UPDATE statement should be able to be modified to
reference the TABLE directly, instead of referencing the VIEWs if
one were so inclined; e.g. using a NTE for the joined files. I can
not verify the variation I used on a release where the modified
statement might work and if not to be supported; i.e. correlation or
-811 errors on v5r3.
Regards, Chuck
As an Amazon Associate we earn from qualifying purchases.