Re: Sql update of a file based on the same file -- MIDRANGE-L

Jim Essinger wrote:

I have a file that contains 1 field called LINE that is 186 characters long, and contains spool data. In about 900 cases,
(of 25,000+ lines) one particular line gets
split into 2 lines like:

2 1234.56-
0 798.00

I need to have them combined into one line like:

2 1234.56- 798.00

Rather than updating flat file data originating from a spooled report, why not just get the original data from the same place [most likely from a described database file] whence whatever reporting tool had generated that spooled output? That is, why not just perform the necessary SQL against the original data, rather than running some SQL against a spooled report created from that data? Or if absolutely required to parse the spooled [flat file] data [e.g. such that the aim is] to correct spooled data to rid of apparent line wrap, would it not be more appropriate to just change the report generation so as not to wrap the lines in the first place?

The data is all numbers, edited. The 2nd line is always a
zero in position 4 of the line, and one RRN later in the
file. What I want to do is to update the first line with
both parts of the data, and then I will remove the "Zero"
record.

The position four was noted as the location of the zero, but the LIKE test in the later quoted SELECT will find the rows with the zero in position two. An error?

I have created an SQL statement that will correctly combine
the two pieces into one line. Now I need to use that
statement to update the correct records.
That is where my mind has failed me.

Given that the data is not relational, perhaps consider not even using the SQL UPDATE. FWiW since ~1/28 of the rows will be updated in a fairly small file, IMO it would seem much more appropriate to just perform the changes with RLA\native I/O with arrival sequence [or less desirable using SQL cursor(s) with the less efficient retrieval & order by RRN() to effect the same] by reading every row for update; i.e. delete the current row & make the change to the previous row with the data obtained from what was the current row. The query is going to do a full table scan anyhow, actually two, so there is really just overhead to use a query versus RLA.

The select statement that puts together the 2 lines is:

select rrn(a)
, substr(a.LINE, 1, String_str - 1) concat
substr(c.LINE, string_str)
from MyFile a
join (
select
rrn(b) - 1 as Join_rrn
,locate('Z', substr(b.LINE, 1, 4) concat
translate(substr(b.line,5 , length(b.LINE)-1),
'Z','.0123456789-','Z')) as String_Str
,b.LINE
from MyFile b
where line like ' 0%'
) c
on rrn(a) = join_rrn

The subselect translates all editing and numbers to a "Z" and then locates the first occurrence of "Z" to find the starting of the string on the second line. The file is joined to itself based
on RRN and the line containing a zero in position 4.

1st - Is there an easier way I could code this statement?

Locating a 'Z' in the first four bytes may be problematic if there is a 'Z' in the third or fourth byte.? Perhaps the LOCATE should have a /start/ position of five; but that is only valid if always non-zero for the result, else a CASE could determine when to add four.

Probably the statement can be further improved, but the sample set of data is too small; i.e. unclear what special cases might exist that would break a simpler query. The given query strips any data between the zero and the first numeric digit, period, or minus symbol. Given the scenario is a spooled report, maybe stripping all of that data is unnecessary, such that the complex expression could be replaced with something much simpler like one substring. But if there is a 'Z', period, or a dash\minus somewhere in that text, then that will locate a 'Z' somewhere other than the desired numeric value position. If the data between the zero and the numeric value is always blank, the statement could be greatly simplified.

I would expect that the more likely scenario is that all of the data after the zero is always blanks until the desired numeric value, and that the data is in fixed positions. If so, then just selecting something like the substr(b.line, 5) to concatenate with substr(a.line, 1, 22) would suffice? Perhaps the substring should be from position three instead of from position five? Anyhow using that VarChar result in the NTE reduces the amount of data for that which is concatenated in the outer SELECT, plus removes any requirement for position\locate [i.e. the string_str is no longer required]. As truly valid string-form numeric values, [implicit] cast to decimal makes alignment and calculations\summaries a non-issue, if that may be desirable.

2nd - How can I use this statement in an UPDATE to change the correct line using RRN as the key?

I would encapsulate the RRN() in a VIEW; probably two separate VIEWs where the first is update capable for the LINE column and which selects only the primary lines, with the second VIEW selecting only the secondary lines and including the start position as a column. After the UPDATE, issue the DELETE against the second VIEW without any selection.

Consider the following [may be some editing errors] scenario:

<code>

create table daft (line char 186)
;
insert into daft values
/* ....+....1....+....2....+....3....+....4....+....5....
(' 2 1111.11- ')
,(' 0 1.00')
,(' 2 222222.22- 22.00')
,(' 2 1234.56- ')
,(' 0 798.00')
,(' 2 3.33- 333.00')
,(' 2 44.44- ')
,(' 0 4444.00')
,(' 2 555.55- ')
,(' 0 55555.00')
,(' 2 3.33- 5399.10')
,(' 0 5432.10 <-----------')
,(' 2 3.33- 1234.56')
,(' 0 6666.66 <-----------')
,(' 2 77777.77- ')
,(' 0 this text is dropped 77.77')
;
create view daftline1 ( rrn1 , line) as
(select rrn(a),line from daft a
where line like ' 2%')
;
create view daftline2 ( rrn2 , line, tokloc) as
(select rrn(a)-1, line
, locate('Z'
, substr(line, 1, 4) concat
translate(substr(line, 5, length(line)-4)
, 'Z', '.0123456789-', 'Z')
/* , 5 */
)
from daft a where line like ' 0%')
;
select substr(a.LINE, 1, tokloc -1)
concat substr(b.LINE,tokloc)
from daftline1 a
join daftline2 b
on rrn1=rrn2
;
update daftline1 u set u.line = (
select substr(a.line, 1, tokloc - 1) concat
substr(b.line, tokloc)
from daftline1 a
join daftline2 b
on a.rrn1=b.rrn2
and a.rrn1=u.rrn1 )
where u.rrn1 in (select rrn2 from daftline2)
;
delete from daftline2 /* has: where line like ' 0%' */
;
select * from daft
; /* output: */
LINE
2 1111.11- 1.00
2 222222.22- 22.00
2 1234.56- 798.00
2 3.33- 333.00
2 44.44- 4444.00
2 555.55- 55555.00
2 3.33- 5432.10 <-----------
2 3.33- 6666.66 <-----------
2 77777.77- 77.77
******** End of data ********
/* The alignment issue is side effect of given sample */
/* Unclear what five bytes may be ignored\truncated */

</code>

The above UPDATE statement should be able to be modified to reference the TABLE directly, instead of referencing the VIEWs if one were so inclined; e.g. using a NTE for the joined files. I can not verify the variation I used on a release where the modified statement might work and if not to be supported; i.e. correlation or -811 errors on v5r3.

Regards, Chuck