Jim Essinger wrote:
I have a file that contains 1 field called LINE that is 186 
characters long, and contains spool data. In about 900 cases,
(of 25,000+ lines) one particular line gets
split into 2 lines like:
  2           1234.56-
  0                                             798.00
I need to have them combined into one line like:
  2           1234.56-                    798.00
  Rather than updating flat file data originating from a spooled 
report, why not just get the original data from the same place [most 
likely from a described database file] whence whatever reporting 
tool had generated that spooled output?  That is, why not just 
perform the necessary SQL against the original data, rather than 
running some SQL against a spooled report created from that data? 
Or if absolutely required to parse the spooled [flat file] data 
[e.g. such that the aim is] to correct spooled data to rid of 
apparent line wrap, would it not be more appropriate to just change 
the report generation so as not to wrap the lines in the first place?
The data is all numbers, edited. The 2nd line is always a
zero in position 4 of the line, and one RRN later in the
file.  What I want to do is to update the first line with
both parts of the data, and then I will remove the "Zero"
record.
  The position four was noted as the location of the zero, but the 
LIKE test in the later quoted SELECT will find the rows with the 
zero in position two.  An error?
I have created an SQL statement that will correctly combine
the  two pieces into one line. Now I need to use that
statement to update the correct records.
That is where my mind  has failed me.
  Given that the data is not relational, perhaps consider not even 
using the SQL UPDATE.  FWiW since ~1/28 of the rows will be updated 
in a fairly small file, IMO it would seem much more appropriate to 
just perform the changes with RLA\native I/O with arrival sequence 
[or less desirable using SQL cursor(s) with the less efficient 
retrieval & order by RRN() to effect the same] by reading every row 
for update; i.e. delete the current row & make the change to the 
previous row with the data obtained from what was the current row. 
The query is going to do a full table scan anyhow, actually two, so 
there is really just overhead to use a query versus RLA.
The select statement that puts together the 2 lines is:
select rrn(a)
  , substr(a.LINE, 1, String_str - 1) concat
    substr(c.LINE, string_str)
from    MyFile a
join (
   select
    rrn(b) - 1 as Join_rrn
   ,locate('Z', substr(b.LINE, 1, 4) concat
                translate(substr(b.line,5 , length(b.LINE)-1),
                      'Z','.0123456789-','Z'))  as String_Str
   ,b.LINE
   from MyFile b
   where line like ' 0%'
      ) c
on rrn(a) = join_rrn
The subselect translates all editing and numbers to a "Z" and 
then locates the first occurrence of "Z" to find the starting of 
the string on the second line. The file is joined to itself based
on RRN and the line containing a zero in position 4.
1st - Is there an easier way I could code this statement?
  Locating a 'Z' in the first four bytes may be problematic if 
there is a 'Z' in the third or fourth byte.?  Perhaps the LOCATE 
should have a /start/ position of five; but that is only valid if 
always non-zero for the result, else a CASE could determine when to 
add four.
  Probably the statement can be further improved, but the sample 
set of data is too small; i.e. unclear what special cases might 
exist that would break a simpler query.  The given query strips any 
data between the zero and the first numeric digit, period, or minus 
symbol.  Given the scenario is a spooled report, maybe stripping all 
of that data is unnecessary, such that the complex expression could 
be replaced with something much simpler like one substring.  But if 
there is a 'Z', period, or a dash\minus somewhere in that text, then 
that will locate a 'Z' somewhere other than the desired numeric 
value position.  If the data between the zero and the numeric value 
is always blank, the statement could be greatly simplified.
  I would expect that the more likely scenario is that all of the 
data after the zero is always blanks until the desired numeric 
value, and that the data is in fixed positions.  If so, then just 
selecting something like the substr(b.line, 5) to concatenate with 
substr(a.line, 1, 22) would suffice?  Perhaps the substring should 
be from position three instead of from position five?  Anyhow using 
that VarChar result in the NTE reduces the amount of data for that 
which is concatenated in the outer SELECT, plus removes any 
requirement for position\locate [i.e. the string_str is no longer 
required].  As truly valid string-form numeric values, [implicit] 
cast to decimal makes alignment and calculations\summaries a 
non-issue, if that may be desirable.
2nd - How can I use this statement in an UPDATE to change the 
correct line using RRN as the key?
  I would encapsulate the RRN() in a VIEW; probably two separate 
VIEWs where the first is update capable for the LINE column and 
which selects only the primary lines, with the second VIEW selecting 
only the secondary lines and including the start position as a 
column.  After the UPDATE, issue the DELETE against the second VIEW 
without any selection.
  Consider the following [may be some editing errors] scenario:
<code>
  create table daft (line char 186)
  ;
  insert into  daft values
  /* ....+....1....+....2....+....3....+....4....+....5....
   (' 2           1111.11-                          ')
  ,(' 0                                               1.00')
  ,(' 2         222222.22-                     22.00')
  ,(' 2           1234.56-                          ')
  ,(' 0                                             798.00')
  ,(' 2              3.33-                    333.00')
  ,(' 2             44.44-                          ')
  ,(' 0                                            4444.00')
  ,(' 2            555.55-                          ')
  ,(' 0                                           55555.00')
  ,(' 2              3.33-                   5399.10')
  ,(' 0                               5432.10 <-----------')
  ,(' 2              3.33-                   1234.56')
  ,(' 0                               6666.66 <-----------')
  ,(' 2          77777.77-                          ')
  ,(' 0      this text is dropped                    77.77')
 ;
 create view daftline1 ( rrn1 , line) as
 (select rrn(a),line from daft a
  where line like ' 2%')
 ;
 create view daftline2 ( rrn2 , line, tokloc) as
 (select rrn(a)-1, line
  , locate('Z'
          , substr(line, 1, 4) concat
                   translate(substr(line, 5, length(line)-4)
                            , 'Z', '.0123456789-', 'Z')
       /* , 5 */
          )
 from daft a where line like ' 0%')
 ;
  select substr(a.LINE, 1, tokloc -1)
      concat substr(b.LINE,tokloc)
  from       daftline1 a
    join     daftline2 b
      on rrn1=rrn2
 ;
 update daftline1 u set u.line = (
    select substr(a.line, 1, tokloc - 1) concat
           substr(b.line, tokloc)
    from       daftline1 a
          join daftline2 b
             on a.rrn1=b.rrn2
            and a.rrn1=u.rrn1    )
 where u.rrn1 in (select rrn2 from daftline2)
 ;
 delete from daftline2 /* has: where line like ' 0%' */
 ;
 select * from daft
 ; /* output: */
 LINE
  2           1111.11-                            1.00
  2         222222.22-                     22.00
  2           1234.56-                          798.00
  2              3.33-                    333.00
  2             44.44-                         4444.00
  2            555.55-                        55555.00
  2              3.33-            5432.10 <-----------
  2              3.33-            6666.66 <-----------
  2          77777.77-                           77.77
 ********  End of data  ********
 /* The alignment issue is side effect of given sample */
 /* Unclear what five bytes may be ignored\truncated   */
</code>
  The above UPDATE statement should be able to be modified to 
reference the TABLE directly, instead of referencing the VIEWs if 
one were so inclined; e.g. using a NTE for the joined files.  I can 
not verify the variation I used on a release where the modified 
statement might work and if not to be supported; i.e. correlation or 
-811 errors on v5r3.
Regards, Chuck
 
As an Amazon Associate we earn from qualifying purchases.