Okay folks,
I've had a bit of a breakthrough, so I thought I'd share it with you all. The
task is to migrate data from an earlier version of a file to a newer version of
the same file. A "database upgrade".
Here is additional detail as context.
Our machine is a 720 2-way at V5R1.
record count = 3,550,309
record length = 1650 bytes ("from" definition), 1862 bytes ("to" definition),
a relatively large record size
Unique key on PF = 1st 4 fields, 53 total bytes, all alpha
No logical files attached, so the only access path is the UNIQUE one on the PF.
Here are my run times (clock) for other methods:
CPYF FMTOPT(*MAP *DROP) - 2 hours, 57 minutes
CHGPF - 2 hours, 33 minutes
CPYF FMTOPT(*MAP *DROP) with record ranges running in parallel - 1 hour, 21
minutes
new approach described below - 23.5 minutes
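To put those clock times on a common scale, here is a small Python sketch (not part of the migration itself) that converts each one into records per second and a speedup relative to the plain CPYF. The record count and times are the figures from this post; the dictionary labels are just shorthand.

```python
RECORDS = 3_550_309  # record count reported in this post

# Clock times in minutes, as listed above.
run_minutes = {
    "CPYF *MAP *DROP": 2 * 60 + 57,
    "CHGPF": 2 * 60 + 33,
    "CPYF parallel": 60 + 21,
    "OPNQRYF join + CPYFRMQRYF": 23.5,
}

baseline = run_minutes["CPYF *MAP *DROP"]
for method, minutes in run_minutes.items():
    rate = RECORDS / (minutes * 60)   # records per second of clock time
    speedup = baseline / minutes      # relative to the single-job CPYF
    print(f"{method:28s} {rate:8.0f} rec/s  {speedup:4.1f}x")
```

By this arithmetic the new approach comes out about 7.5 times faster than the single-job CPYF.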
Here's the "CPYF parallel" approach. I retrieve the number of records in the
file and divide it by the number of parallel jobs I want to submit, which gives
me the "increment". I then submit one CPYF FMTOPT(*MAP *DROP) per range,
starting with FROMRCD(1) TORCD(increment) and stepping both values up by the
increment until end of file. (I actually work backwards from *END to 1, but the
idea is the same.) Since our files are REUSEDLT(*YES), I don't worry about large
RRN gaps due to deleted records.
The (*MAP *DROP) processing appears to be CPU-bound: CPU utilization was nearly
100%, and there was no difference in clock time between 4-jobs-wide and
50-jobs-wide. Several different widths from 4 to 50 all ran within 2 minutes of
each other (clock time).
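The range arithmetic for the parallel CPYF jobs can be sketched as follows. This is Python used only to illustrate the calculation, not the CL that actually submitted the jobs. It makes several assumptions: it works forward from record 1 rather than backwards from *END, it rounds the increment up so the last range is the short one, and the printed command strings (including MBROPT(*ADD) and the library and file names borrowed from the CL program in this post) are hypothetical.

```python
def record_ranges(total_rrn, jobs):
    """Split relative record numbers 1..total_rrn into contiguous (from, to) ranges."""
    increment = -(-total_rrn // jobs)  # ceiling division, so at most `jobs` ranges result
    ranges = []
    start = 1
    while start <= total_rrn:
        end = min(start + increment - 1, total_rrn)
        ranges.append((start, end))
        start = end + 1
    return ranges


ranges = record_ranges(3_550_309, 4)
for frm, to in ranges:
    # Hypothetical command text; each line would be the CMD of one submitted job.
    print(
        "CPYF FROMFILE(TESTLIB1/ORIGFILE) TOFILE(TESTLIB2/NEWFILE) "
        f"MBROPT(*ADD) FROMRCD({frm}) TORCD({to}) FMTOPT(*MAP *DROP)"
    )
```

If the file had many deleted records, the total would need to be the highest relative record number rather than the count of live records, otherwise the last records would fall outside every range.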
Here's what seems to be the fastest approach. Below is a CL pgm that performs
the work. JOINFILE is an empty file in the same format as the target file
(updated definition). This supplies the new field definitions to the output
format.
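pgm
  /* Join ORIGFILE to the empty GAPFILE on the four key fields.      */
  /* JDFTVAL(*YES) keeps every ORIGFILE record and fills the new     */
  /* fields with default values. FORMAT supplies the output layout.  */
  OPNQRYF    FILE((TESTLIB1/ORIGFILE) (GAPFILE)) +
               FORMAT(QTEMP/JOINFILE) JFLD((1/KEY1 +
               2/KEY1) (1/KEY2 2/KEY2) (1/KEY3 +
               2/KEY3) (1/KEY4 2/KEY4)) +
               JDFTVAL(*YES) OPNID(JOINOPEN) +
               SEQONLY(*YES 108)
  /* Request sequential-only blocking on the target file.            */
  OVRDBF     FILE(NEWFILE) SEQONLY(*YES 108)
  /* Copy the joined records into the new-format physical file.      */
  CPYFRMQRYF FROMOPNID(JOINOPEN) +
               TOFILE(TESTLIB2/NEWFILE) MBROPT(*REPLACE)
  return
endpgm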
I created a "gap" file, which is an _empty_ PF containing only the key fields
and the "new" fields. (In other words, the keys plus just the fields added to
the ORIGFILE format.) I joined the two files on the keys, specifying
JDFTVAL(*YES) so that every record in the original file is returned even though
the gap file has no matching records. Then it's simply a CPYFRMQRYF from the
join to an actual PF in the new format.
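For anyone who wants to see why an empty gap file still produces a full set of output records, here is a rough Python model of what the join does logically. It says nothing about how OPNQRYF implements it. OLDFLD, NEWALPHA and NEWNUM are hypothetical field names, and blank and zero stand in for the defaults the new fields would typically get when no default is defined for them.

```python
def left_join_with_defaults(primary, secondary, keys, defaults):
    """Return one output row per primary row; unmatched rows get default values."""
    index = {tuple(row[k] for k in keys): row for row in secondary}
    out = []
    for row in primary:
        match = index.get(tuple(row[k] for k in keys))
        # Take new-field values from the match if there is one, else the defaults.
        extra = {f: (match[f] if match else dflt) for f, dflt in defaults.items()}
        out.append({**row, **extra})
    return out


keys = ["KEY1", "KEY2", "KEY3", "KEY4"]
orig = [
    {"KEY1": "A", "KEY2": "B", "KEY3": "C", "KEY4": "D", "OLDFLD": "x"},
    {"KEY1": "E", "KEY2": "F", "KEY3": "G", "KEY4": "H", "OLDFLD": "y"},
]
gap = []  # empty, like the gap file: no row ever matches

new_rows = left_join_with_defaults(orig, gap, keys, {"NEWALPHA": "", "NEWNUM": 0})
for row in new_rows:
    print(row)
```

Every original row comes through with its old values intact and the new fields set to their defaults, which is the shape the new-format file needs.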
While I can't really explain why this is so much faster, or why the parallel
stuff didn't give the return I wanted, I'm thrilled with the result.
Regards,
Michael Polutta
Atlanta, GA
This mailing list archive is Copyright 1997-2025 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page.