A couple of answers at once:
The Previous file doesn't always start at 0 records. It contains
months' worth of data. A previously processed record may or may not
have come from the same input file.
The Previous file contains only the key fields needed to determine a
duplicate.
True, the majority of the time a duplicate is the exception to the
rule. However, we do have clients that send us 40% duplicates (yes, I
know, sigh). If I were to do the Write and catch the Duplicate error,
I'd also have to remove the job log entry (maybe not "have" to, but
100,000 job log messages about a duplicate write would be annoying).
This is where I wish Alison Butterill would recognize the need to
suppress a job log message when monitoring for an error. Sure, I know
I can do it with an API (and I have).
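(As an aside, and only as a rough sketch with made-up file and column
names: if the check were ever done in SQL, guarding the write with NOT
EXISTS would avoid raising the duplicate-key error, and the job log
messages, in the first place. This shows only the key-file side of the
process.)

    -- Sketch only: PREVIOUSFILE, INPUTFILE and the key columns are
    -- invented for illustration.
    -- Add only the keys that are not already in the Previous file, so
    -- no duplicate-key error is ever signalled.
    INSERT INTO previousfile (custno, invno)
        SELECT i.custno, i.invno
          FROM inputfile i
         WHERE NOT EXISTS
               (SELECT 1
                  FROM previousfile p
                 WHERE p.custno = i.custno
                   AND p.invno  = i.invno);
    -- Note: duplicates within the same input batch would still need
    -- separate handling (e.g. SELECT DISTINCT, mentioned below)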
The Previous file, in this case, is gigabytes in size. We don't
currently have enough main storage to accommodate it, much less to
allow multiple jobs (using different files) to process at once. We are
looking at getting more memory.
Select Distinct may be a good idea for eliminating duplicates within a
single input file. I'll definitely consider that.
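For what it's worth, a minimal sketch of that idea (file and column
names are invented for illustration):

    -- Collapse duplicate key combinations within the single input file
    -- before comparing anything against the Previous file.
    SELECT DISTINCT custno, invno
      FROM inputfile;

How much that buys us obviously depends on how many of the duplicates
come from within one file versus from earlier files.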
We are blocking the read on the input file. However, the size of the
Previous file is really the issue here. Empty the Previous file, and
the program flies. Fill up the Previous file, and the program grinds.
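In SQL terms, the per-record check amounts to roughly the following
(file and column names invented for illustration; the real program may
well use native record-level access instead):

    -- For each input record, test whether its key is already in the
    -- Previous file.  When that file is too large to stay in main
    -- storage, this lookup is what slows down as the file fills up.
    SELECT 1
      FROM previousfile p
     WHERE p.custno = ?   -- key values from the current input record
       AND p.invno  = ?
    FETCH FIRST 1 ROW ONLY;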
The Input & Previous files will never have deleted records in them.
Thanks for all of the responses,
Kurt