I think the midrange list (which this was originally posted to) would
have been a more appropriate venue for this, but since the RPG list is
where people seem to have responded, I guess it's stuck here.

My question is: Is this a one-time thing, or will it be recurring?

The reason I ask is that if it's a one-time thing, then in your shoes,
I would preprocess the huge XML files on my PC. In my experience, for
many tasks, my PC is quite a bit faster than our i. This is not a
knock on the i at all, because the i handles multiple users and heavy
disk I/O loads quite well, whereas my PC only has to handle one user.

So in this case, I would use tools on my PC (such as Python, but also
conceivably other software) to transform the XML into either smaller,
more manageable pieces, or a more manageable transport format, or
(even more likely) just insert the final, parsed data directly into
the database.
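
For the "insert it directly" route, a minimal Python sketch might look like the following. It assumes a hypothetical layout of <order id="..."> elements with <customer> and <amount> children. It uses SQLite only as a self-contained stand-in for the real target. Against Db2 for i you would swap in a DB-API driver (pyodbc over the IBM i Access ODBC driver is one option) and keep the same batching loop. The load_orders name is illustrative, not from any existing tool.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET


def load_orders(xml_source, conn, batch_size=1000):
    """Stream <order> elements into an 'orders' table; return rows inserted.

    xml_source is a filename or a binary file object. The <orders>/<order>
    layout is a hypothetical example, not a known schema.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    sql = "INSERT INTO orders (id, customer, amount) VALUES (?, ?, ?)"
    batch, total = [], 0

    events = ET.iterparse(xml_source, events=("start", "end"))
    _, root = next(events)  # the first event is the start of the root element
    for event, elem in events:
        if event != "end" or elem.tag != "order":
            continue  # only act once a whole <order> element has been read
        batch.append((
            int(elem.get("id")),
            elem.findtext("customer"),
            float(elem.findtext("amount")),
        ))
        # Detach finished <order> elements from the root so memory stays flat.
        root.clear()
        if len(batch) >= batch_size:
            conn.executemany(sql, batch)
            total += len(batch)
            batch.clear()

    if batch:  # flush the final partial batch
        conn.executemany(sql, batch)
        total += len(batch)
    conn.commit()
    return total


if __name__ == "__main__":
    sample = (b"<orders>"
              b"<order id='1'><customer>ACME</customer><amount>12.50</amount></order>"
              b"<order id='2'><customer>Globex</customer><amount>3</amount></order>"
              b"</orders>")
    conn = sqlite3.connect(":memory:")
    print(load_orders(io.BytesIO(sample), conn), "rows loaded")
```

ElementTree.iterparse is a hybrid rather than pure SAX: it builds each <order> as a small tree, which is convenient to read, but clearing the root after each record keeps memory bounded much as SAX would.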

Regardless of whether you're sticking to RPG or open to alternatives,
one thing to keep in mind is that SAX-style parsing is generally
better suited to very large files, because it only needs to work with
a small chunk of the file at a time, whereas DOM-style parsing (as
exemplified by XML-INTO) needs to work with the whole file essentially
as a unit. When DOM parsing is feasible, it is generally faster than SAX.
But maybe you've reached the threshold where DOM is impractical. At a
minimum, SAX will more easily allow you to monitor your progress. If
you try to load the whole thing at once with DOM, and you quit after 3
hours, you have no idea whether just waiting 10 minutes more would
have allowed it to complete, or whether it still wouldn't have
finished after another 3 hours.
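
To make the progress point concrete, here is a minimal sketch using Python's standard xml.sax module (the same event-driven model RPG offers through XML-SAX). It streams the file, counts one element type, and reports how many bytes the parser has consumed, so you see a percentage instead of guessing. The ProgressReader, RecordCounter, and count_records names, and the "order" element, are illustrative assumptions.

```python
import io
import os
import sys
import xml.sax


class ProgressReader:
    """Wrap a binary stream and report how many bytes the parser has consumed."""

    def __init__(self, raw, total_bytes, on_progress):
        self.raw = raw
        self.total_bytes = total_bytes
        self.on_progress = on_progress
        self.bytes_read = 0

    def read(self, size=-1):
        chunk = self.raw.read(size)
        self.bytes_read += len(chunk)
        if chunk:
            self.on_progress(self.bytes_read, self.total_bytes)
        return chunk

    def close(self):
        # xml.sax closes its input when parsing ends, so pass that through.
        self.raw.close()


class RecordCounter(xml.sax.ContentHandler):
    """Count one element type; the handler gets events, never the whole tree."""

    def __init__(self, record_tag):
        super().__init__()
        self.record_tag = record_tag
        self.count = 0

    def endElement(self, name):
        if name == self.record_tag:
            self.count += 1
            # Per-record work (validate, write a row, ...) would go here.


def count_records(raw, total_bytes, record_tag, on_progress):
    handler = RecordCounter(record_tag)
    xml.sax.parse(ProgressReader(raw, total_bytes, on_progress), handler)
    return handler.count


def print_progress(done, total):
    pct = 100.0 * done / total if total else 100.0
    print(f"\r{done:,} of {total:,} bytes ({pct:.1f}%)", end="", file=sys.stderr)


if __name__ == "__main__":
    if len(sys.argv) == 3 and os.path.isfile(sys.argv[1]):
        path, tag = sys.argv[1], sys.argv[2]
        with open(path, "rb") as f:
            n = count_records(f, os.path.getsize(path), tag, print_progress)
    else:  # fall back to a tiny built-in sample
        tag = "order"
        sample = b"<orders>" + b"<order/>" * 3 + b"</orders>"
        n = count_records(io.BytesIO(sample), len(sample), tag, print_progress)
    print(f"\n{n:,} <{tag}> elements", file=sys.stderr)
```

Usage would be something like "python count_xml.py big.xml order" (the script name is hypothetical). Progress is measured in bytes because the parser pulls the input in chunks, which is cheap to track and doesn't require knowing the record count up front.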

John Y.
