That's only 800 megabytes ...

1. If the file is already sorted, you're done :)
2. Read Knuth - you already knew that. I love his comment (paraphrasing): "the sort algorithm you came up with is either one of the seven listed here or a completely new super technique, please contact the author immediately!"
3. Use the reformat utility on the 400 - Dick Bains will tell you that he created the buffering and it is pretty good. He liked SETACST. The more CPUs and memory and DASD bandwidth the better.
4. Sort it somewhere else - a nice Teradata or IBM NUMA comes to mind.

The tricky part of your question is your definition of "most efficiently". The things that concern me most are "the parts left out" - how big the keys are, how many keys there are, how often you need to repeat this, whether you are inserting, updating, and deleting data in the file, languages, character ordering - all that kind of stuff. Your sense of humor has not escaped me.

If this isn't a trick question and you plan to sort it on a 400 and you have the luxury of a few minutes, you might let SQL sort it. I have measured SQL sorting more data than this in a few minutes - only 6 million rows, but each was 1,800 bytes long and the keys were 50 bytes long. The traffic to and from the disk drives was significant but not lethal. The memory requirement made SMPO0001 run pretty hard because it was sorting in a 10 GB pool that was under pretty heavy pressure, so there was a great deal of paging. The temporary spaces that were part of the sort, and those flushed out of the pool, were backed to RAID disk, so that made the drives run a little more slowly than you might like. But overall, it worked pretty well. This was on a 12-way Northstar with QQRYDEGREE set up to perform parallel reads for table scans. If the machine had the SMP Query product, it might sort pretty fast but it would hog all the memory and CPUs and that would hammer the disk drives.

That really isn't that much data unless you plan to write your own full insertion sort.
In that event, can I come and watch? The run time for full insertion is N^2 if I remember correctly ... something times 10^14 :)

Richard Jackson
mailto:richardjackson@richardjackson.net
http://www.richardjacksonltd.com
Voice: 1 (303) 808-8058
Fax: 1 (303) 663-4325

-|-----Original Message-----
-|From: owner-midrange-l@midrange.com
-|[mailto:owner-midrange-l@midrange.com]On Behalf Of Leif Svalgaard
-|Sent: Wednesday, October 04, 2000 1:11 PM
-|To: List Midrange
-|Subject: Sorting large file
-|
-|
-|Folks,
-|
-|I have a sequential file of 10,000,000 80-character text records.
-|How do I sort the file the most efficiently?
-|
-|Leif
-|
-|
-|+---
-|| This is the Midrange System Mailing List!
-|| To submit a new message, send your mail to MIDRANGE-L@midrange.com.
-|| To subscribe to this list send email to MIDRANGE-L-SUB@midrange.com.
-|| To unsubscribe from this list send email to MIDRANGE-L-UNSUB@midrange.com.
-|| Questions should be directed to the list owner/operator: david@midrange.com
-|+---
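For readers who want to see the shape of a sort that does not need the whole file in memory, here is a minimal external merge sort sketch in Python. It is an illustration only and does not describe how the 400's reformat utility or SQL sort work internally. The function name external_sort and the run_size parameter are hypothetical, and the ordering is plain code-point order, which sidesteps the character-ordering question raised above.

```python
import heapq
import os
import tempfile
from itertools import islice


def external_sort(src_path, dst_path, run_size=1_000_000):
    """Sort a line-oriented text file using bounded memory.

    run_size is the number of records sorted in memory per run.
    It is a hypothetical tuning knob, not a measured value.
    """
    run_paths = []
    try:
        # Phase 1: cut the input into runs that fit in memory,
        # sort each run, and spill it to its own temporary file.
        with open(src_path, "r") as src:
            while True:
                run = list(islice(src, run_size))
                if not run:
                    break
                # The last record of the file may lack a newline;
                # add one so the merged output stays one record per line.
                if not run[-1].endswith("\n"):
                    run[-1] += "\n"
                run.sort()
                fd, path = tempfile.mkstemp(suffix=".run")
                with os.fdopen(fd, "w") as out:
                    out.writelines(run)
                run_paths.append(path)

        # Phase 2: k-way merge of the sorted runs. heapq.merge reads
        # its inputs lazily, so only one record per run is held at a time.
        files = [open(p, "r") for p in run_paths]
        try:
            with open(dst_path, "w") as dst:
                dst.writelines(heapq.merge(*files))
        finally:
            for f in files:
                f.close()
    finally:
        # Remove the temporary run files whether or not the sort succeeded.
        for p in run_paths:
            os.remove(p)
```

With 10,000,000 records and the default run_size this produces 10 runs and a single merge pass. Pick run_size to suit the memory actually available; every run file is open at once during the merge, so a very small run_size on a very large file could hit the open-file limit.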
This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page.