That's only 800 megabytes (10,000,000 records x 80 bytes) ...

1. If the file is already sorted, you're done :)

2. Read Knuth (The Art of Computer Programming, Volume 3: Sorting and
Searching); you already knew that.  I love his comment (paraphrasing):
"the sort algorithm you came up with is either one of the seven listed here
or a completely new super technique, please contact the author immediately!"

3. Use the reformat utility on the 400.  Dick Bains will tell you that he
created its buffering and that it is pretty good.  He liked SETACST.  The
more CPUs, memory, and DASD bandwidth, the better.

4. Sort it somewhere else - a nice Teradata or IBM NUMA comes to mind.
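Point 1 is cheap to test before doing anything else: one sequential pass
that compares each record with the one before it.  A minimal Python sketch,
assuming newline-terminated text records and plain byte order as the
collating sequence; is_sorted_file is a hypothetical helper, not an OS/400
utility:

```python
def is_sorted_file(path):
    """Return True if every record is >= the record before it.

    Assumes newline-terminated records compared as raw bytes
    (hypothetical helper for illustration only).
    """
    previous = None
    with open(path, "rb") as f:
        for line in f:
            record = line.rstrip(b"\r\n")  # ignore the line terminator
            if previous is not None and record < previous:
                return False  # found a record out of order; stop early
            previous = record
    return True
```

It stops at the first out-of-order record, so an unsorted file usually
costs far less than one full read.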

The tricky part of your question is your definition of "most efficiently".
What concerns me most is "the parts left out": how big the keys are, how
many keys there are, how often you need to repeat this, whether you are
inserting, updating, and deleting data in the file, languages, character
ordering, all that kind of stuff.  Your sense of humor has not escaped me.

If this isn't a trick question, you plan to sort it on a 400, and you can
afford a few minutes, you might let SQL sort it.  I have measured SQL
sorting more data than this in a few minutes: only 6 million rows, but each
was 1,800 bytes long (about 10.8 GB in all) and the keys were 50 bytes
long.  The traffic to and from the disk drives was significant but not
lethal.  The memory requirement made SMPO0001 run pretty hard because it
was sorting in a 10 GB pool that was under heavy pressure, so there was a
great deal of paging.  The temporary spaces that were part of the sort, and
whatever was flushed out of the pool, were written to RAID disk, so the
drives ran a little more slowly than you might like.  Overall, though, it
worked pretty well.  This was on a 12-way Northstar with QQRYDEGREE set up
to perform parallel reads for table scans.  If the machine had the SMP
Query product, it might sort pretty fast, but it would hog all the memory
and CPUs, and that would hammer the disk drives.
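When the data will not fit in the memory you can spare, the general
technique is an external merge sort: sort memory-sized chunks into
temporary "run" files on disk, then merge the runs in one streaming pass.
This is a minimal Python sketch of that textbook technique, assuming
newline-terminated text records compared as raw bytes.  It is not a
description of how OS/400 SQL or the reformat utility work internally, and
external_sort and chunk_records are hypothetical names:

```python
import heapq
import os
import tempfile


def external_sort(in_path, out_path, chunk_records=1_000_000):
    """External merge sort for a file of newline-terminated text records.

    Records are compared as raw bytes. chunk_records is an illustrative
    in-memory budget (hypothetical), not a tuned value.
    """
    run_paths = []
    try:
        # Phase 1: read a chunk that fits in memory, sort it, write it as a run.
        with open(in_path, "rb") as src:
            while True:
                chunk = []
                for line in src:
                    if not line.endswith(b"\n"):
                        line += b"\n"  # normalize a final record with no newline
                    chunk.append(line)
                    if len(chunk) >= chunk_records:
                        break  # the file iterator resumes here on the next chunk
                if not chunk:
                    break
                chunk.sort()
                fd, path = tempfile.mkstemp(suffix=".run")
                run_paths.append(path)  # record it first so cleanup always sees it
                with os.fdopen(fd, "wb") as run:
                    run.writelines(chunk)

        # Phase 2: k-way merge; heapq.merge keeps one pending record per run in a heap.
        runs = [open(p, "rb") for p in run_paths]
        try:
            with open(out_path, "wb") as dst:
                dst.writelines(heapq.merge(*runs))
        finally:
            for run in runs:
                run.close()
    finally:
        # Remove the temporary run files whether or not the sort succeeded.
        for p in run_paths:
            os.remove(p)
```

For a 10,000,000-record file, external_sort("input.txt", "sorted.txt")
(hypothetical file names) would write ten runs with the default chunk size
and merge them in a single pass.  The trade-off is the one described above:
every record crosses the disk at least twice, once into a run and once out
of the merge.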

That really isn't that much data unless you plan to write your own full
insertion sort.  In that event, can I come and watch?  Insertion sort runs
in O(N^2) time, so with N = 10,000,000 records you are looking at something
times 10^14 :)
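For a sense of scale, here are the numbers for N = 10,000,000, using the
standard textbook counts: insertion sort does N(N-1)/2 comparisons in the
worst case (reverse-ordered input) and about N(N-1)/4 element shifts on
random input, while merge sort needs at most N*ceil(log2 N) comparisons.
Merge sort appears here only as a point of comparison:

```python
import math

N = 10_000_000  # records in the original question

# Insertion sort, worst case (reverse-ordered input): N(N-1)/2 comparisons.
insertion_worst = N * (N - 1) // 2
# Insertion sort on random input: expected inversions (element shifts) = N(N-1)/4.
insertion_average = N * (N - 1) // 4
# Merge sort: at most N * ceil(log2 N) comparisons.
merge_bound = N * math.ceil(math.log2(N))

print(f"insertion sort, worst case : {insertion_worst:.2e}")
print(f"insertion sort, average    : {insertion_average:.2e}")
print(f"merge sort, upper bound    : {merge_bound:.2e}")
print(f"ratio (worst / merge)      : {insertion_worst / merge_bound:.0f}x")
```

The worst-case insertion sort does over 200,000 times as many comparisons
as the merge sort bound, which is why nobody watches it finish.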

Richard Jackson
mailto:richardjackson@richardjackson.net
http://www.richardjacksonltd.com
Voice: 1 (303) 808-8058
Fax:   1 (303) 663-4325

-|-----Original Message-----
-|From: owner-midrange-l@midrange.com
-|[mailto:owner-midrange-l@midrange.com]On Behalf Of Leif Svalgaard
-|Sent: Wednesday, October 04, 2000 1:11 PM
-|To: List Midrange
-|Subject: Sorting large file
-|
-|
-|Folks,
-|
-|I have a sequential file of 10,000,000 80-character text records.
-|How do I sort the file the most efficiently?
-|
-|Leif
-|
-|
-|+---
-|| This is the Midrange System Mailing List!
-|| To submit a new message, send your mail to MIDRANGE-L@midrange.com.
-|| To subscribe to this list send email to MIDRANGE-L-SUB@midrange.com.
-|| To unsubscribe from this list send email to MIDRANGE-L-UNSUB@midrange.com.
-|| Questions should be directed to the list owner/operator: david@midrange.com
-|+---

This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page. If you have questions about this, please contact [javascript protected email address].