Jeff Crosby wrote:
I have never come upon a complete foolproof set of rules to do this right.
Even humans can't do it right. How can anyone decide whether it's Macdonell or MacDonell? Only the owner of the name knows for sure. Mixed-case to upper-case is a many-to-one function, and it's impossible to round-trip through a many-to-one function.

(A similar issue comes up if you store only the last 6 digits of a date and try to guess the first 2. No matter how clever you try to be, you'll get it wrong sometimes. Birthdate of 99/12/23? Must be 1999; that 107-year-old must be dead by now.)

The only way to get this right is not to lose the mixed-case form in the first place. If it's not feasible simply to store the name in mixed case and upper-case it for comparisons, you could store the mixed-case version somewhere else for the problem cases only.

To determine the problem cases, use a trivial conversion routine that upper-cases the first character of each word. If a name doesn't round-trip correctly from mixed to upper and back to mixed, it's a problem case. For those, store the mixed-case version somewhere else (add a varying-length field that is set to '' if the trivial conversion works, or add a flag that says "look for mixed-case-name in other file"). Probably only a tiny fraction will be problem cases.
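
To make the round-trip check concrete, here is a rough sketch in Python (not the language of the original system, just an illustration). The function names are made up for this example, and it assumes "word" means "separated by a single space" and that the trivial routine lower-cases everything after the first character of each word. A real routine would have to decide what to do about hyphens, apostrophes, and so on.

```python
def trivial_mixed_case(upper_name: str) -> str:
    """Hypothetical trivial routine: upper-case the first character of each
    space-separated word and lower-case the rest."""
    return " ".join(w[:1].upper() + w[1:].lower() for w in upper_name.split(" "))


def mixed_case_override(mixed_name: str) -> str:
    """Return '' when the name survives mixed -> upper -> mixed unchanged,
    otherwise return the original mixed-case name so it can be stored."""
    round_tripped = trivial_mixed_case(mixed_name.upper())
    return "" if round_tripped == mixed_name else mixed_name


def display_name(upper_name: str, override: str) -> str:
    """Prefer the stored override; fall back to the trivial routine."""
    return override if override else trivial_mixed_case(upper_name)


if __name__ == "__main__":
    for name in ["John Smith", "Ian Macdonell", "Ian MacDonell"]:
        override = mixed_case_override(name)
        print(repr(name), "-> override:", repr(override))
```

With this sketch, "Ian Macdonell" needs no override, while "Ian MacDonell" is flagged as a problem case and kept verbatim. Only the flagged names take up extra storage.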
This mailing list archive is Copyright 1997-2025 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page. If you have questions about this, please contact [javascript protected email address].