The thread offered some examples of confusing, mixed-format data, which make parsing problematic. Many years ago I had to do this (in RPG II, no less). The classic problem name that we had to consider was Jan Van Breda Koff (an All-SEC guard at the time). If you aren't familiar with the name, know that Van Breda Koff is the dude's last name.
The tapes that we received had names with delimiters (sometimes a comma, sometimes a slash) and names without delimiters. Fortunately, most of the names on a single tape were known to be in one order, either first name first or last name first. Analysis of the data took a couple of weeks. Then we had to define logic that tried to determine the format of each name (Is there a non-alphabetic character in the field? Then that's a delimiter.), and then define algorithms for each case. Even then, we had humans go over the list (James Rich, Rich James?).
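
For anyone facing the same job today, the approach described above might be sketched roughly as follows. This is Python rather than RPG II, and it is only an illustration under stated assumptions, not the logic we actually ran: the names parse_name and ParsedName are made up for this example, a delimited name is assumed to be in last-name-first order, and every undelimited name is flagged for a human to check.

```python
import re
from typing import NamedTuple


class ParsedName(NamedTuple):
    """Hypothetical result record; the field names are illustrative only."""
    first: str
    last: str
    needs_review: bool


def parse_name(raw: str, tape_is_last_first: bool) -> ParsedName:
    """Split a raw name field on a delimiter if present, else use the tape default."""
    text = " ".join(raw.split())  # collapse runs of whitespace

    # A comma or slash is treated as the delimiter.
    match = re.search(r"[,/]", text)
    if match:
        # Assumption: delimited names are "last<delimiter>first".
        last = text[:match.start()].strip()
        first = text[match.end():].strip()
        return ParsedName(first, last, needs_review=False)

    words = text.split(" ")
    if len(words) == 1:
        # A single word cannot be split; keep it as the last name.
        return ParsedName("", words[0], needs_review=True)

    # No delimiter: fall back to the order known for the whole tape.
    if tape_is_last_first:
        last, first = words[0], " ".join(words[1:])
    else:
        first, last = " ".join(words[:-1]), words[-1]

    # The split is only a guess, so a human should look at it.
    return ParsedName(first, last, needs_review=True)


print(parse_name("Van Breda Koff, Jan", tape_is_last_first=False))
print(parse_name("Jan Van Breda Koff", tape_is_last_first=False))
```

The second call splits the name as first "Jan Van Breda" and last "Koff", which is wrong, and that is exactly why the undelimited cases go to a person.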

So-called modern technology hasn't helped a bit in this situation. My mother got a mass mailing last week addressed to "Mrs. C. Adams". Was my name stripped down to my middle initial to target the "woman of the house"? How much one spends on defining the rules and on post-conversion clean-up will depend upon how critical the data is. A mass mailer will try to eliminate duplicates automatically to save postage, but anything more is probably spending a dime to save a nickel (i.e., not worth the effort). For other purposes, it may be worth the cost to scrub the data more thoroughly.

Jerry C. Adams
IBM System i Programmer/Analyst
B&W Wholesale Distributors, Inc.
voice: 615.995.7024
fax: 615.995.1201
email: jerry@xxxxxxxxxxxxxxx



Wilt, Charles wrote:
Here's a nice thread discussing the difficulty of "Parsing Names" with any language....

http://archive.midrange.com/rpg400-l/200601/msg00328.html


Charles Wilt
--
Software Engineer
CINTAS Corporation - IT 92B
513.701.1307

wiltc@xxxxxxxxxx


-----Original Message-----
From: midrange-l-bounces@xxxxxxxxxxxx [mailto:midrange-l-bounces@xxxxxxxxxxxx] On Behalf Of Wilt, Charles
Sent: Tuesday, April 29, 2008 4:52 PM
To: Midrange Systems Technical Discussion
Subject: RE: Parsing Names...

What do you mean "various formats"?

Do you have a fixed set of rules to follow?

Depending on the rules, RPGLE could handle the job nicely using the %scan() and %subst() built-in functions.

If the string processing in RPG isn't enough, you can consider calling APIs to use Regular Expressions (regex) from RPG, or use a language like Java or REXX(?) that has built-in regex processing.
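
As a rough illustration of the regex route, here is a minimal sketch using Python's standard re module (not the regex APIs callable from RPG). It assumes only two input formats, "Last, First Middle" and "First Middle Last", and the function name split_name is hypothetical.

```python
import re

# "Last, First [Middle]": the comma marks the end of the last name.
LAST_FIRST = re.compile(
    r"^(?P<last>[^,]+),\s*(?P<first>\S+)(?:\s+(?P<middle>.+))?$"
)
# "First [Middle] Last": assumes a single-word last name, which is
# not always true (multi-word surnames will be split incorrectly).
FIRST_LAST = re.compile(
    r"^(?P<first>\S+)(?:\s+(?P<middle>.+?))?\s+(?P<last>\S+)$"
)


def split_name(full_name: str):
    """Return a dict with first, middle and last, or None if nothing matches."""
    text = full_name.strip()
    for pattern in (LAST_FIRST, FIRST_LAST):
        m = pattern.match(text)
        if m:
            # Missing optional groups come back as None; store them as "".
            return {key: (value or "").strip() for key, value in m.groupdict().items()}
    return None


print(split_name("Adams, Jerry C."))
print(split_name("Jerry C. Adams"))
```

A pattern list like this only covers the formats it was written for, so anything that returns None, or any name with a multi-word surname, would still need a fixed rule or a manual check.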

HTH,

Charles Wilt
--
Software Engineer
CINTAS Corporation - IT 92B
513.701.1307

wiltc@xxxxxxxxxx


-----Original Message-----
From: midrange-l-bounces@xxxxxxxxxxxx [mailto:midrange-l-bounces@xxxxxxxxxxxx] On Behalf Of David Turnidge
Sent: Tuesday, April 29, 2008 4:20 PM
To: Midrange Systems Technical Discussion
Subject: Parsing Names...

I have just been tasked with taking Excel files that contain full names - in various formats - and I am to break them into first, middle and last.

Years ago I did a lot of this in Foxpro, but I haven't done that for longer than I can remember. So, I'm looking for something in RPG that could do the job.

Other than that, I might be able to mess with something in Basic <fear in eyes>...

Any recommendations appreciated - Thank you in advance.

Dave Turnidge
--
This is the Midrange Systems Technical Discussion (MIDRANGE-L) mailing list
To post a message email: MIDRANGE-L@xxxxxxxxxxxx
To subscribe, unsubscribe, or change list options,
visit: http://lists.midrange.com/mailman/listinfo/midrange-l
or email: MIDRANGE-L-request@xxxxxxxxxxxx
Before posting, please take a moment to review the archives
at http://archive.midrange.com/midrange-l.


This e-mail transmission contains information that is intended to be
confidential and privileged. If you receive this e-mail and you are not a
named addressee you are hereby notified that you are not authorized to
read, print, retain, copy or disseminate this communication without the
consent of the sender and that doing so is prohibited and may be unlawful.
Please reply to the message immediately by informing the sender that the
message was misdirected. After replying, please delete and otherwise
erase it and any attachments from your computer system. Your assistance
in correcting this error is appreciated.

This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page. If you have questions about this, please contact [javascript protected email address].
