On 20-Jul-2016 14:58 -0500, Buck Calabro wrote:
On 7/20/2016 1:40 PM, Fuchs, James M wrote:
Below is the contents of the file I am trying to process.
The Field separator is an asterisk....
<<SNIP>>
<<SNIPped sample input data>>
Thank you for this <ed: snipped sample-data>. I did some testing in
PASE (call qp2term). I don't know what vintage my AWK is. I had to do
2 things to this to get it to run, and I don't really understand
why.
1) In the AWK script, change the line endings to LF instead of CRLF
IBM i->IBM i 7.1->Programming->IBM PASE for i->Preparing programs to
run in IBM PASE for i->Copying the IBM PASE for i program to your
system->Line-terminating characters in integrated file system files
[http://www.ibm.com/support/knowledgecenter/en/ssw_ibm_i_71/rzalf/rzalflineterm.htm]
2) In the incoming file, remove the line endings entirely
call qp2term
$ awk -f awktest.awk awktest.txt awkout.txt
$ cat awkout.txt
<<SNIPped output received>>
Given those files as UTF-8 data, running both the above <snipped> data
and the awk script from [http://pastebin.com/qepNCBsG] without any
changes on my Mac produces no output. AIUI that is the same as on the
IBM i, as confirmed by both the OP and Buck in their tests there.
For the OP, and perhaps for anyone else who looked or planned to look
at the issue, I offer the following, FWiW:
Given my files were created on the Mac and sent via FTP from a Mac to
the IBM i, the [default] effect was having the LineFeed (LF) as record
delimiter, in both the input file and the script file that I used.
I verified that when running from QSH, with both files having a CCSID
tagging of 1252 and with data to match [best I can tell], I get the same
above-noted results <snipped> as reported by Buck, after the following
revisions were made to those files [different changes than Buck's,
though still requiring the stripping of some record-ending characters]:
• script modified as noted above in (1) for just LF vs CRLF; for me,
as noted, that was implicit, as I did not have to take any overt action.
• script revised so both assignments of RS are only\always: RS =
"\n"; # this nullifies the [lack of] specification of the fourth arg
• inputfile modified, but unlike what is noted above in (2), solely to
have the LF character as end-of-record, just as with the script file
• inputfile revised further, to remove all of the tilde characters
at the end of every line\record
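The input-file revisions listed above can be approximated with standard utilities. This is only a minimal sketch: the sample records are hypothetical stand-ins for the snipped data, and the awk one-liner is not the pastebin script, merely a check that LF-only, tilde-free input splits as expected on asterisks.

```shell
# Hypothetical sample: asterisk-separated fields, each record ended by "~" + CRLF.
# Step 1: drop every CR so that LF is the only line terminator.
# Step 2: drop the trailing tilde from each line.
# Step 3: read with single-character RS and FS, which should not depend on
#         how a given awk handles longer separator values.
result=$(printf 'ISA*00*A~\r\nGS*PO*B~\r\n' \
  | tr -d '\r' \
  | sed 's/~$//' \
  | awk 'BEGIN { RS = "\n"; FS = "*" } { print NR ": " $1 " (" NF " fields)" }')

echo "$result"
```

Keeping RS and FS to a single character each sidesteps the question of how the local awk interprets longer separator values.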
I also verified the same positive results after those same changes
were made, the files then copied back to my Mac, and the script then run
in bash on my Mac; that was true whether re-saving both files as UTF-8
or as apparent CCSID-1252. So apparently both Mac and IBM i are
similarly /broken/, in contrast with the effects reported by the OP for
use of that script+data on Windows; in a potentially positive light, at
least there seems to be some consistency between the IBM i and the Mac ;-)
Thus I infer the primary problem is with the value being assigned to
the Record Separator (RS) variable in\by the script, or in the way the
awk utility is [not] handling what gets assigned for that value [as
contrasted with the results seen by the OP on a PC]. Unsure why the
string of '~|~\n|~\r\n' is not a valid\functional awk-regexp for
denoting three possible values for EOL when establishing a value for $0
on either Mac or IBMi, but that seems to be an [if not the] issue.
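One way to narrow this down is to check how the local awk treats an RS longer than one character. POSIX leaves that case unspecified, so implementations may differ: some treat the value as a regexp, others appear to use only the first character. The probe below is a sketch with made-up data, not the original input; a count of 3 would suggest all three terminators were honored, and any other count would suggest they were not.

```shell
# Made-up input: three records, ended by "~", "~\n" and "~\r\n" respectively.
# The awk program counts only non-empty records so that a stray empty
# record does not blur the comparison.
count=$(printf 'a*1~b*2~\nc*3~\r\n' \
  | awk 'BEGIN { RS = "~\r\n|~\n|~" } length($0) > 0 { n++ } END { print n + 0 }')

echo "non-empty records seen: $count"
```

The alternatives are listed longest first here, an assumption made in case a given awk prefers the first matching alternative over the longest one.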
I have not yet run any tests using inconsistent line-end markings [so
as to give meaning to the specification of more than one possible record
separator value]; those are pending consistent results from some simple
scripts that mainly perform line-counting with a /simple/ RS value
assigned. E.g. I found that a /bare-bones/ counting script using
RS="~\n" against the original data seems to work fine, yet hard-coding
that same assignment as the only change to the original script still
finds awk failing to get past the first record of that same original
inputfile.
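Before testing mixed terminators, it may help to tally which end-of-record bytes a file actually contains. The sketch below assumes a hypothetical file name (probe_in.txt) and made-up contents; it counts LF, CR and tilde bytes so that the numbers can be compared against the record count awk reports.

```shell
# Hypothetical file with deliberately inconsistent record endings:
# "~\r\n", then "~\n", then a bare "~" with no newline at all.
printf 'a*1~\r\nb*2~\nc*3~' > probe_in.txt

# tr -cd keeps only the named byte; wc -c counts what is left.
# The final tr strips the padding that some wc implementations emit.
lf=$(tr -cd '\n' < probe_in.txt | wc -c | tr -d ' ')
cr=$(tr -cd '\r' < probe_in.txt | wc -c | tr -d ' ')
tilde=$(tr -cd '~' < probe_in.txt | wc -c | tr -d ' ')

echo "LF=$lf CR=$cr tilde=$tilde"
```

If the three counts differ, the file mixes terminators, and a single-valued RS cannot be expected to split every record cleanly.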