Re: Awk script running in QShell -- MIDRANGE-L

On 20-Jul-2016 14:58 -0500, Buck Calabro wrote:

On 7/20/2016 1:40 PM, Fuchs, James M wrote:

Below is the contents of the file I am trying to process.
The Field separator is an asterisk....
<<SNIP>>

<<SNIPped sample input data>>

Thank you for this <ed: snipped sample-data>. I did some testing in
PASE (call qp2term). I don't know what vintage my AWK is. I had to do
2 things to this to get it to run, and I don't really understand
why.

1) In the AWK script, change the line endings to LF instead of CRLF

IBM i->IBM i 7.1->Programming->IBM PASE for i->Preparing programs to run in IBM PASE for i->Copying the IBM PASE for i program to your system->Line-terminating characters in integrated file system files
[http://www.ibm.com/support/knowledgecenter/en/ssw_ibm_i_71/rzalf/rzalflineterm.htm]
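As a minimal sketch of step (1), assuming the script is an ASCII or UTF-8 stream file and that a standard `tr` is available in the shell being used (PASE, QSH, or bash), the carriage returns can be stripped into a copy of the file. The helper name `to_lf` and the output file name are made up for illustration; note that this removes every CR byte, not only those directly before an LF.

```sh
# Hypothetical helper: write an LF-only copy of a file that may contain CRLF.
to_lf() {
  # $1 = source file, $2 = destination file
  # tr -d deletes every carriage-return byte from the stream.
  tr -d '\r' < "$1" > "$2"
}

# Example (file names are assumptions):
#   to_lf awktest.awk awktest_lf.awk
```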

2) In the incoming file, remove the line endings entirely

call qp2term
$
awk -f awktest.awk awktest.txt awkout.txt
$
cat awkout.txt
<<SNIPped output received>>

Given the above <snipped> data and the awk script from [http://pastebin.com/qepNCBsG], both stored as UTF-8, running the script without any changes on my Mac produces no output. That matches the result on the IBM i, AIUI, as confirmed by both the OP and Buck in their tests on IBM i.

For the OP, and perhaps for anyone else who looked or plans to look at the issue, I offer the following, FWiW:

Because my files were created on the Mac and sent via FTP from the Mac to the IBM i, the [default] effect was that the LineFeed (LF) was the record delimiter in both the input file and the script file that I used.
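To confirm which line terminators a transferred file actually carries, a rough check along these lines may help. It is only a sketch: the function name `eol_counts` is invented here, and it assumes the file is ASCII or UTF-8 so that CR and LF are the bytes 0x0D and 0x0A.

```sh
# Hypothetical helper: report how many CR and LF bytes a file contains.
eol_counts() {
  # $1 = file to inspect; prints "CR=<n> LF=<n>"
  cr=$(tr -cd '\r' < "$1" | wc -c)   # keep only CR bytes, then count them
  lf=$(tr -cd '\n' < "$1" | wc -c)   # keep only LF bytes, then count them
  # $(( )) strips the padding some wc implementations add
  echo "CR=$((cr)) LF=$((lf))"
}
```

Equal non-zero counts suggest CRLF endings; a CR count of zero suggests LF-only endings.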

I verified that, when running from QSH with both files tagged as CCSID 1252 and with data to match [best I can tell], I get the same above-noted results <snipped> as reported by Buck, once the following revisions were made to those files [different changes than Buck's, though they still require stripping some record-ending characters]:
• script modified as noted above in (1), to have just LF vs CRLF; for me, as noted, that was implicit, as I did not have to take any overt action.
• script revised so both assignments of RS are only\always: RS = "\n"; # this nullifies the [lack of] specification of the fourth arg
• input file modified, but unlike what was noted above in (2), solely to have the LF character as end-of-record, just as with the script file
• input file revised further, to remove all of the tilde characters at the end of every line\record
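The two input-file revisions in the list above could be sketched as a single pipeline. This is an illustration under assumptions, not the exact commands I ran: the helper name `normalize_records` and the file names are made up, and the sample record layout in the comments is invented, since the real data was snipped.

```sh
# Hypothetical helper: emit LF-terminated records with the trailing "~" removed.
normalize_records() {
  # $1 = input file
  # tr drops any CR bytes; sed removes one tilde at the end of each line.
  tr -d '\r' < "$1" | sed 's/~$//'
}

# Example (file names are assumptions):
#   normalize_records awktest.txt > awktest_lf.txt
#   awk 'BEGIN { RS = "\n"; FS = "*" } { print NF }' awktest_lf.txt
```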

I also verified the same positive results after those same changes were made and the files were then copied back to my Mac and run in bash there; that was true whether re-saving both files as UTF-8 or as apparent CCSID-1252. So apparently both the Mac and the IBM i are similarly /broken/ as contrasted with the effects reported by the OP for use of that script+data on Windows; in a potentially positive light, at least there seems to be some consistency between the IBM i and the Mac ;-)

Thus I infer the primary problem is with the value being assigned to the Record Separator (RS) variable in\by the script, or in the way the awk utility is [not] handling what gets assigned for that value [as contrasted with the results seen by the OP on a PC]. I am unsure why the string '~|~\n|~\r\n' is not a valid\functional awk-regexp for denoting three possible values for EOL when establishing a value for $0 on either the Mac or the IBM i, but that seems to be an [if not the] issue.
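One possible explanation, offered as an assumption rather than a diagnosis of these particular awk builds: POSIX leaves the result unspecified when RS holds more than one character, and treating RS as a regexp is an extension found only in some implementations. A sketch that avoids depending on that extension is to set RS to the single character "~" and strip any leftover LF or CRLF from the front of each record. The function name `split_records` and the sample data are invented for illustration.

```sh
# Hypothetical helper: split "~", "~\n", or "~\r\n" terminated records
# using only a single-character RS.
split_records() {
  awk 'BEGIN { RS = "~"; FS = "*" }
  {
    gsub(/^[\r\n]+/, "")   # drop LF or CRLF left over from the previous terminator
    if ($0 == "") next     # skip the empty remainder after the final "~"
    n++
    print n ": " NF " fields, first=" $1
  }'
}

# Example with made-up data mixing all three terminators:
#   printf 'A*1*2~B*3~\nC*4*5*6~\r\n' | split_records
```

Modifying $0 with gsub causes awk to re-split the fields, so NF and $1 reflect the cleaned record.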

I have not yet run any tests using inconsistent line-end markings [so as to give meaning to the specification of more than one possible record separator value]. Those are pending achievement of consistent results with some simple scripts that mainly perform line-counting with a /simple/ RS value assigned. For example, I found that a /bare-bones/ counting script using RS="~\n" against the original data seems to work fine. Yet hard-coding that same assignment as the only change to the original script, while still referencing the original input file, finds awk failing to get past the first record.
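A small probe of the kind described might look like the following. It is a sketch only, with an invented function name and two made-up records; it reports whether the local awk appears to honor a two-character RS as a whole, based solely on the record count.

```sh
# Hypothetical probe: does this awk treat RS = "~\n" as one separator?
probe_rs() {
  printf 'a~\nb~\n' | awk 'BEGIN { RS = "~\n" } END {
    if (NR == 2) print "RS used as a whole string or regexp (2 records)"
    else print "RS not used as a whole (" NR " records)"
  }'
}
```

A count other than 2 would suggest the implementation is using only part of the RS value, which would still let a plain counting script appear to work while leaving stray newline characters at the start of records.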