Re: CSVR4 and UTF-8 -- RPG400-L

I've always hated EOL logic. From the standpoint of emulating a typewriter, a line break should require both characters. CR returns the print head to position 1, and LF rolls the paper forward one line. CR alone should overprint the current line. LF alone should continue printing at the current print position, but on the next line.

I realize that's archaic, but that's what the characters originally meant. Now, to handle all three conventions, you have to add state logic.

Select
  When CR, if have CR flag, break; then set have CR flag
  When LF, reset CR flag and break
  Other
    If have CR flag, reset CR flag and break
    Process printable character
Endsl

That handles CR, LF and CRLF. It doesn't handle LFCR, which would be processed as two line breaks.

Ewwwwww.

On 5/7/2020 8:44 AM, Tools/400 wrote:

Hi Scott,

People may call that "irrelevant" or "hypothetical", but now it is the "testCrUnicode" test case that fails, with the error "A character representation of a numeric value is in error."

The reason is that stmfReadLine() does not recognize CR as an end-of-line character, so it reads the entire file as the first of the three records. As far as I know, CR is the line-ending character on classic Mac OS.

For example: https://www.oreilly.com/library/view/mac-os-x/0596004605/ch01s06.html

"The Mac, by default, uses a single carriage return (<CR>), represented as \r. Unix, on the other hand, uses a single linefeed (<LF>), \n. Windows goes one step further and uses both, creating a (<CRLF>) combination, \r\n."

The other two test cases, with LF and CRLF line endings, pass.

Regards,

Thomas.

On 5/7/2020 11:36 AM, Scott Klement wrote:

Hello Thomas and everyone,

I've updated the copy of the CSV utilities on my web site to fix the problems Thomas mentions below.

Let me know if there's anything else I can do.

-SK

On 5/6/2020 11:58 PM, Tools/400 wrote:

Hi Scott,

For 1) -- That is what I wrote earlier today (May 6th):

"In this example there is an unexpected x'00' before x'00A0', so that we would get an extra x'00' byte if we scanned for x'00A0'. Looks like a bug in fgets()."

In addition, it may be worth knowing that there is a PTF for the Unix-type read() function that fixes a problem with UTF-8 files that have a BOM. We ran into that problem earlier this year:

https://www.ibm.com/support/pages/ptf/MF67069

For 2) -- You are correct; I did not read your code carefully enough. Whenever I see an "If", I think about the "else", so I automatically cleared "peFldData".

Regards,

Thomas.
