Re: CSVR4 and UTF-8 -- RPG400-L

Thomas - take a look at U+2018, U+2019, U+201C and U+201D. On that page (a great site), U+2019 is RIGHT SINGLE QUOTATION MARK.

U+2018 ‘ e2 80 98 LEFT SINGLE QUOTATION MARK
U+2019 ’ e2 80 99 RIGHT SINGLE QUOTATION MARK
U+201A ‚ e2 80 9a SINGLE LOW-9 QUOTATION MARK
U+201B ‛ e2 80 9b SINGLE HIGH-REVERSED-9 QUOTATION MARK
U+201C “ e2 80 9c LEFT DOUBLE QUOTATION MARK
U+201D ” e2 80 9d RIGHT DOUBLE QUOTATION MARK

These are among several characters that don't exist in EBCDIC - others include the ellipsis and the em dash. We ran into this challenge with text entered in an iPhone app. These characters can also come from the AutoCorrect options in MS Word or Outlook.
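The byte values in the table, and the point that these characters have no EBCDIC equivalent, can be checked off-platform. This is only a rough sketch: it assumes Python's built-in cp037 codec is a fair stand-in for CCSID 37, which may not match your job CCSID, and the helper names are made up for illustration.

```python
# Code points copied from the table above.
QUOTES = ["\u2018", "\u2019", "\u201a", "\u201b", "\u201c", "\u201d"]


def utf8_hex(ch):
    """Return the UTF-8 bytes of one character as space-separated hex."""
    return " ".join(f"{b:02x}" for b in ch.encode("utf-8"))


def fits_codec(ch, codec="cp037"):
    """True if the character can be encoded in the given single-byte codec.

    cp037 is Python's approximation of EBCDIC CCSID 37 (an assumption here).
    """
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


for ch in QUOTES:
    print(f"U+{ord(ch):04X} {utf8_hex(ch)} encodable in cp037: {fits_codec(ch)}")
```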

I ended up using SQL on i to import the text, and it puts X'3F' where it encounters characters it can't convert.
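To illustrate what that substitution looks like, here is a small sketch that imitates the behaviour in Python. It is not how SQL on i does it internally; the function name is hypothetical and cp037 is assumed as the target code page.

```python
SUB = b"\x3f"  # EBCDIC substitute character, X'3F'


def to_ebcdic_with_sub(text, codec="cp037"):
    """Encode text to EBCDIC, writing X'3F' for anything that cannot be converted.

    Imitates the substitution described above; cp037 is an assumed target.
    """
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(codec)
        except UnicodeEncodeError:
            out += SUB
    return bytes(out)


# A name containing U+2019, as in the original question.
data = to_ebcdic_with_sub("O\u2019Brien")
print(data.hex(" "))
```

The apostrophe position comes back as 3f, so the character is lost rather than kept, which is exactly the problem when the customer's name has to survive intact.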

Greg asks about recommendations - maybe the only way to retain the characters would be to use 1208 (UTF-8) as the CCSID - of course, that can't be done without a whole lot of work.

And IBM does not provide conversion tables between UTF-8 and EBCDIC that cover these characters - how could they, with the EBCDIC code pages in their present form?

Does iconv have options to convert these typographer's characters (another name for these things) into something that exists in EBCDIC?
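Whether iconv can do this is left open here. One alternative is to replace the typographic characters with plain equivalents before the data is converted to EBCDIC. The sketch below shows the idea in Python; the mapping is my own choice, the names are hypothetical, and cp037 is again assumed as the target code page. Note that this keeps an apostrophe in the name but not the original U+2019.

```python
# Hypothetical mapping from typographic characters to plain equivalents.
TYPOGRAPHIC_TO_PLAIN = {
    "\u2018": "'",    # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",    # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',    # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',    # RIGHT DOUBLE QUOTATION MARK
    "\u2026": "...",  # HORIZONTAL ELLIPSIS
    "\u2014": "--",   # EM DASH
}
_TABLE = str.maketrans(TYPOGRAPHIC_TO_PLAIN)


def flatten_typography(text):
    """Replace typographic characters with plain ones that EBCDIC can hold."""
    return text.translate(_TABLE)


name = flatten_typography("O\u2019Brien \u201cBud\u201d")
print(name)
ebcdic = name.encode("cp037")  # no longer raises UnicodeEncodeError
```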

Cheers
Vern

On 5/5/2020 1:02 AM, Thomas Raddatz wrote:

I do not know what you mean by 'right single quotation mark'. I assume it is an ACUTE ACCENT or a GRAVE ACCENT according to the UTF-8 table at https://www.utf8-chartable.de/.

I did a brief test with service program CSVR4 and the following test data on our IBM i:

"ABC123","Scott Klement","123 Sesame St","Milwaukee, WI","USA","","53132-1234",1000.00
"ABC123","Bärbel Böhm","Some Street","Some City","Germany","","40721",1000.00
"ABC123","`Jürgen` ´Bärbeißer´","Some Street","Some City","Germany","","40721",1000.00

The report produced by CSVDEMO shows the expected result:

File . . . . . : QSYSPRT
Control . . . . .
Find . . . . . .
*...+....1....+....2....+....3...
Acct       Name
---------- ---------------------
ABC123     Scott Klement
ABC123     Bärbel Böhm
ABC123     `Jürgen` ´Bärbeißer´

The German umlauts as well as the ACUTE ACCENT and GRAVE ACCENT are printed correctly, so I assume that CSVR4 works fine.

We do not use CSVR4, so a brief test is all I can do.

Did you check the CCSID of your input? Is it 1208 (= UTF-8)?

Thomas.

-----Original Message-----
From: RPG400-L [mailto:rpg400-l-bounces@xxxxxxxxxxxxxxxxxx] On Behalf Of Greg Wilburn
Sent: Monday, May 4, 2020 16:24
To: RPG400-L@xxxxxxxxxxxxxxxxxx
Subject: CSVR4 and UTF-8

I have a program that uses the CSVR4 service program to read tab-delimited text files that we pull down from a website. The site uses the UTF-8 character set, and occasionally we have issues with character translation.

Example: x'e2 80 99' (right single quotation mark) makes a real mess of the customer's name.

I have a utility that removes non-display characters, but in this case I need to keep the character.

Any recommendations on changes that could be made to the process that would eliminate some of these translation issues?

Thanks,
Greg