This all comes down to the roughly 1 million characters in Unicode that
can't be mapped to any 256-character SBCS CCSID, and it really doesn't
matter whether it is an emoji or an MS Office special character.

Unicode basically comes in three flavors:

UTF-8, which uses 1-4 8 bit units per character
UTF-16, which uses 1-2 16 bit units per character
UTF-32, which uses one 32 bit unit per character

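In UTF-8 the leading bits of the first byte tell you how many bytes the
character occupies. In UTF-16 a first unit in the range x'D800' - x'DBFF'
tells you that a second 16 bit unit follows.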

UTF-8

x'00' - x'7F' is a single 8 bit unit, equal to the 7 bit ASCII encoding
x'C2XX' - x'DFXX' is a 16 bit encoding
x'E0XXXX' - x'EFXXXX' is a 24 bit encoding
x'F0XXXXXX' - x'F4XXXXXX' is a 32 bit encoding
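
As a rough illustration of the lead-byte rule, here is a minimal Python
sketch (not RPG, and not part of any IBM i API). The function name
utf8_sequence_length is made up for this example, and the byte ranges are
the ones allowed by the UTF-8 definition for well-formed input.

```python
def utf8_sequence_length(lead_byte: int) -> int:
    """Return how many bytes the UTF-8 sequence starting with lead_byte occupies.

    Hypothetical helper for illustration; assumes lead_byte is the FIRST byte
    of a well-formed UTF-8 sequence.
    """
    if lead_byte <= 0x7F:
        return 1          # plain 7-bit ASCII
    if 0xC2 <= lead_byte <= 0xDF:
        return 2          # 16 bits in total
    if 0xE0 <= lead_byte <= 0xEF:
        return 3          # 24 bits in total
    if 0xF0 <= lead_byte <= 0xF4:
        return 4          # 32 bits in total (emoji live here)
    # 0x80-0xBF are continuation bytes; 0xC0, 0xC1 and 0xF5-0xFF never occur.
    raise ValueError(f"x'{lead_byte:02X}' is not a valid UTF-8 lead byte")


ellipsis_bytes = "\u2026".encode("utf-8")      # horizontal ellipsis
print(utf8_sequence_length(ellipsis_bytes[0]))
```

The same check can be written as a few comparisons in any language once
you have the first byte of a character in hand.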

ICONV will always be able to convert x'00' - x'7F' back and forth to
EBCDIC. All other characters depend on the EBCDIC CCSID; however, ICONV
will always be able to convert EBCDIC > UTF-8, since UTF-8 covers all
possible EBCDIC characters regardless of CCSID.

ICONV from UTF-8 > EBCDIC will leave any character that is unsupported in
the target CCSID as a single blank, or will stop converting the string if
the character is unknown.

Reading "unknown" UTF-8 is tricky, since you need to clean it up and
decide what to do with unsupported characters. You may want to replace the
EURO sign with 'EUR' etc., or you may want to replace an emoji or a Greek
or Chinese character with a blank.
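
A minimal sketch of that clean-up idea, in Python rather than RPG/iconv.
It assumes Python's built-in "cp037" codec is a fair stand-in for CCSID
37; the REPLACEMENTS table and the name to_ebcdic37 are hypothetical and
only show the shape of the approach.

```python
# Hypothetical replacement table: which text to substitute is a business
# decision, these entries are only examples.
REPLACEMENTS = {
    "\u20ac": "EUR",   # euro sign
    "\u2026": "...",   # horizontal ellipsis
    "\u2014": "-",     # em dash
}


def to_ebcdic37(text: str, fallback: str = " ") -> bytes:
    """Convert text to code page 037, substituting what cannot be mapped."""
    out = []
    for ch in text:
        candidate = REPLACEMENTS.get(ch, ch)
        try:
            out.append(candidate.encode("cp037"))
        except UnicodeEncodeError:
            # Emoji, Greek, Chinese etc. have no code point in code page 037.
            out.append(fallback.encode("cp037"))
    return b"".join(out)


print(to_ebcdic37("Price 5 \u20ac \u2026").decode("cp037"))
```

Note that a replacement can be longer than the character it replaces, so
the target field has to allow for the extra length.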

There are tricks to do this, but you really need to have experience with
special characters to know what to do and the best way is to.



On Thu, Jun 8, 2017 at 11:49 PM, Douglas Dunn <dunndouglas0@xxxxxxxxx>
wrote:

This situation reminds me of old MIME-type errors on emails - especially
with the mention of those "box characters". I saw them occasionally before
about 2005, but now they are so well and subtly handled that nobody really
sees them, or thinks about it. Emoji is not quite there yet, but maybe in
10 years it will be.

I agree that any of the highly specific ones can be unmodified, especially
since the Unicode set includes a lot more than I had realized, and of
course it is a well accepted standard and future-proof.

Thanks to everyone for being patient with me. I know that most people here
have 30+ years experience, and I really appreciate the help with my
"newbie" questions. It's not exactly easy to find resources on IBM i!

Doug Dunn

On Thu, Jun 8, 2017 at 2:29 PM, Brian May <bmay@xxxxxxxxxxxxxxxxx> wrote:

This is why we have this group, for discussions.

The standard emoji in the Unicode character set should suffice. There are
oddball emoji that are Apple and Android specific. Most users avoid those
as they don't work for everyone. However, the application should handle
those outliers gracefully.

If my 13 yr old daughter sends me an emoji, as she does constantly, and my
Android doesn't know what to do with it, I usually just get a box-like
character. So maybe the application should just accept the data as is,
once Unicode is implemented, and leave it be.

Brian May
Director
Pre-Sales and Customer Solutions
Profound Logic Software
http://www.profoundlogic.com
937-439-7925 Phone
877-224-7768 Toll Free

-----Original Message-----
From: RPG400-L [mailto:rpg400-l-bounces@xxxxxxxxxxxx] On Behalf Of Douglas Dunn
Sent: Thursday, June 8, 2017 4:02 PM
To: RPG programming on the IBM i (AS/400 and iSeries) <rpg400-l@xxxxxxxxxxxx>
Subject: Re: Question about processing interesting "UTF-8" characters from XML

Brian,

Certainly, I agree that the outdated image should be avoided at all costs
- and as much as I dislike them, emoji are part of that.

I think we were discussing different emoji sets, to a degree. I just
looked into the Wikipedia article on the subject, and it seems it is
indeed a lot more complex than I realized. In addition to about 1000
Unicode characters, Apple, Google and others seem to have competing
standards. I've seen this Apple system once, and it seems crazy to me:
when I type "key", I do not want a picture of a key to come up (and I
wouldn't be surprised if you can't turn it off...). This is the sort of
thing that prevents me from buying Apple products.

I'd also point out that as a person in my 20's, I don't really know anyone
who uses emojis (except sarcastically) outside of the most common 10, such
as =D, =P, etc. Supporting the commonly used ones is certainly a good idea
- but I think most of the extended sets are very dubious; it's mostly a
fad with the age range 16-20.

Glad I could contribute a bit to the discussion.

Respectfully,
Doug

On Thu, Jun 8, 2017 at 10:42 AM, Jon Paris <jon.paris@xxxxxxxxxxxxxx>
wrote:

Just a thought for Vern - If you look to Skype for example - all
(most? ) emojis have a text equivalent. For example (heidy) is one I
often use in texts to a certain lady of mine. Rather perhaps a lookup
table of textual equivalents would be more useful in the long term
than simply storing UTF8 values that can never be viewed on a green
screen etc. At least that way anyone reading the text on a non-emoji
capable device would be able to understand the sentiment. (poop)


Jon Paris

www.partner400.com
www.SystemiDeveloper.com

On Jun 8, 2017, at 1:16 PM, Brian May <bmay@xxxxxxxxxxxxxxxxx>
wrote:

Good to hear. That is the correct fix to the issue. In the meantime, a
replacement character solution could be implemented. If you replace the
input coming in, you should also put it back on output if at all possible.
Just a suggestion.

And to Doug's remark, in this day and age, our applications should be
designed to accept any input our users want, regardless of our opinion on
the usefulness of the type of input. In the world of mobile devices, this
capability is just expected. If we limit it, then it just perpetuates the
outdated image of the platform.

Brian May

-----Original Message-----
From: RPG400-L [mailto:rpg400-l-bounces@xxxxxxxxxxxx] On Behalf Of Vernon Hamberg
Sent: Thursday, June 8, 2017 12:13 PM
To: rpg400-l@xxxxxxxxxxxx
Subject: Re: Question about processing interesting "UTF-8" characters from XML

To be sure - and there are plans for an upgrade - but not in time to
take care of the immediate issues.

On 6/8/2017 12:09 PM, Brian May wrote:
Vern,

Could this be justification for an OS upgrade so you can use UTF-8?

Brian May

-----Original Message-----
From: RPG400-L [mailto:rpg400-l-bounces@xxxxxxxxxxxx] On Behalf Of Vernon Hamberg
Sent: Thursday, June 8, 2017 12:05 PM
To: rpg400-l@xxxxxxxxxxxx
Subject: Re: Question about processing interesting "UTF-8" characters from XML

Hi Doug

Good thoughts overall - and I agree as to how useful this is - but we are
stuck with it now - further comment would be unwise.

As to replacing with a text equivalent - you have to understand how many
of these things there are. So far someone has managed to get the "skull"
and the "key" - on an iPhone, if you type "key" you get an option above
the keyboard to press and get the emoji of a key - sheesh!

It's just not practical - my plan is to record the position where the
offending bytes are - XML-SAX tells us that - then check the bytes against
a table - HEY WAIT! I __could__ put an alternate text in there, couldn't I?
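
A rough sketch of that table idea, written in Python instead of RPG and
scanning the raw bytes directly rather than relying on the offset that
XML-SAX reports. The ALT_TEXT entries, the default "?" and the name
replace_supplementary are all hypothetical, and the input is assumed to be
well-formed UTF-8.

```python
from typing import List, Tuple

# Hypothetical lookup table: 4-byte UTF-8 character -> alternate text.
ALT_TEXT = {
    "\U0001F511": "(key)",
    "\U0001F480": "(skull)",
}


def replace_supplementary(raw: bytes, default: str = "?") -> Tuple[bytes, List[int]]:
    """Replace every 4-byte UTF-8 sequence and report where each one was found.

    Assumes raw is well-formed UTF-8, so bytes x'F0'-x'F4' only ever appear
    as the lead byte of a 4-byte sequence.
    """
    out = bytearray()
    offsets = []
    i = 0
    while i < len(raw):
        if 0xF0 <= raw[i] <= 0xF4:
            char = raw[i:i + 4].decode("utf-8")
            offsets.append(i)                          # byte offset in the input
            out += ALT_TEXT.get(char, default).encode("utf-8")
            i += 4
        else:
            out.append(raw[i])
            i += 1
    return bytes(out), offsets


cleaned, found_at = replace_supplementary("Lost \U0001F511 again".encode("utf-8"))
print(cleaned, found_at)
```

Everything left after this pass fits in UCS-2, so the file could then be
parsed with ccsid=ucs2 as before.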

I'll have to get some kind of sanction, but this could work!

Thanks
Vern

On 6/8/2017 11:48 AM, Douglas Dunn wrote:
What an absurd problem! Before I say anything, I know you already said
that "currently you have no limitations on the entry of emoji", so I
assume that is a matter of company policy. The curiosity is consuming me
though - emoji and EBCDIC? I'm not sure those words have ever been in the
same sentence before.

Of course, I do not know what your application is, but I really can't see
any case where emoji, em dash, or ellipsis are considered "useful" input.
I personally would try to get that requirement changed, and delete
characters that are not on codepage 37. It just seems like a lot of work,
for what reason exactly?

If that is not practical, another option might be to translate emoji to
their "text" equivalent: the "smiley" becomes "=D", etc. All of those
should be EBCDIC characters from there, but of course 1 character longer.
Em dash becomes regular dash, ellipsis becomes "...".

Just my thoughts!

On Thu, Jun 8, 2017 at 6:57 AM, Henrik Rützou <hr@xxxxxxxxxxxx>
wrote:

Hi Vernon,

What is it that you receive in the XML file originally? Is it UTF-8?


On Thu, Jun 8, 2017 at 2:15 PM, Vernon Hamberg <vhamberg@xxxxxxxxxxxxxxx>
wrote:

Y'all

We get XML files from our field associates who use iPhones to enter
service information. That data is sent up to the IBM i in XML files.

We are using XML-SAX to process these files. But the process stops when it
can't parse the XML, and at this time someone goes into the XML and cleans
up the problem.

I am to find a way to eliminate as many parsing issues as possible. Here's
what I've done so far, with help from Barbara Morris.

The things that are failing include emojis (some values are free text
entry and can contain anything available on an iPhone keyboard - and there
are currently no limitations on this - don't ask!!)

Other things that fail are characters like an ellipsis or an em dash -
these do not exist in EBCDIC 37.

The former use of XML-SAX did not include the ccsid option, so it tried to
bring XML values into a CCSID 37 variable - and that can't be done, hence
the 351 parsing error status code.

I've changed this so that the option parameter is "doc=file
ccsid=ucs2" - we are at 7.1, so 1208 (UTF-8) is not an option.

So the values are all returned to RPG in UCS-2, and this is working OK for
things like the ellipsis and em dash - more on this in a moment. The
emojis still don't parse, because they are 4-byte entities in UTF-8 and
don't exist in UCS-2. I have a plan to take care of those, based on the
offset into the XML file that XML-SAX tells us in the event of an
exception.

Back to the horizontal ellipsis - in UTF-8 this is a 3-byte sequence, in
hex, X'E280A6' - I see that in the XML file in the IFS that is tagged as
CCSID 1208 - that's required.

In the UCS-2 value, this is a 2-byte sequence, X'2026'.
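
The byte values can be reproduced with a few lines of Python (used here
only as a convenient calculator, it is not part of the RPG program).
U+1F600 is an arbitrary emoji picked for the example, and hex_of and
fits_in_ucs2 are made-up helper names.

```python
def hex_of(text: str, encoding: str) -> str:
    """Return the encoded bytes of text as an upper-case hex string."""
    return text.encode(encoding).hex().upper()


def fits_in_ucs2(char: str) -> bool:
    """UCS-2 only covers code points up to U+FFFF (one 16-bit unit)."""
    return ord(char) <= 0xFFFF


ellipsis = "\u2026"       # horizontal ellipsis
emoji = "\U0001F600"      # example emoji, outside the UCS-2 range

print(hex_of(ellipsis, "utf-8"))      # E280A6   (3 bytes)
print(hex_of(ellipsis, "utf-16-be"))  # 2026     (one 16-bit unit)
print(hex_of(emoji, "utf-8"))         # F09F9880 (4 bytes)
print(hex_of(emoji, "utf-16-be"))     # D83DDE00 (surrogate pair, UTF-16 only)
print(fits_in_ucs2(ellipsis), fits_in_ucs2(emoji))
```

The last two lines show why the emoji is the hard case: UTF-16 needs two
16-bit units for it, and plain UCS-2 has no way to express that.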

In the program I assign the UCS-2 value to a column in a PF that is CCSID
37 - it appears there as X'0E447f0F' when I use DSPPFM on the PF.

At first I was not sure what this was - did a google on it and was led to
a site with IBM937 info - finally I understood this is a DBCS character
with shift-in and shift-out characters.

When I write this to the PF and use DSPPFM, it looks like " àÉ ".

I can accept that in the short term but wonder if we can do better - I
believe that the ellipsis prints as a blank at the customer, so that would
be a suitable option, to get, perhaps, the X'3F' unprintable byte in that
place.
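
One way to picture that substitution, as a Python sketch over the raw
EBCDIC bytes. This is only an illustration of the byte manipulation, not
what RPG or iconv does internally; substitute_dbcs_runs is a made-up
name, and it assumes every X'0E' (shift-out) is matched by a later X'0F'
(shift-in) with 2-byte characters in between.

```python
import re

# X'0E' ... X'0F' brackets a run of double-byte characters in mixed EBCDIC.
_DBCS_RUN = re.compile(rb"\x0e(.*?)\x0f", re.DOTALL)


def substitute_dbcs_runs(data: bytes, sub: bytes = b"\x3f") -> bytes:
    """Replace each bracketed DBCS character with one substitute byte."""
    def _replace(match):
        dbcs_bytes = match.group(1)
        return sub * (len(dbcs_bytes) // 2)   # one substitute per 2-byte char
    return _DBCS_RUN.sub(_replace, data)


# 'A', the bracketed ellipsis from the DSPPFM display, then 'B' (code page 037).
field = bytes.fromhex("C10E447F0FC2")
print(substitute_dbcs_runs(field).hex().upper())
```

The result is shorter than the input (4 bytes become 1), so a fixed-length
field would need to be padded with blanks afterwards.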

Or is there a way that I can take advantage of the DBCS-OPEN (is that the
term?) and actually print the ellipsis? I've never dealt with DBCS.

Or should I use iconv instead of just the default RPG character conversion
when assigning the UCS-2 value to a CCSID-37 variable?

Thanks much
Vern
--
This is the RPG programming on the IBM i (AS/400 and iSeries)
(RPG400-L) mailing list To post a message email:
RPG400-L@xxxxxxxxxxxx To subscribe, unsubscribe, or change list
options,
visit: http://lists.midrange.com/mailman/listinfo/rpg400-l
or email: RPG400-L-request@xxxxxxxxxxxx Before posting, please
take
a moment to review the archives at
http://archive.midrange.com/rpg400-l.

Please contact support@xxxxxxxxxxxx for any subscription related
questions.

Help support midrange.com by shopping at amazon.com with our
affiliate
link: http://amzn.to/2dEadiD


--
Regards,
Henrik Rützou

http://powerEXT.com <http://powerext.com/>


This mailing list archive is Copyright 1997-2025 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page. If you have questions about this, please contact [javascript protected email address].