CRPence

It is important to understand that any field, whether SBCS or DBCS, is
just a string of bytes once you allocate a buffer and place the values
into it. Allocated storage has no CCSID; it is just data. In order to use
ICONV on such storage, all the data in it must be in the same CCSID.
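
To illustrate the point that a buffer carries no CCSID of its own, here is
a minimal Python sketch (Python is used only for illustration; "cp037" is
Python's codec name for CCSID 37 and "latin-1" for CCSID 819). The same
three bytes give different text depending on which code page the reader
assumes:

```python
# The same bytes, read under two different assumed code pages.
raw = bytes([0xC1, 0xC2, 0xC3])

as_ebcdic = raw.decode("cp037")    # read as EBCDIC CCSID 37
as_latin1 = raw.decode("latin-1")  # read as ISO 8859-1 (CCSID 819)

print(as_ebcdic)  # ABC
print(as_latin1)  # ÁÂÃ

# A buffer that mixes two encodings cannot be converted in one pass:
# decoding the whole buffer as CCSID 37 garbles the ISO 8859-1 half.
mixed = "AB".encode("cp037") + "AB".encode("latin-1")
print(mixed.decode("cp037") == "ABAB")  # False
```

The bytes never change; only the interpretation applied by the reader (or
by the converter) does.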

The mixed field types, such as CCSID 1388/5035, can to the best of my
knowledge (like UTF-8 fields) only be defined as DDS/DDL fields, not as
internal RPG D-spec fields. In RPG such fields are treated as SBCS
alphanumeric fields (type A). If you want to use the data in Unicode DBCS
fields, you need to use ICONV to convert the string to DBCS and vice
versa, tricking ICONV into seeing the alphanumeric field as a CCSID
1388/5035 data string.

It may be possible to use the %UCS2 BIF and ICONV '0', but that would
require the job or the program to run in either CCSID 1388 or 5035. I
haven't tried it!

If you run your system in a typical western SBCS CCSID such as 37, Apache
doesn't support automatic conversion from EBCDIC CCSID 1388/5035 to 819
(ISO 8859-1) or 1208 (UTF-8); you have to use ICONV and a MIME header
with a corresponding charset.

The discussion is theoretical, however, since it takes a lot of
workarounds to use exotic CCSIDs, and the whole world is moving towards
Unicode/UTF-8 in every text-based format (HTML/XML/JSON/mail). Most
programming languages (besides JavaScript, which uses UTF-8) support
either SBCS ASCII or EBCDIC, or DBCS UTF-16/Unicode (CCSID 1200).

In other words, don't expect any receiver to be able to read data in a
body sent as

Content-Type: Message; charset="JIS_C6229-1984-hand-add"

even though the "standards" say so.





On Sat, Jun 7, 2014 at 1:11 AM, CRPence <CRPbottle@xxxxxxxxx> wrote:

On 06-Jun-2014 10:00 -0500, Henrik Rützou wrote:

You can't concatenate SBCS and DBCS data in one string. It doesn't
make sense since SBCS only has 256 code points in one byte and DBCS
has 64K code points in two bytes and there is no way you can
distinguish if a character is made of one or two bytes in a
concatenated string.


They can be combined, using _shifted_ DBCS; "mixed data" character
strings. Not to imply the utility of that capability for the OP, just that
the capability does exist, contrary to the above implication.

UTF-8 is basically a one to four byte character set that in its one
byte encoding shares the 7-bit ASCII character set. UTF-8 has reserved
bits in the first byte that tell how many following bytes (0-3) make up
the "character". <<SNIP>>
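
The reserved-bit scheme described above can be sketched in a few lines of
Python. The function name utf8_sequence_length is hypothetical; it looks
only at the high bits of the first byte:

```python
def utf8_sequence_length(lead_byte: int) -> int:
    """Return the total byte count of a UTF-8 character from its first byte."""
    if (lead_byte & 0x80) == 0x00:  # 0xxxxxxx: 7-bit ASCII, no followers
        return 1
    if (lead_byte & 0xE0) == 0xC0:  # 110xxxxx: one following byte
        return 2
    if (lead_byte & 0xF0) == 0xE0:  # 1110xxxx: two following bytes
        return 3
    if (lead_byte & 0xF8) == 0xF0:  # 11110xxx: three following bytes
        return 4
    raise ValueError("not a valid UTF-8 lead byte")


for ch in ("A", "ü", "€"):
    encoded = ch.encode("utf-8")
    print(ch, len(encoded), utf8_sequence_length(encoded[0]))
```

For each sample character the length derived from the first byte matches
the length of the encoded sequence.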


Whereas the UTF uses reserved bits to indicate how many bytes for each
character, the /shift characters/ of EBCDIC identify when the stream of
bytes are DBCS vs SBCS; i.e. whenever there is a shift-out of SBCS [␎:
EBCDIC 0x0E] into DBCS, and a shift-in [␏: EBCDIC 0x0F] returning to SBCS
out of DBCS.
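
A minimal Python sketch of how a reader can walk such a mixed stream,
using only the shift-out (0x0E) and shift-in (0x0F) bytes named above.
The function name split_mixed is hypothetical, and the DBCS byte pairs in
the sample are arbitrary placeholders, not characters taken from any
particular CCSID:

```python
SO, SI = 0x0E, 0x0F  # EBCDIC shift-out / shift-in


def split_mixed(data: bytes):
    """Split a mixed EBCDIC buffer into ("SBCS" | "DBCS", bytes) segments."""
    segments = []
    i = 0
    while i < len(data):
        if data[i] == SO:
            # DBCS run: everything up to the matching shift-in.
            end = data.index(SI, i + 1)  # ValueError if never shifted in
            dbcs = data[i + 1:end]
            if len(dbcs) % 2:
                raise ValueError("DBCS run has an odd number of bytes")
            segments.append(("DBCS", dbcs))
            i = end + 1
        else:
            # SBCS run: everything up to the next shift-out (or the end).
            end = data.find(SO, i)
            if end == -1:
                end = len(data)
            segments.append(("SBCS", data[i:end]))
            i = end
    return segments


sample = (b"\xC1\xC2"                          # SBCS "AB" in CCSID 37
          + bytes([SO, 0x42, 0xC1, 0x42, 0xC2, SI])  # two placeholder pairs
          + b"\xC3")                           # SBCS "C" in CCSID 37
for kind, chunk in split_mixed(sample):
    print(kind, chunk.hex())
```

The shift bytes themselves are dropped from the segments; they only mark
where the interpretation of the stream changes.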

--
Regards, Chuck

--
This is the RPG programming on the IBM i (AS/400 and iSeries) (RPG400-L)
mailing list
To post a message email: RPG400-L@xxxxxxxxxxxx
To subscribe, unsubscribe, or change list options,
visit: http://lists.midrange.com/mailman/listinfo/rpg400-l
or email: RPG400-L-request@xxxxxxxxxxxx
Before posting, please take a moment to review the archives
at http://archive.midrange.com/rpg400-l.




