On 06-Jun-2014 10:00 -0500, Henrik Rützou wrote:
You can't concatenate SBCS and DBCS data in one string. It doesn't
make sense since SBCS only has 256 code points in one byte and DBCS
has 64K code points in two bytes and there is no way you can
distinguish whether a character is made of one or two bytes in a
concatenated string.

They can be combined, using _shifted_ DBCS, i.e. "mixed data" character strings. That is not to imply the capability is useful for the OP, just that the capability does exist, contrary to the above implication.

UTF-8 is basically a one- to four-byte character encoding that in its
one-byte form shares the 7-bit ASCII character set. UTF-8 has reserved
bits in the first byte that tell how many following bytes (0-3)
complete the "character". <<SNIP>>
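
To make the reserved-bits point concrete, here is a minimal sketch in Python. The function name utf8_sequence_length is hypothetical, chosen only for illustration; it reads the high bits of the first byte and reports the total length of the sequence, i.e. one plus the 0-3 following bytes.

```python
def utf8_sequence_length(lead):
    """Return the total byte count of a UTF-8 sequence, given its first byte.

    Hypothetical helper for illustration only.
    """
    if (lead & 0x80) == 0x00:   # 0xxxxxxx: single byte, same as 7-bit ASCII
        return 1
    if (lead & 0xE0) == 0xC0:   # 110xxxxx: 1 following byte
        return 2
    if (lead & 0xF0) == 0xE0:   # 1110xxxx: 2 following bytes
        return 3
    if (lead & 0xF8) == 0xF0:   # 11110xxx: 3 following bytes
        return 4
    # 10xxxxxx is a continuation byte; 11111xxx is not used by UTF-8.
    raise ValueError("not a valid UTF-8 lead byte: 0x%02X" % lead)


# Walk an encoded string one character at a time using only the lead bytes.
encoded = "Rützou".encode("utf-8")
lengths = []
i = 0
while i < len(encoded):
    n = utf8_sequence_length(encoded[i])
    lengths.append(n)
    i += n
```

Here the "ü" is the only character that takes two bytes; every other character in the sample is plain ASCII and takes one.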

Whereas UTF-8 uses reserved bits to indicate how many bytes make up each character, the /shift characters/ of EBCDIC identify where the stream of bytes is DBCS vs SBCS; i.e. a shift-out [␎: EBCDIC 0x0E] switches from SBCS into DBCS, and a shift-in [␏: EBCDIC 0x0F] returns from DBCS to SBCS.
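
As a rough illustration of how a reader of mixed data can tell the two apart, the following Python sketch walks a byte string and splits it into SBCS and DBCS runs at the shift characters. The function name split_mixed and the sample byte values are assumptions made up for the example, not taken from any IBM API, and the sketch assumes the string starts in SBCS mode and that every shift-out is closed by a shift-in.

```python
SHIFT_OUT = 0x0E  # EBCDIC SO: leave SBCS, enter DBCS
SHIFT_IN = 0x0F   # EBCDIC SI: leave DBCS, return to SBCS


def split_mixed(data):
    """Split mixed EBCDIC bytes into ("SBCS" | "DBCS", bytes) runs.

    Hypothetical helper for illustration only; the shift bytes themselves
    are dropped from the output.
    """
    runs = []
    buf = bytearray()
    in_dbcs = False
    i = 0
    while i < len(data):
        byte = data[i]
        if not in_dbcs and byte == SHIFT_OUT:
            if buf:
                runs.append(("SBCS", bytes(buf)))
            buf = bytearray()
            in_dbcs = True
            i += 1
        elif in_dbcs and byte == SHIFT_IN:
            if buf:
                runs.append(("DBCS", bytes(buf)))
            buf = bytearray()
            in_dbcs = False
            i += 1
        elif in_dbcs:
            # Between the shifts, every character occupies exactly two bytes.
            if i + 1 >= len(data):
                raise ValueError("truncated double-byte character")
            buf += data[i:i + 2]
            i += 2
        else:
            buf.append(byte)
            i += 1
    if in_dbcs:
        raise ValueError("missing shift-in at end of data")
    if buf:
        runs.append(("SBCS", bytes(buf)))
    return runs


# Illustrative bytes: two SBCS bytes, SO, two double-byte characters, SI,
# then one more SBCS byte.
sample = b"\xC1\xC2" + b"\x0E" + b"\x42\xC1\x42\xC2" + b"\x0F" + b"\xC3"
runs = split_mixed(sample)
```

For the sample, runs holds an SBCS run of two bytes, a DBCS run of four bytes (two characters), and a final SBCS run of one byte. Unlike UTF-8, nothing in an individual byte says how wide its character is; only the position relative to the shift characters does.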

