Now, is someone adding this to the FAQ page at http://faq.midrange.com?

Rob Berendt
--
Group Dekko Services, LLC
Dept 01.073
PO Box 2000
Dock 108
6928N 400E
Kendallville, IN 46755
http://www.dekko.com

Bruce Vining <bvining@xxxxxxxxxx>
Sent by: midrange-l-bounces@xxxxxxxxxxxx
12/28/2004 03:19 PM
Please respond to Midrange Systems Technical Discussion <midrange-l@xxxxxxxxxxxx>
To: Midrange Systems Technical Discussion <midrange-l@xxxxxxxxxxxx>
cc:
Subject: CCSIDs, Code Page, Character Sets...

I don't know that I'm any expert either, but I'll give it a shot with a few definitions and examples (which I hope come through):

Character set: A finite set of different graphic or control characters that is complete for a given purpose. Each character is represented by a character identifier. For example, Latin capital letter A is LA020000.

One example of a character set is Character Set 640, the Syntactic Character Set:

[Image: an illustration of character set 00640]

Another character set is Character Set 697, the Country Extended Character Set:

[Image: country extended character set 00697]

Graphic character: A visual representation of a character that is normally produced by writing, printing, or displaying. For instance, the number sign (SM010000) is #.

Control character: A character whose occurrence specifies a control function. For instance, in EBCDIC DBCS environments the Shift Out and Shift In control characters begin and end a DBCS sequence.

Code point: A unique bit pattern that can serve as an element of a code page to which a character can be assigned. The element is associated with a binary value. The assignment of a character to an element of a code page determines the binary value that will be used to represent each occurrence of the character in a character string. Code points are one or more bytes long. For instance, in EBCDIC environments x'C1' is the Latin capital letter A.
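The following Python sketch makes the code point and control character definitions concrete. It treats Python's built-in cp037 codec as a stand-in for CCSID 37. That is an assumption, because the iSeries uses its own conversion tables. The helper names code_point and split_mixed are hypothetical, and so are the DBCS byte pairs in the sample data. The SO and SI values x'0E' and x'0F' are the standard EBCDIC Shift Out and Shift In bytes.

```python
# Sketch only: Python's cp037 codec stands in for CCSID 37 (an assumption).
# code_point() and split_mixed() are hypothetical helpers for illustration.

SO = 0x0E  # EBCDIC Shift Out: the following bytes are DBCS pairs
SI = 0x0F  # EBCDIC Shift In: back to single-byte characters


def code_point(ch: str, codec: str = "cp037") -> str:
    """Return the hex code point that the codec assigns to one character."""
    return ch.encode(codec).hex().upper()


def split_mixed(data: bytes) -> list[tuple[str, bytes]]:
    """Split EBCDIC mixed-byte data into ('SBCS', ...) and ('DBCS', ...) runs.

    SO and SI are control characters: they are never drawn, they only
    change how the bytes that follow them are interpreted.
    """
    runs: list[tuple[str, bytes]] = []
    mode, start = "SBCS", 0
    for i, b in enumerate(data):
        if b == SO and mode == "SBCS":
            if i > start:
                runs.append((mode, data[start:i]))
            mode, start = "DBCS", i + 1
        elif b == SI and mode == "DBCS":
            if i > start:
                runs.append((mode, data[start:i]))
            mode, start = "SBCS", i + 1
    if start < len(data):
        runs.append((mode, data[start:]))
    return runs


print(code_point("A"))  # C1
print(code_point("#"))  # 7B

# Hypothetical mixed string: 'A', SO, two made-up DBCS pairs, SI, 'A'
print(split_mixed(b"\xc1\x0e\x45\x41\x45\x42\x0f\xc1"))
```

split_mixed only finds where the runs begin and end. Decoding the DBCS bytes would require a DBCS code page, such as Code Page 300 in CCSID 5035. The Python standard library does not include one.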
Code page: A set of assignments, each of which assigns a code point to a character. Each code page has a unique name or identifier. Within a given code page, a code point is assigned to one character. More than one character set can be assigned code points from the same code page. For example, Code Page 37 (EBCDIC USA/Canada):

[Image: Code Page 37 chart]

As an example of more than one character set being assigned to a given code page, note that Character Set 640 and Character Set 697 are both mapped onto Code Page 37 (640 is a proper subset of 697).

CCSID: A number identifying a specific set of identifiers for encoding scheme, character set, code page, and additional coding-related required information (ACRI). For example, CCSID 37 represents EBCDIC encoding, Code Page 37, Character Set 697, and no special ACRI. CCSID 5035 represents EBCDIC mixed-byte encoding, Code Page 1027, Character Set 1172, Code Page 300, and Character Set 370.

One problem that is often encountered is that in different CCSIDs the same graphic character is represented by different code point values. For example, look at CCSID 500 (EBCDIC, Code Page 500, Character Set 697):

[Image: CCSID 500 chart]

and at CCSID 273 (EBCDIC, Code Page 273, Character Set 697):

[Image: CCSID 273 chart]

The commercial at sign @ (SM050000) is x'7C' in CCSID 500 and x'B5' in CCSID 273. If the system is to accurately represent and process SM050000, then it is rather important for the system to know what CCSID a given piece of character data is in. Ideally, if we were to convert a string containing SM050000 to CCSID 277:

[Image: CCSID 277 chart]

from either CCSID 500 or CCSID 273, we would end up with x'80', as that is the correct code point value for SM050000 (@) in CCSID 277.
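The @ example can be reproduced with Python's EBCDIC codecs, which stand in for the iSeries conversion tables here (an assumption). CCSID_TO_CODEC, convert and glyph are hypothetical names for this sketch. CCSID 277 is left out because the Python standard library has no cp277 codec, so CCSID 37 serves as the second target instead. Python converts by decoding the bytes to Unicode characters and then re-encoding them. The result is therefore only as good as the source CCSID you state.

```python
# Sketch only: Python codecs as stand-ins for iSeries CCSID conversion tables.
# CCSID_TO_CODEC, convert() and glyph() are hypothetical names.
CCSID_TO_CODEC = {
    37: "cp037",
    273: "cp273",  # added to Python in 3.4
    500: "cp500",
    # 277 is missing: the Python standard library has no cp277 codec.
}


def convert(data: bytes, from_ccsid: int, to_ccsid: int) -> bytes:
    """Read bytes as from_ccsid characters, then re-encode them with the
    code points that to_ccsid uses for those same characters."""
    text = data.decode(CCSID_TO_CODEC[from_ccsid])  # bytes -> Unicode str
    return text.encode(CCSID_TO_CODEC[to_ccsid])    # str -> target bytes


def glyph(data: bytes, ccsid: int) -> str:
    """What a device assuming the given CCSID would draw for these bytes."""
    return data.decode(CCSID_TO_CODEC[ccsid])


# The same graphic character has two different code points.
print("@".encode("cp500").hex().upper())  # 7C
print("@".encode("cp273").hex().upper())  # B5

# Correctly tagged data converts to the right code point for the target.
print(convert(b"\x7c", 500, 273).hex().upper())  # B5
print(convert(b"\xb5", 273, 37).hex().upper())   # 7C

# The same byte x'7C' is drawn differently depending on the assumed CCSID.
print(glyph(b"\x7c", 500), glyph(b"\x7c", 273))  # @ §
```

Mis-tagged data converts without complaint. Suppose x'7C' really came from a CCSID 500 device as an @ but is tagged CCSID 273. Then convert(b"\x7c", 273, 500) treats it as a section sign and returns x'B5', which is § in CCSID 500. No error is raised; the data is simply wrong.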
The iSeries will correctly perform these conversions as long as the data is correctly identified by the CCSID actually used to represent it. All too often, though, when a problem is encountered, the data being processed is not in the CCSID you might think it is. This can be due to variables such as the terminal you are entering data from (terminals have their own Character Set and Code Page values), the CCSID the job is running in, and the CCSID conversion options in effect for your job. For this reason it is often best (and easiest) to display the hex values of your data rather than trusting the graphic character you see on your terminal. The terminal will simply show you a graphic character based on the assumption that the data is in the terminal's configured code page.

qsrvbas@netscape.net (Tom Liotta)
Sent by: midrange-l-bounces@xxxxxxxxxxxx
12/23/2004 07:26 PM
Please respond to Midrange Systems Technical Discussion
To: midrange-l@xxxxxxxxxxxx
cc:
Subject: RE: Another QtmmSendMail problem?

>> Due to all the possible knobs it may be easiest to simply
>> use debug and look at the hex value of the address on
>> their system prior to the conversion to confirm their @
>> is indeed x'7C'

This piece of Bruce's advice seems to be the one that gets missed the most. Not necessarily for this problem; I wouldn't expect Brad to skip over it lightly. But confusion often seems to arise simply because we look at the _characters_ that get displayed or printed, and we forget (or never realize) that the _important_ aspect is the actual bit pattern.

The shape of the displayed or printed character can change simply because we use a different terminal or printer or emulator. The same data can look like '@' on one terminal and like '§' on another just because the devices map the hex values to different character sets. Many of us learned years ago to use *CAT or *BCAT instead of the special characters in CL programs for this same reason.
The 'characters' would change depending on the device settings, regardless of the hex values of the data.

I keep hoping we can work out decent FAQ coverage of what this all means, but nobody seems comfortable enough with all the concepts to get it right. I know how I interpret things, but I'm totally unclear whether I'm anywhere close to understanding. So... I'm going to ramble on about how I understand it all, just to get something going for the archives. Maybe some real experts will respond with corrections, enhancements, clarifications or whatever, and a real FAQ will start to emerge.

Character set -- The name for the things we see. To me, a 'character set' represents a list of shapes of characters. Each 'character' in the set is what a device draws when it sees a particular bit pattern from a given code page.

Code page -- The name for a collating sequence of characters. Different languages have different characters; German uses a lot of umlauts, for example. You can have the letter 'o' and an umlaut-'o' in German. If you want to sort words that have those letters, how do you choose which has the higher collating sequence? When converting to a different code page, what happens to the collating sequence?

I've kind of pictured the problem of reconciling collating sequences between different languages as partly a computer design problem. Somewhere in the hardware, there's a way to compare one character against another and get a result that says "greater", "lesser" or "equal". But the frequency of characters in words in a language also seems to need to be addressed. This seems partly addressed by converting code pages from computer to computer. Because character comparisons happen so often, it should be addressed at a low level. I'd imagine that code pages are designed to help minimize what it takes to compare words, and I would expect that conversions between two code pages would take that into account.
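A plain byte-value sort illustrates Tom's point that the hardware compares one character against another. The same words come out in a different order depending on which code page their bytes are in. This Python sketch compares ASCII with the cp037 EBCDIC codec. The helper name byte_order and the sample words are hypothetical. The comparison uses code point values only; it is not a language-aware collation.

```python
# Sketch: binary (code point) ordering depends on the code page.
# byte_order() is a hypothetical helper; cp037 stands in for an EBCDIC CCSID.


def byte_order(words: list[str], codec: str) -> list[str]:
    """Sort words by the raw bytes they encode to in the given codec."""
    return sorted(words, key=lambda w: w.encode(codec))


words = ["apple", "Apple", "42", "Zebra"]

# ASCII: digits (x'30'-x'39') < uppercase (x'41'-x'5A') < lowercase (x'61'-x'7A')
print(byte_order(words, "ascii"))  # ['42', 'Apple', 'Zebra', 'apple']

# EBCDIC: lowercase (x'81'-x'A9') < uppercase (x'C1'-x'E9') < digits (x'F0'-x'F9')
print(byte_order(words, "cp037"))  # ['apple', 'Apple', 'Zebra', '42']
```

This is why a list sorted by binary value in EBCDIC can come back in a different order when the same text is sorted by binary value in ASCII.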
If I convert stuff from Cyrillic to English, maybe I shouldn't expect exact bit-for-bit matches. Maybe the difference would show up simply because computers that normally operate on English data can be more efficient with a different code page than computers that regularly operate on Cyrillic data. So the combination of code pages and character sets accounts for the differences in collating sequences and visual representations. During code page conversions, efficiencies are maybe maintained to some degree.

CCSID -- The name for a particular combination of code page with a character set. For a given code page, a change in character set means it needs to be a different CCSID. Rather than keeping track of code pages and character sets separately, we just use one symbol that stands for the two of them together: the Coded Character Set IDentifier, CCSID.

Those are basically the vague, nebulous meanings I've come to use for those three concepts. I'd love it if someone could say they're right, or correct them where they're wrong. (Ideally, someone should also clear up how Unicode fits in with CCSID, since it almost seems as if Unicode sometimes encompasses all of "CCSID" within almost a single CCSID. Or something like that.)

Tom Liotta

midrange-l-request@xxxxxxxxxxxx wrote:
> 7. Re: Another QtmmSendMail problem? (Brad Stone)
>
>On Tue, 21 Dec 2004 12:49:56 -0600
>
>Things are still as they were... the funny thing is,
>SNDDST seems to send emails just fine when they use the @
>sign.
>
>When they use § and the email api, it works fine, shows up
>as @ in MIME file header, but § in a PF record that is used
>to log emails that are sent out.
>
>So to me, something is surely converting wrong in this
>particular case.
>
>They did some searching and found that the § and @
>characters are reversed in the different CCSIDs, though.
>
>I'm not sure where else to look.
>
> Bruce Vining <bvining@xxxxxxxxxx> wrote:
>>
>> A job CCSID of 870 and the resulting conversions to 500
>> would be the same on both systems. What might be different
>> though are the code points being generated by your keyboards
>> (assuming the address is being provided interactively) and
>> subsequently being processed in the job so that the actual
>> inputs to the conversion are not the same.
>>
>> Due to all the possible knobs it may be easiest to simply
>> use debug and look at the hex value of the address on their
>> system prior to the conversion to confirm their @ is indeed
>> x'7C' (and that yours in test is also). If debug isn't
>> possible then we would need to know the configuration
>> (CHRID and KBDTYPE) of their workstation (and hope that this
>> does represent what the workstation really is sending, which
>> with all the emulators available is always a big IF these
>> days...); the CCSID, DFTCCSID, and CHRIDCTL of their job;
>> and the CHRID value for their *DSPF (or *PNLGRP).

--
Tom Liotta
The PowerTech Group, Inc.
19426 68th Avenue South
Kent, WA 98032
Phone 253-872-7788 x313
Fax 253-872-7904
http://www.powertech.com

--
This is the Midrange Systems Technical Discussion (MIDRANGE-L) mailing list
To post a message email: MIDRANGE-L@xxxxxxxxxxxx
To subscribe, unsubscribe, or change list options,
visit: http://lists.midrange.com/mailman/listinfo/midrange-l
or email: MIDRANGE-L-request@xxxxxxxxxxxx
Before posting, please take a moment to review the archives
at http://archive.midrange.com/midrange-l.
This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page. If you have questions about this, please contact [javascript protected email address].