Scott, in general, can we say that files arriving in the IFS via FTP will get CCSID 819, while files dragged and dropped from Windows will get CCSID 1252 in the IFS? Or am I wrong?

On Fri, May 8, 2015 at 8:59 PM, Scott Klement <midrange-l@xxxxxxxxxxxxxxxx> wrote:

> Jim,
>
> In file transfer situations, I would never trust the CCSID file attribute (unless you've already made sure that it's right, of course).
>
> Unless you're transferring a save file from another IBM i system/partition, the CCSID is not part of what gets transferred. All that's transferred is the data itself. The system will usually just assign a 'default' CCSID -- it has no way of knowing if it's the right one for your data. It expects you to change it accordingly if your data is different.
>
> If you are finding that a single character (such as a "smart quote" or an international symbol) is showing up as two or more bytes of data, resulting in extra 'garbage' when translated to EBCDIC, this almost always means that the data is UTF-8 but you're telling the system that it's ASCII (such as 819). The system will then translate the basic alphabet and numbers correctly, but more 'special' characters will be mistranslated.
>
> Really, considering that it's 2015, we should all be using Unicode (UTF-8 or UTF-16) as much as possible. ASCII and EBCDIC are really cumbersome. I know it's hard when you have so many applications that are already in EBCDIC, but an all-Unicode environment is really what you should be striving for in the long run, even if you can't do it today.
>
> Anyway, as for how to "purify" the data: there are certain commonplace issues that make sense to handle, such as replacing "smart quotes" with straight quotes. I would definitely do this in Unicode (or ASCII, if that's what it is) before translating to EBCDIC.
>
> But aside from these common things, it's generally ugly and nasty to remove "unwanted" characters.
> There's no good way to do this, since there's really no way for the computer to know which characters are "allowed" and which are not. How does it know whether a half-moon character, for example, is intentional or an error? The same is true of accented characters: oftentimes people (at least in the USA) will see these and say they are "garbage", but they are normal parts of human languages in most of the world. How can the computer know that they are "garbage"? Obviously, it's easy for us as human beings to look at the data and realize that a particular character doesn't belong there, but I'm sure you understand that a computer can't see things that way.
>
> So if you want to "purify" your data, the BEST way to do that is to find out where these unwanted characters are coming from and have the source stop sending them. If you really, truly can't do that, then the "hack" would be to make a list of everything you DO want and remove everything else. What is or isn't a wanted character will almost certainly vary from application to application, so there isn't really any built-in way to do this. Just make a string of all the characters you want, and use RPG operations like %CHECK to find the ones not in that character set and remove them. But this really is a hack...
>
> On 5/8/2015 1:33 PM, Jim Franz wrote:
>
>> Without asking every entity, can one tell by looking at the file attributes?
>>
>> Jim
>>
>> On Fri, May 8, 2015 at 2:28 PM, Henrik Rützou <hr@xxxxxxxxxxxx> wrote:
>>
>>> Jim,
>>>
>>> Even if the files you receive are in CCSID 819/1252, are you sure they aren't UTF-8 files?
>>>
>>> On Fri, May 8, 2015 at 8:25 PM, Jim Franz <franz9000@xxxxxxxxx> wrote:
>>>
>>>> EBCDIC CCSID = 37
>>>> Most file imports are via FTP (CCSID 1252); occasionally a burned DVD of history data for a new customer startup.
>>>> Some trading partners are mainframe, some Unix/Linux, some Windows; all are US-based entities, but we think some servers are overseas (we see time differences).
>>>>
>>>> When we write ASCII text, it's usually CCSID 819.
>>>>
>>>> What hurts us most is screen input (a web interface to SQL Server, then to Power i) where users cut and paste paragraphs of text from their source systems (thousands of different customers).
>>>>
>>>> Jim
>>>>
>>>> On Fri, May 8, 2015 at 2:07 PM, Henrik Rützou <hr@xxxxxxxxxxxx> wrote:
>>>>
>>>>> Jim,
>>>>>
>>>>> What is the EBCDIC CCSID on your machine, and how do you receive files?
>>>>>
>>>>> On Fri, May 8, 2015 at 8:00 PM, Jim Franz <franz9000@xxxxxxxxx> wrote:
>>>>>
>>>>>> We do a lot of import and export of data, plus we have both PC client (local and web) input as well as PC5250.
>>>>>> We had a recent thread involving cut-and-paste data (EBCDIC x'3F') that caused an issue.
>>>>>> We use CCSID 37 and ASCII 819.
>>>>>>
>>>>>> There are more EBCDIC characters than what we see on the US keyboard. Some we need, such as the copyright symbol, cent sign, etc., but many
>>>>>>
>>>>>> We want to take steps to clean the data on input, whether from the ASCII or EBCDIC side. We have some input already cleansed, but only at the screen program level.
>>>>>>
>>>>>> A couple of questions:
>>>>>> 1. Just replacing everything below EBCDIC x'40' leaves a lot of strange characters like x'8C' (sort of a moon with a hat...). One thought is to identify all the characters we need and replace the rest. No need to keep line and page formatting stuff. Is this a good idea?
>>>>>>
>>>>>> 2. Since there are a multitude of entry/update points, are DB triggers best?
>>>>>> I'm wondering about apps that write the data: after the write, the screen column data is different from the column data in the file (because the trigger program cleaned the data). We're hoping to avoid opening up all the apps.
>>>>>>
>>>>>> 3. How far do people with heavy EDI take this? Am I leaving something out with the keyboard characters plus a few more? These are names, addresses, and notes (which are sometimes pages of notes).
>>>>>>
>>>>>> Jim Franz
>>>>>> --
>>>>>> This is the Midrange Systems Technical Discussion (MIDRANGE-L) mailing list
>>>>>> To post a message email: MIDRANGE-L@xxxxxxxxxxxx
>>>>>> To subscribe, unsubscribe, or change list options,
>>>>>> visit: http://lists.midrange.com/mailman/listinfo/midrange-l
>>>>>> or email: MIDRANGE-L-request@xxxxxxxxxxxx
>>>>>> Before posting, please take a moment to review the archives
>>>>>> at http://archive.midrange.com/midrange-l.
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Henrik Rützou
>>>>>
>>>>> http://powerEXT.com <http://powerext.com/>
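Scott's explanation of the extra 'garbage' (UTF-8 data that the system has been told is CCSID 819) can be reproduced in a few lines. The sketch below is Python rather than RPG and assumes that Python's `latin-1` codec is a fair stand-in for CCSID 819 and `cp037` for EBCDIC CCSID 37; it is an illustration of the effect, not IBM i's own conversion code. The function name `mistagged_to_ebcdic` is hypothetical.

```python
# Sketch only: "latin-1" stands in for CCSID 819 (ISO 8859-1) and "cp037"
# stands in for EBCDIC CCSID 37. Not IBM i's own conversion routine.

def mistagged_to_ebcdic(text: str) -> bytes:
    """Simulate UTF-8 data tagged as CCSID 819, then converted to CCSID 37."""
    utf8_bytes = text.encode("utf-8")        # what the sender actually wrote
    misread = utf8_bytes.decode("latin-1")   # one character per byte, as a CCSID 819 tag implies
    return misread.encode("cp037")           # the "translated" EBCDIC result


if __name__ == "__main__":
    # Plain letter, e with acute accent, right single quotation mark ("smart quote")
    for sample in ["A", "\u00e9", "\u2019"]:
        ebcdic = mistagged_to_ebcdic(sample)
        print(f"{sample!r}: {len(ebcdic)} EBCDIC byte(s) -> {ebcdic.hex()}")
```

Under these assumptions, "A" stays a single EBCDIC byte, the accented e becomes two bytes, and the right single quotation mark becomes three, which is why only the 'special' characters come out mistranslated.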
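Since the CCSID attribute can't be trusted, one rough way to approach Henrik's question (are the files really UTF-8?) is to look at the bytes themselves. The hypothetical helper `guess_encoding` below is a heuristic sketch only: it assumes that data containing non-ASCII bytes that decode cleanly as UTF-8 probably is UTF-8. Short single-byte text can occasionally form valid UTF-8 by accident, so treat the result as a hint rather than a verdict.

```python
def guess_encoding(data: bytes) -> str:
    """Rough heuristic, since the CCSID attribute itself cannot be trusted.

    Returns "ascii" when every byte is 7-bit (identical in UTF-8, 819 and 1252),
    "utf-8" when non-ASCII bytes are present and decode cleanly as UTF-8,
    and "single-byte" otherwise (for example CCSID 819 or 1252 data).
    """
    if data.isascii():
        return "ascii"
    try:
        data.decode("utf-8")  # strict by default: any malformed sequence raises
    except UnicodeDecodeError:
        return "single-byte"
    return "utf-8"


if __name__ == "__main__":
    print(guess_encoding(b"plain text"))
    print(guess_encoding("caf\u00e9".encode("utf-8")))
    print(guess_encoding("caf\u00e9".encode("cp1252")))
```

The check works because a lone Windows-1252 accented byte such as x'E9' is not a valid UTF-8 sequence, while genuine UTF-8 multi-byte characters always follow UTF-8's lead/continuation byte pattern.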
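Scott's suggestion to fix commonplace issues such as smart quotes in Unicode before translating to EBCDIC might look like the following sketch. The replacement table and the function name `to_ccsid37` are hypothetical, `cp037` again stands in for CCSID 37, and substituting x'3F' for characters CCSID 37 cannot represent is an assumption chosen to mirror the x'3F' Jim mentions (the EBCDIC substitute character).

```python
# Hypothetical replacement table; extend it to suit your own data.
SMART_PUNCTUATION = str.maketrans({
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
})

EBCDIC_SUB = b"\x3f"  # EBCDIC substitute character (x'3F')


def to_ccsid37(text: str) -> bytes:
    """Normalize smart punctuation in Unicode, then encode to EBCDIC 37 (cp037).

    Characters that cp037 cannot represent become x'3F' instead of raising.
    """
    out = bytearray()
    for ch in text.translate(SMART_PUNCTUATION):
        try:
            out += ch.encode("cp037")
        except UnicodeEncodeError:
            out += EBCDIC_SUB
    return bytes(out)


if __name__ == "__main__":
    print(to_ccsid37("It\u2019s \u201cfine\u201d \u20ac5").hex())
```

Because the substitution runs only after smart punctuation has been mapped to plain characters, quotes and dashes survive the translation; only characters with no CCSID 37 equivalent, such as the euro sign, fall through to x'3F'.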
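The allow-list "hack" Scott describes can be sketched as follows, in Python rather than RPG. RPG's %CHECK returns the 1-based position of the first character in the base string that does not appear in the comparator string (0 if there is none); the hypothetical `first_disallowed` mimics that, and `keep_allowed` does the removal. The `ALLOWED` set is an assumed example built from US-keyboard characters plus the copyright and cent signs Jim mentions. As Scott warns, it silently drops legitimate accented letters, so the real set has to be decided per application.

```python
import string

# Hypothetical allow-list: US-keyboard characters plus the copyright and
# cent signs mentioned in the thread. Adjust per application.
ALLOWED = frozenset(
    string.ascii_letters + string.digits + string.punctuation + " \u00a9\u00a2"
)


def first_disallowed(text: str, allowed: frozenset = ALLOWED) -> int:
    """Mimic RPG %CHECK: 1-based position of the first disallowed character, or 0."""
    for pos, ch in enumerate(text, start=1):
        if ch not in allowed:
            return pos
    return 0


def keep_allowed(text: str, allowed: frozenset = ALLOWED) -> str:
    """Drop every character that is not in the allow-list."""
    return "".join(ch for ch in text if ch in allowed)


if __name__ == "__main__":
    sample = "Caf\u00e9 \u00a9 2015\x07"
    print(first_disallowed(sample), repr(keep_allowed(sample)))
```

Note that `keep_allowed` also strips line breaks and other control characters, which fits Jim's remark that line and page formatting need not be kept.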
This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page. If you have questions about this, please contact [javascript protected email address].