Scott,

I really don't expect to spend a lot of effort configuring a PC client to transfer data in the right format to the IFS - besides that, UTF-8 needs special recognition - otherwise it is just an ASCII file.

On Fri, May 8, 2015 at 9:15 PM, Scott Klement <midrange-l@xxxxxxxxxxxxxxxx> wrote:

> Henrik,
>
> I believe those CCSIDs are configurable. Also, in some circumstances IBM i will try to pick the CCSID based on other factors (such as using an ASCII that supports the same set of characters as the EBCDIC you're using.)
>
> So I wouldn't say they'd always be 819 or 1252, but I think in most western countries (those using the Latin-1 character set) it'll probably be 819 and 1252 unless someone has reconfigured them.
>
> Chuck Pence will probably reply to this and tell you precisely how it works :-)
>
> -SK
>
> On 5/8/2015 2:06 PM, Henrik Rützou wrote:
>
>> Scott
>>
>> In general, can we say that files arriving in the IFS via FTP will become CCSID 819, while files dragged and dropped from Windows will become CCSID 1252 in the IFS - or am I wrong?
>>
>> On Fri, May 8, 2015 at 8:59 PM, Scott Klement <midrange-l@xxxxxxxxxxxxxxxx> wrote:
>>
>>> Jim,
>>>
>>> In file transfer situations, I would never trust the CCSID file attribute (unless you've already made sure that it's right, of course).
>>>
>>> Unless you're transferring a save file from another IBM i system/partition, the CCSID is not part of what gets transferred. All that's transferred is the data itself. The system will usually just assign a 'default' CCSID -- it has no way of knowing if it's the right one for your data. It expects you to change it accordingly if your data is different.
>>>
>>> If you are finding that a single character (such as a "smart quote" or international symbol) is showing up as two or more bytes of data, resulting in extra 'garbage' when translated to EBCDIC, this almost always means that the data is UTF-8, but you're telling the system that it's ASCII (such as 819). The basic alphabet and numbers will therefore translate correctly, but more 'special' characters will be mistranslated.
>>>
>>> Really, considering that it's 2015, we should all be using Unicode (UTF-8 or UTF-16) for as much as possible. ASCII and EBCDIC are really cumbersome. I know it's hard when you have so many applications that are already in EBCDIC -- but an all-Unicode environment is really what you should be striving for in the long run, if you can't do it today.
>>>
>>> Anyway -- how to "purify" the data -- there are certain commonplace fixes, such as replacing "smart quotes" with straight quotes, that make sense to do. I would definitely do this in Unicode (or ASCII if that's what it is) before translating to EBCDIC.
>>>
>>> But aside from these common things, it's generally ugly and nasty to remove "unwanted" characters. There's no good way to do this, since there's really no way the computer knows which characters are "allowed" and which are not. How does it know whether a half-moon character, for example, is intentional or whether it's an error? The same is true of accented characters -- oftentimes people (at least in the USA) will see these and say they are "garbage" -- but they are normal parts of human languages in most of the world. How can the computer know that they are "garbage"? Obviously, it's easy for us as human beings to look at the data and realize that a particular character doesn't belong there -- but I'm sure you understand that a computer can't see things that way.
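To illustrate the two points above (UTF-8 data labelled as 819, and fixing smart quotes in Unicode before going to EBCDIC), here is a minimal Python sketch. It is not from the thread. It assumes Python's "latin-1" codec as a stand-in for CCSID 819 and "cp037" for CCSID 37, and the helper names decode_incoming, straighten_quotes and to_ebcdic are made up for the example. The "try UTF-8 first" fallback is only a heuristic, not a reliable detector.

```python
# Assumption: "latin-1" approximates CCSID 819 and "cp037" approximates CCSID 37.
# All function names here are hypothetical, for illustration only.

# Smart quotes mapped to their straight equivalents.
SMART_PUNCTUATION = {
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
}


def decode_incoming(raw: bytes) -> str:
    """Decode as UTF-8 if the bytes are valid UTF-8, else fall back to Latin-1.

    Heuristic only: Latin-1 text containing accented characters is usually
    not valid UTF-8, so the fallback catches most single-byte files.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


def straighten_quotes(text: str) -> str:
    """Replace smart quotes while the text is still Unicode."""
    return text.translate(str.maketrans(SMART_PUNCTUATION))


def to_ebcdic(text: str) -> bytes:
    """Encode to EBCDIC; characters cp037 cannot represent are substituted."""
    return text.encode("cp037", errors="replace")


if __name__ == "__main__":
    raw = "Café".encode("utf-8")
    # Treating UTF-8 bytes as Latin-1 turns one character into two.
    print(raw.decode("latin-1"))
    # Decoding with the right encoding keeps it as one character.
    print(decode_incoming(raw))
    print(to_ebcdic(straighten_quotes("\u201cquoted\u201d")).hex())
```

The order matters: decode, clean up in Unicode, and only then encode to EBCDIC, so that the clean-up step sees whole characters rather than stray bytes.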
>>>
>>> So I guess if you want to "purify" your data, the BEST way to do that is to find out where these unwanted characters are coming from, and have it stop sending them. If you really, truly, can't do that, then the "hack" would be to make a list of everything you DO want, and remove everything else. What is/isn't a wanted character will almost certainly vary from application to application, so there isn't really any built-in way to do this. Just make a string of all the characters you want, and use RPG operations like %CHECK to find the ones not in that character set and remove them. But, this really is a hack...
>>>
>>> On 5/8/2015 1:33 PM, Jim Franz wrote:
>>>
>>>> Without asking every entity, can one tell by looking at the file attributes?
>>>>
>>>> Jim
>>>>
>>>> On Fri, May 8, 2015 at 2:28 PM, Henrik Rützou <hr@xxxxxxxxxxxx> wrote:
>>>>
>>>>> Jim
>>>>>
>>>>> Even if the files you receive are in CCSID 819/1252, are you sure that they aren't UTF-8 files?
>>>>>
>>>>> On Fri, May 8, 2015 at 8:25 PM, Jim Franz <franz9000@xxxxxxxxx> wrote:
>>>>>
>>>>>> EBCDIC CCSID = 37
>>>>>> Most file imports are via ftp - ccsid 1252, occasionally burned dvd for new customer startup of history.
>>>>>> Some trading partners are mainframe, some unix/Linux, some Win, all US based entities, but we think some servers are overseas (we see time differences).
>>>>>>
>>>>>> When we write ascii text, usually 819
>>>>>>
>>>>>> What hurts us most is screen input (web interface to SQL Server then to Power i) where the user cuts & pastes paragraphs of text from their source systems (thousands of different customers).
>>>>>> Jim
>>>>>>
>>>>>> On Fri, May 8, 2015 at 2:07 PM, Henrik Rützou <hr@xxxxxxxxxxxx> wrote:
>>>>>>
>>>>>>> Jim
>>>>>>>
>>>>>>> What is the EBCDIC CCSID on your machine and how do you receive files?
>>>>>>>
>>>>>>> On Fri, May 8, 2015 at 8:00 PM, Jim Franz <franz9000@xxxxxxxxx> wrote:
>>>>>>>
>>>>>>>> We do a lot of import and export of data, plus have both PC client (local and web) input as well as PC5250.
>>>>>>>> Had a recent thread involving cut and paste data (ebcdic x'3F') that caused an issue.
>>>>>>>> We use CCSID 37 and ascii 819.
>>>>>>>>
>>>>>>>> There are more EBCDIC characters than what we see on the US Keyboard. Some we need, such as copyright symbol, cents sign, etc, but many
>>>>>>>>
>>>>>>>> We are wanting to take steps to clean the data on input, whether from ascii or ebcdic side. We have some input already cleansed, but only at screen program level.
>>>>>>>>
>>>>>>>> Couple questions:
>>>>>>>> 1. Just replacing all below ebcdic x'40' leaves a lot of strange characters like x'8C' (sort of a moon with a hat..). One thought is to identify all the characters we need and replace the rest. No need to keep line and page formatting stuff.
>>>>>>>> Is this a good idea?
>>>>>>>>
>>>>>>>> 2. Thinking that since there is a multitude of entry/update points, db triggers are best? Am wondering about apps that write the data, and now after the write, the screen column data is different than the column data in the file (trigger pgm cleaned the data) - hoping to avoid opening up all the apps.
>>>>>>>>
>>>>>>>> 3. How far do people with heavy edi take this? Am I leaving something out with the keyboard characters plus a few more?
>>>>>>>> These are names, addresses, notes (which are sometimes pages of notes).
>>>>>>>>
>>>>>>>> Jim Franz
>>>>>>>> --
>>>>>>>> This is the Midrange Systems Technical Discussion (MIDRANGE-L) mailing list
>>>>>>>> To post a message email: MIDRANGE-L@xxxxxxxxxxxx
>>>>>>>> To subscribe, unsubscribe, or change list options,
>>>>>>>> visit: http://lists.midrange.com/mailman/listinfo/midrange-l
>>>>>>>> or email: MIDRANGE-L-request@xxxxxxxxxxxx
>>>>>>>> Before posting, please take a moment to review the archives
>>>>>>>> at http://archive.midrange.com/midrange-l.
>>>>>>>
>>>>>>> --
>>>>>>> Regards,
>>>>>>> Henrik Rützou
>>>>>>>
>>>>>>> http://powerEXT.com <http://powerext.com/>
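The "list of everything you DO want" approach that Scott describes in the quoted thread can be sketched as follows. This is a Python illustration rather than RPG, and it is only a sketch: the ALLOWED set is an assumption (keyboard characters plus the copyright and cent signs Jim mentions), and first_disallowed and keep_allowed are hypothetical names. first_disallowed is loosely analogous to RPG's %CHECK, except that it returns a 0-based index and -1 when nothing is found, where %CHECK returns a 1-based position and 0.

```python
import string

# Assumption: the wanted characters are US keyboard characters plus a few
# extras (copyright and cent signs). Every application would define its own set.
ALLOWED = set(
    string.ascii_letters
    + string.digits
    + string.punctuation
    + " "
    + "\u00a9\u00a2"  # copyright sign, cent sign
)


def first_disallowed(text: str, allowed: set = ALLOWED) -> int:
    """Return the index of the first character not in `allowed`, or -1."""
    for index, char in enumerate(text):
        if char not in allowed:
            return index
    return -1


def keep_allowed(text: str, allowed: set = ALLOWED, replacement: str = "") -> str:
    """Remove (or replace) every character that is not in `allowed`."""
    return "".join(char if char in allowed else replacement for char in text)


if __name__ == "__main__":
    sample = "Acme \u00a9 2015\x07 Inc."
    print(first_disallowed(sample))             # position of the control character
    print(keep_allowed(sample))                 # control character removed
    print(keep_allowed(sample, replacement=" "))  # control character blanked out
```

Passing replacement=" " keeps field lengths stable, which may matter for fixed-width columns. The default removes the character, as Scott suggests.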
This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited.