Hi David,

As you might expect from the responses so far, it is nearly impossible, in the general case, to determine text/binary format with 100% reliability. Most attempts check for magic numbers (like 0xCAFEBABE for Java classes) and for byte values greater than 127 to detect non-ASCII content. The problem is that with Unicode variations and other encodings, along with user-generated (programmer-generated, that is) binary files that have no formal headers, these techniques produce either a lot of misses or false matches. Unlike other systems, the AS/400 has our old friend CCSID 65535 to denote binary files, but it's not used consistently and has ended up being a barrier rather than an aid.

This link:
http://www.gsp.com/cgi-bin/man.cgi?section=1&topic=file
gives an idea of what's involved for the "file" command on Unix-type systems.

There's a Windows program called TrID - File Identifier that uses an XML-based database and allows you to add your own entries, which may help. It's at:
http://mark0.net/soft-trid-e.html

Also, Mozilla has a module that attempts to determine binary/text. I'd guess it's probably used in Firefox as well. The source for these products is available in C/C++, so you could possibly use or convert it. This page:
http://gemal.dk/blog/2003/12/18/autodetect_correct_mime_type_from_textplain_content/
has links to a full discussion. It uses the term "Firebird", which I think mostly evolved into Firefox. These can be downloaded at:
http://www.mozilla.org/download.html

All of these are probably still going to miss or misidentify some files. Depending on your volume, it might make sense to have humans validate CCSIDs and use those, or you could generate a database of files and directories for inclusion/exclusion.

HTH,

Joe Sam

Joe Sam Shirah - http://www.conceptgo.com
conceptGO - Consulting/Development/Outsourcing
Java Filter Forum: http://www.ibm.com/developerworks/java/
Just the JDBC FAQs: http://www.jguru.com/faq/JDBC
Going International? http://www.jguru.com/faq/I18N
Que Java400? http://www.jguru.com/faq/Java400

----- Original Message -----
From: "David Gibbs" <david@xxxxxxxxxxxx>
To: "Java Programming on and around the iSeries / AS400" <java400-l@xxxxxxxxxxxx>
Sent: Tuesday, February 14, 2006 11:45 AM
Subject: Determining if a file is text or binary?

> Folks:
>
> Does anyone know of a technique to determine if a file (in the IFS) is
> text or binary (in Java)?
>
> I need to copy various files from the IFS to another server ... these
> files can be in any number of CCSIDs.
>
> If the file is text, I want to write it to the other server in the
> target server's native text format ... if the file is binary, I don't
> want to do any translation.
>
> I can use the toolkit's CharConverter routines to convert the text from
> the file's CCSID to the native text format ... but if it's binary, I
> don't want to use that routine.
>
> Any suggestions?
>
> Thanks!
>
> david
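The two heuristics described in the reply (checking for magic numbers, then counting byte values above 127) can be sketched in Java roughly as follows. This is a minimal, hypothetical sketch, not a reliable detector: the class name `TextOrBinary`, the 8 KB sample size, the 10% threshold, the extra ZIP/JAR signature, and the rule that treats NUL and unusual control characters as "suspicious" are all assumptions added for illustration. It also assumes an ASCII-compatible encoding, so EBCDIC text (where letters sit above 0x7F) and UTF-16 text (full of NUL bytes) will usually be reported as binary, which is exactly the kind of misidentification the reply warns about.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

/**
 * Hypothetical heuristic text/binary guesser (a sketch, not a reliable detector).
 * Checks a few magic numbers, then counts bytes above 127 and unusual control
 * characters in the first SAMPLE_SIZE bytes. Assumes an ASCII-compatible
 * encoding: EBCDIC or UTF-16 text will usually be reported as binary.
 */
public final class TextOrBinary {

    /** Assumed sample size: only the start of the file is examined. */
    public static final int SAMPLE_SIZE = 8192;

    /** Assumed threshold: more than 10% suspicious bytes means "binary". */
    public static final double MAX_SUSPICIOUS_RATIO = 0.10;

    private TextOrBinary() {}

    /** Returns true if the sample begins with one of a few known binary signatures. */
    public static boolean hasBinaryMagic(byte[] b, int len) {
        if (len < 4) {
            return false;
        }
        // 0xCAFEBABE: Java class file (the example given in the reply)
        boolean javaClass = (b[0] & 0xFF) == 0xCA && (b[1] & 0xFF) == 0xFE
                && (b[2] & 0xFF) == 0xBA && (b[3] & 0xFF) == 0xBE;
        // "PK" 0x03 0x04: ZIP local file header (also used by JAR); an added example
        boolean zip = b[0] == 'P' && b[1] == 'K' && b[2] == 3 && b[3] == 4;
        return javaClass || zip;
    }

    /** Heuristically decides whether the first len bytes of sample look like text. */
    public static boolean looksLikeText(byte[] sample, int len) {
        if (len <= 0) {
            return true; // empty file: nothing to translate either way
        }
        if (hasBinaryMagic(sample, len)) {
            return false;
        }
        int suspicious = 0;
        for (int i = 0; i < len; i++) {
            int v = sample[i] & 0xFF;
            if (v == 0) {
                return false; // NUL is rare in single-byte text (but common in UTF-16)
            }
            boolean oddControl = v < 0x20 && v != '\t' && v != '\n' && v != '\r' && v != '\f';
            if (v > 127 || oddControl) {
                suspicious++;
            }
        }
        return (double) suspicious / len <= MAX_SUSPICIOUS_RATIO;
    }

    /** Reads up to SAMPLE_SIZE bytes from the start of the file and applies the heuristic. */
    public static boolean looksLikeText(String path) throws IOException {
        byte[] buf = new byte[SAMPLE_SIZE];
        int n;
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            n = in.readNBytes(buf, 0, buf.length); // requires Java 9 or later
        }
        return looksLikeText(buf, n);
    }

    public static void main(String[] args) throws IOException {
        for (String path : args) {
            System.out.println(path + ": " + (looksLikeText(path) ? "text" : "binary"));
        }
    }
}
```

Run as `java TextOrBinary file1 file2 ...` to print a guess per file. Given the reply's caveats, a file's CCSID (65535 for binary) is probably worth checking first where it is trustworthy, with a byte heuristic like this one only as a fallback for files whose tagging can't be relied on.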
This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page.