Hi David,

    As you might expect from the responses so far, it is nearly
impossible, in the general case, to determine text/binary format with 100%
reliability.  Most attempts check for magic numbers (like 0xCAFEBABE for
Java classes) and for byte values greater than 127, which indicate
non-ASCII data.
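
    A rough Java sketch of those two checks follows.  The class and method
names are made up for illustration, and it assumes the sample bytes are in an
ASCII-compatible encoding.  EBCDIC text, where letters and digits are above
127, would be flagged as binary unless it is converted first.

```java
// Hypothetical helper, not part of any toolkit: a minimal sketch of the
// "magic number plus bytes above 127" heuristic.
final class BinarySniffer {

    // True if the data starts with the Java class file magic 0xCAFEBABE.
    static boolean startsWithJavaClassMagic(byte[] data) {
        return data.length >= 4
            && (data[0] & 0xFF) == 0xCA
            && (data[1] & 0xFF) == 0xFE
            && (data[2] & 0xFF) == 0xBA
            && (data[3] & 0xFF) == 0xBE;
    }

    // True if any byte, read as unsigned, is greater than 127 (non-ASCII).
    static boolean hasNonAsciiByte(byte[] data) {
        for (byte b : data) {
            if ((b & 0xFF) > 127) {
                return true;
            }
        }
        return false;
    }

    // A guess only: a known magic number or any non-ASCII byte counts as binary.
    static boolean looksBinary(byte[] sample) {
        return startsWithJavaClassMagic(sample) || hasNonAsciiByte(sample);
    }
}
```

    In practice you would pass in the first few KB of the file rather than
the whole thing, and treat the answer as a hint, not a verdict.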

    The problem is that with Unicode variations and other encodings, along
with user (programmer, that is) generated binary files with no formal
headers, those techniques produce either a lot of misses or a lot of false
matches.  The AS/400, unlike other systems, has our old friend CCSID 65535
to denote binary files, but it's not used consistently, and has ended up
being a barrier rather than an aid.
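
    If you do want to use the CCSID as a first hint, a minimal sketch might
look like the following.  The class, enum and method names are hypothetical.
It takes the CCSID as a plain int; how you obtain it (for example from the
toolkit's IFSFile object) is left out here.  Because the tag is not applied
consistently, both outcomes are only hints.

```java
// Hypothetical helper: turns a file's CCSID tag into a hint, nothing more.
final class CcsidHint {

    // CCSID 65535 is the tag used to denote binary (no conversion) data.
    static final int BINARY_CCSID = 65535;

    enum Verdict { LIKELY_BINARY, LIKELY_TEXT }

    // 65535 suggests binary; any other CCSID suggests text in that CCSID.
    static Verdict fromCcsid(int ccsid) {
        return ccsid == BINARY_CCSID ? Verdict.LIKELY_BINARY : Verdict.LIKELY_TEXT;
    }
}
```

    A file tagged 65535 may still hold text, and a file tagged with a text
CCSID may still hold binary data, so a content check or a human review is
still needed for anything that matters.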

    This link:

http://www.gsp.com/cgi-bin/man.cgi?section=1&topic=file

gives an idea of what's involved for the "file" command on Unix-type
systems.

    There's a Windows program called TrID - File Identifier that uses an
XML-based database and allows you to add your own entries, which may help.
It's at:

http://mark0.net/soft-trid-e.html

    Also, Mozilla has a module that attempts to determine binary/text.  I'd
guess it's probably used in Firefox as well.  The source is available for
these products in C/C++.  You could possibly use or convert that.  This
page:

http://gemal.dk/blog/2003/12/18/autodetect_correct_mime_type_from_textplain_content/

has links to a full discussion.  It uses the name "Firebird", which was the
earlier name of what became Firefox.  These can be downloaded at:

http://www.mozilla.org/download.html

    All of these are probably still going to miss or misidentify some
files.  Depending on your volume, it might make sense to have humans
validate the CCSIDs and use those, or possibly you could generate a database
of files and directories for inclusion/exclusion.  HTH,


                                                         Joe Sam

Joe Sam Shirah -        http://www.conceptgo.com
conceptGO       -        Consulting/Development/Outsourcing
Java Filter Forum:       http://www.ibm.com/developerworks/java/
Just the JDBC FAQs: http://www.jguru.com/faq/JDBC
Going International?    http://www.jguru.com/faq/I18N
Que Java400?            http://www.jguru.com/faq/Java400


----- Original Message ----- 
From: "David Gibbs" <david@xxxxxxxxxxxx>
To: "Java Programming on and around the iSeries / AS400"
<java400-l@xxxxxxxxxxxx>
Sent: Tuesday, February 14, 2006 11:45 AM
Subject: Determining if a file is text or binary?


> Folks:
>
> Does anyone know of a technique to determine if a file (in the IFS) is
> text or binary (in Java)?
>
> I need to copy various files from the IFS to another server ... these
> files can be in any number of CCSIDs.
>
> If the file is text, I want to write it to the other server in the
> target server's native text format ... if the file is binary, I don't
> want to do any translation.
>
> I can use the toolkit's CharConverter routines to convert the text from
> the file's CCSID to the native text format ... but if it's binary, I
> don't want to use that routine.
>
> Any suggestions?
>
> Thanks!
>
> david
> -- 


