Nathan,

content-type: text/plain; charset=UTF-8

will put the server into BINARY mode, because the header includes a charset property.


On Wed, Jul 3, 2013 at 5:57 PM, Nathan Andelin <nandelin@xxxxxxxxx> wrote:

Thanks for the replies Kevin and Henrik. I am using DefaultNetCCSID 1208
to handle UTF-8 encoding, which appears to work as expected for "text"
content types.

Henrik's explanation of the HTTP server going into binary mode whenever
content type is anything other than "text" is completely in line with my
testing. Luckily for me, the people on this list are so willing to share
information. I searched high and low over the internet for an explanation
to what I was observing before finally turning here. I assumed that the
HTTP server was scrambling the XML, but I wanted confirmation that it
wasn't something in the TCP/IP stack or happening on the client end. I'm
happy to know that the binary coding is linked specifically to the
Content-type header.

While I was searching for information, it became a lot more clear why it's
important to use UTF-8 or UTF-16 encoding for multi-language web sites and
services.

I'm writing a web service that will be used within a hub and spoke
topology (enterprise service bus). I will test with the "hub" to see if it
will accept content-type: text/plain; charset=UTF-8. If that fails then
I'll explore Henrik's tip about converting to UTF-8 myself in connection
with binary mode.

-Nathan



________________________________
From: Henrik Rützou <hr@xxxxxxxxxxxx>
To: Web Enabling the IBM i (AS/400 and iSeries) <web400@xxxxxxxxxxxx>
Sent: Wednesday, July 3, 2013 5:44 AM
Subject: Re: [WEB400] XML response is getting scrambled


Hi Nathan & Kevin,



Apache converts the output buffer from EBCDIC to ASCII/UTF-8 only when the
content-type is TEXT/..., there isn't any charset property in the header,
and the cgiconvmode is EBCDIC.



If you set the content-type property to anything else, or set the charset
property, Apache will go into BINARY mode and you have to encode the body
yourself. That is why you get garbage, and that’s the secret! Remember
that the header always has to be in EBCDIC!
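
The rule above can be written down as a small predicate. This is only a sketch of the behaviour as described in this message, not of the server's actual implementation; the function name and its parameters are made up for illustration.

```python
def server_converts_body(content_type: str, cgi_conv_mode: str = "EBCDIC") -> bool:
    """Return True if, per the rule described in this thread, the server
    would translate the CGI output body from EBCDIC itself.

    Hypothetical helper: it models the description above, nothing more.
    """
    # Split "type/subtype; param=value; ..." into media type and parameters.
    media_type, *params = [part.strip() for part in content_type.split(";")]
    has_charset = any(p.lower().startswith("charset") for p in params)
    is_text = media_type.lower().startswith("text/")
    return cgi_conv_mode.upper() == "EBCDIC" and is_text and not has_charset


for header in ("text/plain",
               "text/plain; charset=UTF-8",
               'application/xml;charset="utf-8"'):
    if server_converts_body(header):
        outcome = "converted by the server"
    else:
        outcome = "BINARY mode, encode the body yourself"
    print(f"{header}: {outcome}")
```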





Here is a little dirty trick to handle DBCS in SBCS EBCDIC:



If you have a DBCS field, first convert it to a UTF-8 string using ICONV.
Then convert the UTF-8 string to EBCDIC, but trick ICONV into thinking
that the string is ASCII (CCSID 819). If you then send the buffer to
Apache as plain TEXT/... with DefaultNetCCSid 819 and no charset property,
the buffer will arrive correctly encoded as Unicode/UTF-8.
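
The trick can be simulated off the box to see why it round-trips. The sketch below uses Python's codecs in place of ICONV, and it assumes the EBCDIC code page is CCSID 37 (Python's "cp037") and that CCSID 819 corresponds to "latin-1"; other EBCDIC code pages may behave differently. The function names are hypothetical.

```python
EBCDIC = "cp037"     # assumption: CCSID 37; other EBCDIC code pages differ
LATIN1 = "latin-1"   # ISO 8859-1, used here for CCSID 819


def utf8_as_ebcdic(text: str) -> bytes:
    """Encode text as UTF-8, then convert those bytes to EBCDIC while
    pretending that they are single-byte CCSID 819 characters."""
    utf8 = text.encode("utf-8")
    return utf8.decode(LATIN1).encode(EBCDIC)


def server_ebcdic_to_819(buffer: bytes) -> bytes:
    """Stand-in for the server's EBCDIC -> 819 translation of the body."""
    return buffer.decode(EBCDIC).encode(LATIN1)


text = "Rützou 日本語"
wire = server_ebcdic_to_819(utf8_as_ebcdic(text))

# The bytes that reach the client are the original UTF-8 bytes again.
print(wire == text.encode("utf-8"))
```

It works because the 37 <-> 819 mapping is one-to-one for all 256 byte values, so the second conversion exactly undoes the first.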



You can also use ICONV to translate the body of the buffer from EBCDIC to
CCSID 819 and then send any content-type. It's a little tricky since the
header must still be in EBCDIC.
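
A rough sketch of such a mixed buffer, again with Python codecs standing in for ICONV. The helper name, the use of "cp037" for the header, and the "\n" line terminator are all assumptions for illustration; the real line terminator expected by the server is not stated in this thread.

```python
def build_binary_response(header_lines, body, body_encoding="utf-8",
                          header_codec="cp037"):
    """Hypothetical helper: header in EBCDIC, body already in the encoding
    promised by the Content-Type header."""
    # One line per header, followed by an empty line that ends the header.
    header_text = "".join(line + "\n" for line in header_lines) + "\n"
    return header_text.encode(header_codec) + body.encode(body_encoding)


xml = "<name>Rützou</name>"
response = build_binary_response(
    ['Content-Type: application/xml;charset="utf-8"'], xml)

# The tail of the buffer is plain UTF-8; only the head is EBCDIC.
print(response.endswith(xml.encode("utf-8")))
```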



Inbound messages work the same way - you trick the system into thinking the
received UTF-8 is ASCII, so it can be converted back and forth to EBCDIC byte
for byte, or you can even let apache do the job.



So if you have EBCDIC encoded UTF-8 and want it to be placed in a DBCS
field, you convert the EBCDIC back to CCSID 819 and then trick ICONV into
converting the ASCII string as UTF-8 to DBCS.
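
The inbound direction can be sketched the same way. The assumptions are the same as before ("cp037" for EBCDIC, "latin-1" for CCSID 819), and UTF-16 big-endian is used only as a stand-in for whatever CCSID the DBCS field really has. The function names are hypothetical.

```python
EBCDIC = "cp037"     # assumption: CCSID 37
LATIN1 = "latin-1"   # used here for CCSID 819


def inbound_as_ebcdic(utf8_bytes: bytes) -> bytes:
    """Treat received UTF-8 bytes as CCSID 819 and map them to EBCDIC,
    one byte in, one byte out."""
    return utf8_bytes.decode(LATIN1).encode(EBCDIC)


def ebcdic_to_utf16(buffer: bytes) -> bytes:
    """Undo the byte-for-byte mapping, then decode the result as UTF-8
    and re-encode it for a double-byte field (UTF-16 as a stand-in)."""
    utf8_bytes = buffer.decode(EBCDIC).encode(LATIN1)
    return utf8_bytes.decode("utf-8").encode("utf-16-be")


received = "日本語".encode("utf-8")
stored = inbound_as_ebcdic(received)      # "EBCDIC encoded UTF-8"
field = ebcdic_to_utf16(stored)

print(len(stored) == len(received))       # nothing was dropped or merged
print(field == "日本語".encode("utf-16-be"))
```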



If you want full Unicode support, cgiconvmode EBCDIC with
defaultnetccsid 1208 is a bad combination, since any character that EBCDIC
does not support will be replaced by a blank in incoming messages, whereas
defaultnetccsid 819 will convert every ASCII byte to an EBCDIC equivalent.
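
To illustrate the difference, here is a sketch that contrasts a character-level conversion, which substitutes a blank for anything EBCDIC cannot represent, with the byte-for-byte mapping. It is a simulation with Python codecs under the assumption that EBCDIC means "cp037"; it does not reproduce the server's actual conversion tables, and both function names are hypothetical.

```python
EBCDIC = "cp037"   # assumption: CCSID 37


def lossy_to_ebcdic(text: str) -> bytes:
    """Character-level conversion: anything EBCDIC cannot represent
    becomes a blank, as described for the 1208 case."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(EBCDIC)
        except UnicodeEncodeError:
            out += " ".encode(EBCDIC)
    return bytes(out)


def bytewise_to_ebcdic(utf8_bytes: bytes) -> bytes:
    """Byte-level conversion: every incoming byte gets an EBCDIC
    equivalent, as described for the 819 case."""
    return utf8_bytes.decode("latin-1").encode(EBCDIC)


incoming = "name=日本".encode("utf-8")

lossy = lossy_to_ebcdic(incoming.decode("utf-8"))
kept = bytewise_to_ebcdic(incoming)
restored = kept.decode(EBCDIC).encode("latin-1").decode("utf-8")

print(repr(lossy.decode(EBCDIC)))   # the two characters are gone
print(restored == "name=日本")      # the byte-level copy is recoverable
```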



BINARY mode is also a pain since you have to do all conversions yourself,
which is what Kevin prefers.



I do a lot of work (from time to time) on a new CGIDEV2 version that runs a
storage model based on EBCDIC encoded UTF-8 and is able to handle both
outgoing and incoming SBCS/DBCS mixed buffers in Unicode and still is
backward compatible with old CGIDEV2 coding.



I have a beta model running at
http://5.103.128.110:6382/pextcgiCOR/testutf8.pgm



But I still need a lot of supporting procedures around my core API so that
it becomes as “easy to use” as CGIDEV2 is today. For example,
updHtmlVar('aaa':myfield); must have an updHtmlVarDbcs('aaa':myDbcsField:1200);
equivalent, and the two must be able to be mixed.



However, and maybe emphasized by the recent discussions about the future of
IBM i on LinkedIn, we have to address the needs in the emerging markets for
IBM i DBCS solutions, and treat DBCS and Unicode not as “yes, we can, with
obstacles” but in a more fluent way.


On Wed, Jul 3, 2013 at 10:02 AM, Kevin Turner <kevin.turner@xxxxxxxxxxxxxxx> wrote:

Personally, I prefer to use "cgiconvmode BINARY" in the apache config and
convert to/from UTF-8 myself, but you might also try
"DefaultNetCCSID 1208"

-----Original Message-----
From: web400-bounces@xxxxxxxxxxxx [mailto:web400-bounces@xxxxxxxxxxxx]
On Behalf Of Nathan Andelin
Sent: 03 July 2013 02:33
To: Web Enabling the AS400 / iSeries
Subject: [WEB400] XML response is getting scrambled

Hello,

I'm writing a CGI web service that returns XML in accordance with a
specification which mandates
Content-Type: application/xml;charset="utf-8"
in the HTTP Header.

Content-Type: text/plain works fine.

But

Content-Type: application/xml;charset="utf-8" causes the stream to get
scrambled before being returned to the browser.

What is the secret to getting the encoding right?


-Nathan.
--
This is the Web Enabling the IBM i (AS/400 and iSeries) (WEB400) mailing
list
To post a message email: WEB400@xxxxxxxxxxxx
To subscribe, unsubscribe, or change list options,
visit: http://lists.midrange.com/mailman/listinfo/web400
or email: WEB400-request@xxxxxxxxxxxx
Before posting, please take a moment to review the archives at
http://archive.midrange.com/web400.


NOTICE: The information in this electronic mail transmission is intended
by CoralTree Systems Ltd for the use of the named individuals or entity to
which it is directed and may contain information that is privileged or
otherwise confidential. If you have received this electronic mail
transmission in error, please delete it from your system without copying or
forwarding it, and notify the sender of the error by reply email or by
telephone, so that the sender's address records can be corrected.





--------------------------------------------------------------------------------


CoralTree Systems Limited
25 Barnes Wallis Road
Segensworth East, Fareham
PO15 5TT

Company Registration Number 5021022.
Registered Office:
12-14 Carlton Place
Southampton, UK
SO15 2EA
VAT Registration Number 834 1020 74.




--
Regards,
Henrik Rützou

http://powerEXT.com <http://powerext.com/>






This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page. If you have questions about this, please contact [javascript protected email address].
