Yeah, I hear you. But I've seen different interpretations. Here are a couple of statements from a developerWorks article. Both refer to the material between the CDATA markers.

"Anything between those bits of markup will pass through the XML parser untouched."

and

"In either case, the contents of the CDATA section will be available without modification."

That's kind of what I had thought. It might be a misinterpretation on the part of the authors, too. The article is at -

http://www.ibm.com/developerworks/library/x-cdata/

On the other hand, the W3C XML spec's section on CDATA doesn't mention spaces. It speaks only of using CDATA "...to escape blocks of text containing characters which would otherwise be recognized as markup."
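
To make the parser-level behavior concrete, here is a minimal sketch in Python (not RPG, so it says nothing about what XML-INTO does with the value after parsing). It assumes a customer number with two interior spaces, which is a made-up stand-in for our real data, and compares the same text with and without the CDATA wrapper using the standard-library ElementTree parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample value with two spaces in the middle (an assumption,
# standing in for the real customer number).
with_cdata = "<custno><![CDATA[008_XY  00020001]]></custno>"
without_cdata = "<custno>008_XY  00020001</custno>"

# The parser hands back the element's character data in both cases.
cdata_text = ET.fromstring(with_cdata).text
plain_text = ET.fromstring(without_cdata).text

print(repr(cdata_text))
print(repr(plain_text))
print(cdata_text == plain_text)  # True: CDATA only changes how markup characters are escaped
```

Under those assumptions the parser itself keeps both spaces either way, which suggests any trimming happens in a later step (such as the trim option) rather than being switched off by CDATA.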

BTW, this XML is coming TO me from someone else.

There is some attribute called xml:space - but I don't have control of the markup until I get it.

I chose not to use the trim=none option because there are also newlines in a <history> tag, and we are bringing this data into a physical file on our system and then into a fixed-length text file for the mainframe here. I will give it a try anyway, as it may not matter if newlines are preserved.

If trim=none does not work the way we want, I think my best choice is to encode these spaces, since we do need them.
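
A rough sketch of what that encoding step could look like, written in Python rather than sed. Everything specific here is an assumption for illustration: the placeholder character "~", the surrounding <order> element, and the idea that <custno> always arrives wrapped in CDATA exactly as in the sample. The placeholder would have to be a character that never appears in a real customer number, and it would be turned back into a space after XML-INTO has run.

```python
import re

# Hypothetical placeholder; it must never occur in a real customer number.
PLACEHOLDER = "~"

# Matches only <custno> elements whose content is a single CDATA section.
CUSTNO = re.compile(r"(<custno><!\[CDATA\[)(.*?)(\]\]></custno>)", re.DOTALL)


def encode_custno_spaces(xml_text: str) -> str:
    """Replace each space inside <custno> CDATA content with the placeholder."""
    def _swap(match: re.Match) -> str:
        return match.group(1) + match.group(2).replace(" ", PLACEHOLDER) + match.group(3)
    return CUSTNO.sub(_swap, xml_text)


def decode_custno(value: str) -> str:
    """Restore the spaces after the value has been extracted."""
    return value.replace(PLACEHOLDER, " ")


# Made-up document: <order> and the <history> content are illustrative only.
sample = (
    "<order><custno><![CDATA[008_XY  00020001]]></custno>"
    "<history>a  b</history></order>"
)
encoded = encode_custno_spaces(sample)
print(encoded)
print(repr(decode_custno("008_XY~~00020001")))
```

Only the <custno> content is touched, so the <history> text still goes through whatever trimming the default option applies.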

Thanks, and see you in a couple weeks.

Vern

On 9/20/2013 2:20 PM, Scott Klement wrote:
Vern,

CDATA is normally used so you don't have to escape special characters in
your XML data (such as <, > and & symbols). As far as I know, it has
absolutely nothing to do with blanks.

I have not run a test, but... I would never have thought or guessed in
a million years that CDATA would stop XML-INTO from removing blanks.
That's not what CDATA is intended for, and I've certainly never seen it
used for that.

If you want to prevent XML-INTO from removing blanks, why don't you use
the trim option?

-SK


On 9/20/2013 1:44 PM, Vernon Hamberg wrote:
I have an XML file I'm processing - comes from a "partner" app elsewhere
here.

One of the nodes is our customer number, and it can contain more than
one space, as here -

<custno><![CDATA[008_XY 00020001]]></custno>

We are to expect the CDATA, since we are assuming it should tell the
parser to leave things alone.

Now is that a correct assumption? I did a little digging, and it seems
there is some variation in interpretation.

XML-INTO is what I'm using, with the default for the trim option
(trim=all, which removes leading and trailing whitespace and reduces
interior runs of whitespace to a single space). I left it this way
because we also get newlines in the data.

I would like to know if XML-INTO should leave things alone that are in a
CDATA block - that seems to be generally assumed, but I can easily be
mistaken here.

My main option is to encode these particular spaces - sed should do the
trick with a little effort. The alternative is to get the software on
the other end to do the encoding - good luck! And some consultant would
want us to run the PAYMNY command.

Thoughts? Bug? Feature? Options?

Thanks
Vern




This mailing list archive is Copyright 1997-2025 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page. If you have questions about this, please contact [javascript protected email address].
