[Expat-discuss] CRLF conversion question

Fred L. Drake, Jr. fdrake at acm.org
Tue Sep 7 16:55:42 CEST 2004

On Tuesday 07 September 2004 10:09 am, Armin Bauer wrote:
 > I suspect that this is a bug in the parser which manufactures the xml
 > tree.

I'm not convinced.

 > the wbxml lib goes through the cdata which consist of a lot of nodes
 > like this:
 > WBXML Encoder> CDATA Begin
 > WBXML Encoder> Text: <BEGIN:VCARD>
 > WBXML Encoder> Text: <
 > WBXML Encoder> Text: <VERSION:2.1>
 > WBXML Encoder> Text: <

That seems like how Expat reports it.  Most tree-builders normalize on input, 
though, so you would get just one text node.  There may be a knob to twist; 
that depends on your tree-builder.  If you're getting a DOM, you should be 
able to call .normalize() on the document to take care of this issue.
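
As a rough sketch of what .normalize() does, here is a small example using 
Python's xml.dom.minidom.  I'm assuming a DOM here purely for illustration; 
the element name "data" and the text pieces are made up, and your 
tree-builder may behave differently (in particular, it may keep CDATA 
sections as separate node types that normalize() leaves alone).

```python
from xml.dom.minidom import Document

# Build a tiny DOM by hand with four adjacent text nodes, mimicking the
# fragmented events shown in the log above (hypothetical data).
doc = Document()
root = doc.createElement("data")
doc.appendChild(root)
for piece in ("BEGIN:VCARD", "\n", "VERSION:2.1", "\n"):
    root.appendChild(doc.createTextNode(piece))

before = len(root.childNodes)   # adjacent text nodes, one per piece

# normalize() merges runs of adjacent text nodes throughout the subtree.
doc.normalize()

after = len(root.childNodes)
merged = root.firstChild.data
```

After the call, the element should hold a single text node containing the 
concatenated data.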

 > But it should be only one node which holds all the cdata. By the way:
 > the <\n> node holds a 0x0a.
 > So i guess the parser just creates a node for every 0x0d it encounters
 > even it is in the cdata. That doesnt sound right to me.
 > The correct fix to this (if my assumptions are correct) would be to not
 > parse the cdata into nodes but to leave it as is.

It's not clear to me what this means.  If CDATA marked sections are 
represented, each will contain zero or more text nodes; this is expected of 
the DOM.  I don't know what API you're using, though (wbxml doesn't tell me 
anything, unfortunately), but the nodes look like what I'd expect as output 
events from Expat, especially the normalized newlines coming through as 
separate events.
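
To see the raw events for yourself, something like the following should 
work with Python's binding to Expat (xml.parsers.expat).  This is only a 
sketch: the sample document is invented, and the exact way Expat splits 
the character data into chunks is not something to rely on, which is why 
the code only ever looks at the joined result.

```python
import xml.parsers.expat

# Hypothetical input: a CDATA section containing CRLF line endings.
XML = b"<card><![CDATA[BEGIN:VCARD\r\nVERSION:2.1\r\n]]></card>"

def collect(buffer_text):
    """Return the list of character-data chunks the handler receives."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    # buffer_text asks the binding to coalesce adjacent chunks when it can.
    parser.buffer_text = buffer_text
    parser.CharacterDataHandler = chunks.append
    parser.Parse(XML, True)
    return chunks

raw = collect(False)       # typically several chunks, newlines on their own
buffered = collect(True)   # coalesced where possible

# Whatever the chunking, an application should concatenate the pieces.
text = "".join(raw)
```

The joined text has the CRLF pairs normalized to a single 0x0a, which 
matches the lone-newline nodes in the log.  A tree-builder (or your own 
handler) has to do the concatenation if it wants one node.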

 > BTW: the &#13; fix did not work. It was left as is.

Inside of CDATA marked sections, yes; character references aren't 
recognized there, so &#13; is just literal text.
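
A quick way to check the difference, again sketched with Python's 
xml.parsers.expat (the helper name text_of and the sample documents are 
mine, not anything from wbxml):

```python
import xml.parsers.expat

def text_of(xml_bytes):
    """Parse a document and return all of its character data, joined."""
    pieces = []
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = pieces.append
    parser.Parse(xml_bytes, True)
    return "".join(pieces)

# In ordinary content the reference is expanded to a carriage return.
outside = text_of(b"<a>x&#13;y</a>")

# Inside a CDATA marked section the same six characters are plain text.
inside = text_of(b"<a><![CDATA[x&#13;y]]></a>")
```

So if the 0x0d has to survive, the reference needs to sit in regular 
character data rather than inside the CDATA section.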


Fred L. Drake, Jr.  <fdrake at acm.org>
