[Expat-discuss] CRLF conversion question
Fred L. Drake, Jr.
fdrake at acm.org
Tue Sep 7 16:55:42 CEST 2004
On Tuesday 07 September 2004 10:09 am, Armin Bauer wrote:
> I suspect that this is a bug in the parser which manufactures the xml
> tree.
I'm not convinced.
> the wbxml lib goes through the CDATA, which consists of a lot of nodes
> like this:
>
> WBXML Encoder> CDATA Begin
> WBXML Encoder> Text: <BEGIN:VCARD>
> WBXML Encoder> Text: <
>
> WBXML Encoder> Text: <VERSION:2.1>
> WBXML Encoder> Text: <
That looks like how Expat reports character data: a single run of text may
arrive in several separate callbacks. Most tree-builders normalize on input,
though, so you would get just one text node. There may be a knob to twist;
that depends on your tree-builder. If you're getting a DOM, you should be
able to call .normalize() on the document to take care of this issue.
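A minimal sketch of what .normalize() does, assuming Python's standard xml.dom.minidom stands in for whatever DOM the WBXML code builds (the element name and vCard strings are made-up sample data). Three adjacent text nodes, like the pieces in the encoder log, collapse into one:

```python
from xml.dom import minidom

# Build a small document by hand; the pieces mimic the separate text
# nodes in the encoder log (hypothetical sample data, not real output).
doc = minidom.Document()
card = doc.createElement("card")
doc.appendChild(card)
for piece in ["BEGIN:VCARD", "\n", "VERSION:2.1"]:
    card.appendChild(doc.createTextNode(piece))

print(len(card.childNodes))  # 3 separate Text nodes

# normalize() merges adjacent Text nodes throughout the subtree.
doc.normalize()
print(len(card.childNodes))        # 1
print(repr(card.firstChild.data))  # 'BEGIN:VCARD\nVERSION:2.1'
```

Note that minidom's normalize() only merges Text nodes; adjacent CDATASection nodes are left alone.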
> But there should be only one node holding all the CDATA. By the way:
> the node printed as "<", a line break, then ">" holds a 0x0a.
>
> So I guess the parser just creates a node for every 0x0d it encounters,
> even if it is in the CDATA. That doesn't sound right to me.
>
> The correct fix for this (if my assumptions are correct) would be not to
> parse the CDATA into nodes but to leave it as is.
It's not clear to me what this means. If CDATA marked sections are
represented, each will contain zero or more text nodes; this is expected of
the DOM. I don't know what API you're using, though (wbxml doesn't tell me
anything, unfortunately), but the nodes look like what I'd expect as output
events from Expat, especially the normalized newlines coming through as
separate pieces.
> BTW: the fix did not work. It was left as is.
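Inside CDATA marked sections, yes. Line endings are still normalized there,
though: XML requires each CR LF pair to be reported as a single LF.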
-Fred
--
Fred L. Drake, Jr. <fdrake at acm.org>