[Expat-discuss] CRLF conversion question
Armin Bauer
armin.bauer at desscon.com
Thu Sep 9 14:55:17 CEST 2004
On Wed, 2004-09-08 at 00:35, Fred L. Drake, Jr. wrote:
> On Tuesday 07 September 2004 06:18 pm, Armin Bauer wrote:
> > Sorry if it's me being stupid, but why do CDATA sections contain nodes at
> > all? As far as I understand, the parser must not touch the
> > CDATA section at all.
>
> The node structure is defined by whatever API is providing you with nodes
> (Expat isn't). If you're using a DOM with wbxml (or if wbxml is using a DOM
> internally), that's why nodes are used.
>
> Line-end normalization is required at all times, even inside CDATA marked
> sections. CDATA marked sections are not intended as an escape hatch for
> binary data.
>
That's true. But it is not XML-compliant to do line-end normalization in
CDATA sections.
Definition of a CDATA section:
[18] CDSect ::= CDStart CData CDEnd
[19] CDStart ::= '<![CDATA['
[20] CData ::= (Char* - (Char* ']]>' Char*))
[21] CDEnd ::= ']]>'
as taken from http://www.w3.org/TR/REC-xml/#sec-cdata-sect
Notice the Char*.
Char is defined as:
Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |
[#x10000-#x10FFFF]
/* any Unicode character, excluding the surrogate blocks, FFFE, and
FFFF. */
as taken from http://www.w3.org/TR/REC-xml/#NT-Char. And this clearly
includes 0x0a and 0x0d. So Expat should never touch these characters.
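
To make the grammar argument concrete, here is a rough sketch of the Char
and CData productions quoted above as Python predicates. The function names
is_xml_char and is_valid_cdata_content are made up for illustration and are
not part of Expat or any other library. The sketch only checks which
characters the grammar permits inside a CDATA section; it says nothing about
what a parser does with them afterwards.

```python
def is_xml_char(cp):
    """Return True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF        # up to just below the surrogate blocks
        or 0xE000 <= cp <= 0xFFFD      # excludes FFFE and FFFF
        or 0x10000 <= cp <= 0x10FFFF   # supplementary planes
    )


def is_valid_cdata_content(text):
    """CData production: any run of Char that does not contain ']]>'."""
    return "]]>" not in text and all(is_xml_char(ord(c)) for c in text)
```

With these definitions, both is_xml_char(0x0A) and is_xml_char(0x0D) are
true, so a CR LF pair is permitted content for a CDATA section.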
> > Wouldn't it be correct if it created exactly one text node containing all
> > the text as is? At the moment it creates a lot of nodes for every line
> > etc.
>
> What's correct depends on the API. If these are DOM nodes, then yes, that
> would be correct, but not required for correctness. The series of nodes
> would also be correct. That's a separate issue from line-end normalization.
>
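
The two points above (how the text is split up, and line-end normalization)
can be observed separately with a small experiment. This is only a sketch
using Python's xml.parsers.expat binding to Expat; the helper name
cdata_chunks and the sample document are made up for illustration. How many
pieces the character data arrives in is up to the parser, so the sketch only
inspects the joined text.

```python
import xml.parsers.expat


def cdata_chunks(document):
    """Parse document (bytes) and return the character-data pieces as delivered."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = False                 # do not coalesce callbacks
    parser.CharacterDataHandler = chunks.append
    parser.Parse(document, True)               # True marks the final chunk of input
    return chunks


# Hypothetical sample: a CDATA section containing a CR LF pair.
chunks = cdata_chunks(b"<r><![CDATA[line1\r\nline2]]></r>")
text = "".join(chunks)
print(len(chunks), repr(text))
```

The joined text comes back as 'line1\nline2', i.e. the CR LF pair is
delivered as a single LF even inside the CDATA section. An application that
wants one text node can join the pieces itself, as done here.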
> > wbxml is the library that converts SyncML requests to WAP Binary XML (a
> > form of conversion before it is sent over GPRS) :)
>
> Cool, I guess. Is it this one?
>
> http://libwbxml.aymerick.com/
>
> The tree described in the documentation looks *very* DOMish to me, though
> perhaps a little lighter than a full W3C DOM (hard to be heavier!).
>
>
> -Fred