[Expat-discuss] CRLF conversion question

Armin Bauer armin.bauer at desscon.com
Thu Sep 9 14:55:17 CEST 2004


On Wed, 2004-09-08 at 00:35, Fred L. Drake, Jr. wrote:
> On Tuesday 07 September 2004 06:18 pm, Armin Bauer wrote:
>  > Sorry if it's me being stupid, but why do CDATA sections contain nodes at
>  > all? As far as my understanding goes, the parser must not touch the
>  > CDATA section at all.
> 
> The node structure is defined by whatever API is providing you with nodes 
> (Expat isn't).  If you're using a DOM with wbxml (or if wbxml is using a DOM 
> internally), that's why nodes are used.
> 
> Line-end normalization is required at all times, even inside CDATA marked 
> sections.  CDATA marked sections are not intended as an escape hatch for 
> binary data.
> 

That's true. But it is not XML-compliant to do line-end normalization in
CDATA sections.

Definition of a cdata section:

[18]   CDSect   ::=   CDStart CData CDEnd
[19]   CDStart   ::=   '<![CDATA['
[20]   CData   ::=   (Char* - (Char* ']]>' Char*))
[21]   CDEnd   ::=   ']]>'

as taken from http://www.w3.org/TR/REC-xml/#sec-cdata-sect
Notice the Char*.

Char is defined as:

Char   ::=   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |
[#x10000-#x10FFFF]
/* any Unicode character, excluding the surrogate blocks, FFFE, and
FFFF. */

as taken from http://www.w3.org/TR/REC-xml/#NT-Char. And this clearly
includes 0x0a and 0x0d. So Expat should never touch these characters.
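
The Char production above can be written out as a small predicate. This is
only a sketch in Python: the function name is_xml_char is hypothetical and
comes from neither the spec nor Expat. It checks membership in Char and
nothing else, so it says nothing about how a parser must report those
characters to the application.

```python
def is_xml_char(cp):
    """Return True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF        # up to the start of the surrogate block
        or 0xE000 <= cp <= 0xFFFD      # after the surrogates, excluding FFFE/FFFF
        or 0x10000 <= cp <= 0x10FFFF   # supplementary planes
    )


# Both line-end characters are legal Char values.
print(is_xml_char(0x0A), is_xml_char(0x0D))
# NUL and U+FFFE are not.
print(is_xml_char(0x00), is_xml_char(0xFFFE))
```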

>  > Wouldn't it be correct if it created exactly one text node containing all
>  > the text as is? At the moment it creates a lot of nodes, one for every
>  > line etc.
> 
> What's correct depends on the API.  If these are DOM nodes, then yes, that 
> would be correct, but not required for correctness.  The series of nodes 
> would also be correct.  That's a separate issue from line-end normalization.
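
The behaviour under discussion can be observed directly. The sketch below
uses Python's standard xml.parsers.expat binding rather than the C API or
wbxml, so it only shows what Expat hands to the application; the helper name
collect_character_data and the sample document are made up for illustration.
It feeds Expat a CDATA section containing a CRLF and records every chunk
passed to the character data handler.

```python
import xml.parsers.expat


def collect_character_data(document):
    """Parse document (bytes) and return the list of character data chunks."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    # Each call to the handler receives one chunk of character data.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(document, True)  # True marks this as the final buffer
    return chunks


chunks = collect_character_data(b"<a><![CDATA[one\r\ntwo]]></a>")
text = "".join(chunks)

print(chunks)      # the text may arrive split across several callbacks
print(repr(text))  # the joined text, with the line end as Expat reports it
```

A layer that creates one node per callback will end up with a series of
nodes, while a layer that concatenates the chunks first gets a single text
node. Either way the line end has already been normalized by the time the
handler sees it.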
> 
>  > wbxml is the library that converts SyncML requests to WAP Binary XML (a
>  > form of conversion before it is sent over GPRS) :)
> 
> Cool, I guess.  Is it this one?
> 
>     http://libwbxml.aymerick.com/
> 
> The tree described in the documentation looks *very* DOMish to me, though 
> perhaps a little lighter than a full W3C DOM (hard to be heavier!).
> 
> 
>   -Fred


