[Expat-discuss] expat & simple c expat wrapper
Nick MacDonald
nickmacd at gmail.com
Tue Aug 14 23:04:49 CEST 2007
In a nutshell, there are two general types of parsers: DOM and SAX.
Expat is a SAX-style (stream-oriented, event-driven) parser.
There is a lot that could be said on this topic, but perhaps look at
the Expat mailing list archives... I am quite sure the difference
between SAX and DOM has been covered there many times (a few of them
by me).
SAX is usually more memory efficient and faster at parsing, but if you
need to cross-reference data in your document, you will need to figure
out a way to do this yourself, as SAX is fire and forget... once you
have gone forward in the document there is no going backward without
re-parsing the whole file.
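
To make the cross-referencing point concrete, here is a minimal sketch
of doing that bookkeeping yourself during a single streaming pass. It
uses Python's xml.parsers.expat binding rather than the C API, and the
document layout (<item id="..."> and <ref to="...">) is made up purely
for illustration:

```python
import xml.parsers.expat

# Hypothetical document: <ref> elements point at <item> elements by id,
# and a reference may appear before the item it names.
DOC = """<doc>
  <ref to="b"/>
  <item id="a">apple</item>
  <item id="b">banana</item>
  <ref to="a"/>
</doc>"""


def resolve_refs(xml_text):
    items = {}      # id -> text, filled in as items stream past
    refs = []       # referenced ids, in document order
    current = None  # id of the <item> we are currently inside, if any

    def start(name, attrs):
        nonlocal current
        if name == "item":
            current = attrs["id"]
            items[current] = ""
        elif name == "ref":
            refs.append(attrs["to"])

    def end(name):
        nonlocal current
        if name == "item":
            current = None

    def chars(data):
        # Text may arrive in several pieces, so append rather than assign.
        if current is not None:
            items[current] += data

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)

    # Forward references can only be resolved once the pass is complete.
    return [(ref, items.get(ref)) for ref in refs]


print(resolve_refs(DOC))  # [('b', 'banana'), ('a', 'apple')]
```

The parser itself remembers nothing, so the only memory used is for
what the handlers choose to keep (here, the two small tables).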
DOM builds a copy of the document in memory, which allows you to see
different parts all at once, but this can be quite a memory hog for
large documents, and for specially crafted ones (that have a lot of
nested ENTITY references). To read more, search for "million laughs"
(also called "billion laughs"); here is one such document, which is
known to crash some systems:
<?xml version="1.0"?>
<!DOCTYPE million [
<!ELEMENT million (#PCDATA)>
<!ENTITY laugh0 "ha! ">
<!ENTITY laugh1 "&laugh0;&laugh0;">
<!ENTITY laugh2 "&laugh1;&laugh1;">
<!ENTITY laugh3 "&laugh2;&laugh2;">
<!ENTITY laugh4 "&laugh3;&laugh3;">
<!ENTITY laugh5 "&laugh4;&laugh4;">
<!ENTITY laugh6 "&laugh5;&laugh5;">
<!ENTITY laugh7 "&laugh6;&laugh6;">
<!ENTITY laugh8 "&laugh7;&laugh7;">
<!ENTITY laugh9 "&laugh8;&laugh8;">
<!ENTITY laugh10 "&laugh9;&laugh9;">
<!ENTITY laugh11 "&laugh10;&laugh10;">
<!ENTITY laugh12 "&laugh11;&laugh11;">
<!ENTITY laugh13 "&laugh12;&laugh12;">
<!ENTITY laugh14 "&laugh13;&laugh13;">
<!ENTITY laugh15 "&laugh14;&laugh14;">
<!ENTITY laugh25 "&laugh15;&laugh15;">
<!ENTITY laugh26 "&laugh25;&laugh25;">
<!ENTITY laugh27 "&laugh26;&laugh26;">
<!ENTITY laugh28 "&laugh27;&laugh27;">
<!ENTITY laugh29 "&laugh28;&laugh28;">
<!ENTITY laugh30 "&laugh29;&laugh29;">
]>
<million>
<title>More than two million laughs!</title>
<laughs>&laugh30;</laughs>
</million>
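As you can see, it is very short, but it can be very deadly to a DOM
parser, which has to hold all of the expanded text in memory. SAX
should handle this with some grace, depending on how you implement
your parsing. As far as I know, Expat does expand internal ENTITYs by
default, but it passes the expanded text to your character data
handler a piece at a time rather than building it up in memory, so
beyond the parsing time it only hurts if your handlers store
everything they are given. (Setting a default handler with
XML_SetDefaultHandler turns the expansion off.) If your XML spec does
not support ENTITYs at all, you can simply reject any document that
declares them.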
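
If you want to see what a streaming handler is actually handed, one
option is to run a cut-down version of the document and only count the
characters instead of storing them. This is a rough sketch using
Python's xml.parsers.expat binding, not the C API; the three-level
document and the helper names are my own, for illustration only:

```python
import xml.parsers.expat

# Cut-down version of the document above: only three levels of doubling.
SMALL = """<?xml version="1.0"?>
<!DOCTYPE million [
<!ENTITY laugh0 "ha! ">
<!ENTITY laugh1 "&laugh0;&laugh0;">
<!ENTITY laugh2 "&laugh1;&laugh1;">
<!ENTITY laugh3 "&laugh2;&laugh2;">
]>
<million><laughs>&laugh3;</laughs></million>"""


def expanded_size(doublings, base="ha! "):
    # Every level holds two copies of the level below it.
    return len(base) * 2 ** doublings


def count_chars(xml_text):
    total = 0

    def chars(data):
        nonlocal total
        total += len(data)  # count only; the text itself is discarded

    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)
    return total


print(count_chars(SMALL))  # characters delivered to the handler
print(expanded_size(3))    # what three doublings of "ha! " predict
# The full document doubles 21 times (laugh1..laugh15, laugh25..laugh30).
print(expanded_size(21))
```

Because the handler keeps only a running total, memory use stays flat
however much text is delivered; a handler that appended every piece to
a string would grow just as a DOM tree does.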
Nick
On 8/14/07, Senthil Nathan <rsennat at gmail.com> wrote:
> I would like to know the difference between expat and scew.
>
> Scew says it creates a DOM tree, which expat doesn't.
> Or, more generally, what is the difference between a stream-oriented
> parser (e.g. expat) and a DOM parser (e.g. scew)?
>
> Can anyone please explain more on this front?
> I'm also looking at their websites, but I guess I lack something
> here. So please explain.