[Expat-discuss] New Expat functionality and API proposal

Karl Waclawek kwaclaw@thestar.ca
Wed Aug 14 06:55:03 2002


> On Thu, Aug 08, 2002 at 02:16:20PM -0400, Fred L. Drake, Jr. wrote:
> >...
> > We welcome feedback and discussion, including the introduction of
> > additional API proposals, on the expat-discuss list.
> 
> I'm not inclined towards either of the two proposals you put forward. It is
> a strange mixture of callbacks and pull-style. An application would need to
> implement both.

The goal was to enable a pull API on top of Expat.
Our proposal - at least as I understand it - was not meant to turn
Expat into a pull parser directly.
 
> Instead, I would much rather see an extra layer over xmltok. It would be a
> *pure* pull API rather than a hybrid. The API would be modelled similar to
> the XMLPULL API (see http://www.xmlpull.org/). Essentially, the API could be
> a generator -- just implement a next() method which returns a token plus
> data. You could set up the generator to have callbacks to fetch more data,
> or you could pass in a buffer into the next() call somehow (I prefer the
> callback approach). The API would need a return value for "partial token",
> in which case you must feed it more data.
> 
> The layer over xmltok would perform entity expansion, namespace handling,
> and other stuff like that.

I like all of that!
But, the internal interface is already a pull interface, isn't it?
I mean, the main parsing loops in xmlparse.c are calling
XmlContentTok, XmlCdataSectionTok, XmlPrologTok, XmlAttributeValueTok,
and XmlEntityValueTok - depending on the state. These are essentially
like the next() function, except for not being combined into one.
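
To make the "combined into one next()" idea concrete, here is a rough
sketch. It does not use Expat's xmltok at all: content_tok, cdata_tok,
State, PARTIAL and PullTokenizer are hypothetical toy stand-ins that only
know about tags, text and CDATA sections. The point is just the shape:
one next() that picks a state-specific token function and reports a
"partial token" when it needs more data.

```python
from enum import Enum, auto

class State(Enum):
    CONTENT = auto()
    CDATA = auto()

PARTIAL = "partial"          # hypothetical "feed me more data" result
CDATA_OPEN = b"<![CDATA["
CDATA_CLOSE = b"]]>"

def content_tok(buf, pos):
    """Toy content tokenizer: returns (kind, end_offset)."""
    rest = buf[pos:]
    if rest.startswith(b"<"):
        if rest.startswith(CDATA_OPEN):
            return "cdata_open", pos + len(CDATA_OPEN)
        if CDATA_OPEN.startswith(rest):
            return PARTIAL, pos          # may still become "<![CDATA["
        end = buf.find(b">", pos)
        if end == -1:
            return PARTIAL, pos          # tag not closed yet
        return "tag", end + 1
    end = buf.find(b"<", pos)
    return "text", (len(buf) if end == -1 else end)

def cdata_tok(buf, pos):
    """Toy CDATA tokenizer: returns (kind, end_offset)."""
    end = buf.find(CDATA_CLOSE, pos)
    if end == -1:
        return PARTIAL, pos              # closing "]]>" not seen yet
    if end == pos:
        return "cdata_close", pos + len(CDATA_CLOSE)
    return "cdata_text", end

class PullTokenizer:
    """One next() dispatching to a per-state token function."""
    def __init__(self):
        self.buf = b""
        self.pos = 0
        self.state = State.CONTENT

    def feed(self, data):
        # Keep only the unconsumed tail, then append the new data.
        self.buf = self.buf[self.pos:] + data
        self.pos = 0

    def next(self):
        if self.pos >= len(self.buf):
            return PARTIAL, b""
        tok = content_tok if self.state is State.CONTENT else cdata_tok
        kind, end = tok(self.buf, self.pos)
        if kind == PARTIAL:
            return PARTIAL, b""
        data = self.buf[self.pos:end]
        self.pos = end
        if kind == "cdata_open":
            self.state = State.CDATA
        elif kind == "cdata_close":
            self.state = State.CONTENT
        return kind, data
```

The caller loops on next() and calls feed() whenever it gets PARTIAL back.
A real layer would also have to do entity expansion, namespaces and so on,
which this sketch ignores.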

I think you are proposing a pull layer directly above those
token fetch functions. Essentially this means re-writing part
of xmlparse.c. Definitely a more efficient idea than our proposal,
but a *lot* more work.
Our proposal simply has the advantage of being doable with comparatively
little effort, given the resources we have.

> Essentially, it would be an entirely new interface. Users could use it, or
> use the classic SAX-like interface. If we do it right, we could even
> refactor the classic interface to use the new Pull interface internally.

It already does - see above. For the rest - well, I smell a volunteer! ;-)

Btw, Expat can easily be turned into a pull parser already, if you
feed it the data in one-byte buffers. Just build a nextTag() function
that does this in an inner loop, and returns only when a callback gets
triggered. Efficiency is another question, of course.
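
A minimal sketch of that trick, written against Python's standard
xml.parsers.expat binding rather than the C API so it can be run as is.
PullReader and next_tag() are hypothetical names made up for this
example. The callbacks only queue events, and next_tag() feeds single
bytes until something is in the queue. A queue is needed because one
byte can trigger more than one callback (the ">" of "<empty/>" produces
both a start and an end event), and because the parser is free to
deliver callbacks later than the byte that completes a tag.

```python
import xml.parsers.expat

class PullReader:
    """Hypothetical pull wrapper: feeds Expat one byte at a time."""

    def __init__(self, data: bytes):
        self._data = data
        self._pos = 0
        self._pending = []        # events queued by the callbacks
        self._finished = False
        self._parser = xml.parsers.expat.ParserCreate()
        self._parser.StartElementHandler = self._on_start
        self._parser.EndElementHandler = self._on_end

    def _on_start(self, name, attrs):
        self._pending.append(("start", name, attrs))

    def _on_end(self, name):
        self._pending.append(("end", name))

    def next_tag(self):
        """Return the next tag event, or None at end of document."""
        while not self._pending:
            if self._pos >= len(self._data):
                if self._finished:
                    return None
                # Tell Expat the input is complete; this may flush events.
                self._finished = True
                self._parser.Parse(b"", True)
                continue
            byte = self._data[self._pos:self._pos + 1]
            self._pos += 1
            self._parser.Parse(byte, False)   # inner loop: one byte per call
        return self._pending.pop(0)
```

Each Parse() call carries a single byte, so the per-call overhead is paid
for every byte of input. That is the efficiency question.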
 
Karl

