[Expat-discuss] & symbol workaround

Mark Williams Mark.Williams at techop.co.uk
Tue Feb 24 10:02:27 CET 2009


> * Brad Causey <bradcausey at gmail.com> wrote:
> > All,
> > 
> > It sounds like the consensus is that I need to mod the incoming badly
> > formatted xml. This is my solution, and it worked for what I needed it for:
> > 
> >    fileo = open(i,'r')
> >    file = open('buffer.xml','w')
> >    unfixml = fileo.read()
> >    fixml = string.replace(unfixml,'&',' ')
>       ^^^^^^^
> 
> This will make trouble if you get some escaped symbol (eg. &amp;).
> So, you'll have to find the &'s, check what comes after and then 
> decide whether to fixup or let it pass.
> 
> BTW: is there any way for hooking into the parser (some callback) 
> to catch those errors and then continue parsing ?
> That would allow building an auto-fixing parser, especially for
> cases like Brad's.

It's not clear whether you have any control over the "XML" that is the
input to your program.  If you do, get it changed to be valid XML.  Awkward
data can be encoded to make it valid (I encode binary data as hexadecimal
strings).
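
As a rough illustration of the hexadecimal approach, here is a minimal
sketch, assuming Python 3.  The helper names encode_payload and
decode_payload and the <blob> element are made up for this example and
do not come from any real schema or from the original poster's code.

```python
import xml.parsers.expat


def encode_payload(data: bytes) -> str:
    """Return the bytes as lowercase hexadecimal text (hypothetical helper)."""
    return data.hex()


def decode_payload(text: str) -> bytes:
    """Reverse encode_payload(); surrounding whitespace is stripped first."""
    return bytes.fromhex(text.strip())


# Bytes that would be illegal or ambiguous if written into XML directly.
raw = b"\x00\x01<&>\xff"

# <blob> is an assumed element name, used only for this example.
document = "<blob>%s</blob>" % encode_payload(raw)

# Expat raises ExpatError on malformed input, so a clean return here
# means the document is well-formed.
xml.parsers.expat.ParserCreate().Parse(document, True)
```

Hex doubles the size of the payload, but the result contains only the
characters 0-9 and a-f, so nothing in it can ever need escaping.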

Otherwise you need to preprocess the data and convert it into valid XML,
as others have said.
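
One possible shape for that preprocessing step, as a sketch only and
assuming Python 3: escape each '&' that does not already begin a
predefined entity reference or a numeric character reference, instead
of replacing every '&' with a space.  The name escape_bare_ampersands
is invented for this example.  It assumes the input declares no
entities of its own in a DTD, and it makes no attempt to skip CDATA
sections or comments, where a bare '&' is legal.

```python
import re
import xml.parsers.expat

# Match an '&' only when it is NOT the start of one of the five
# predefined entity references or a numeric character reference.
BARE_AMP = re.compile(
    r"&(?!(?:amp|lt|gt|apos|quot);|#[0-9]+;|#x[0-9A-Fa-f]+;)"
)


def escape_bare_ampersands(text):
    """Rewrite stray '&' as '&amp;'; existing references are left alone."""
    return BARE_AMP.sub("&amp;", text)


broken = "<note>Tom & Jerry &amp; friends &#38; &#x26; AT&T</note>"
fixed = escape_bare_ampersands(broken)

# Expat raises ExpatError if the result is still not well-formed.
xml.parsers.expat.ParserCreate().Parse(fixed, True)
```

Unlike a blanket replace, this keeps the ampersand in the data and
leaves sequences such as &amp; untouched, which is the case Enrico
warned about.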

I like Enrico's idea of having error callbacks.  This could be useful
in many situations.  

Mark.


