[Expat-discuss] Fwd: I need help with error message 'xml declaration not at start of external entity'
Nick MacDonald
nickmacd at gmail.com
Tue Jun 24 16:46:58 CEST 2008
[Sorry, forgot to copy the rest of the list]
---------- Forwarded message ----------
From: Nick MacDonald <nickmacd at gmail.com>
Date: Tue, Jun 24, 2008 at 10:46 AM
Subject: Re: [Expat-discuss] I need help with error message 'xml
declaration not at start of external entity'
To: Steve Fogoros <sfogoros at hsc.unt.edu>
Steve:
Careful to be sure you're reading what I wrote, and not what you want
to see... the Backus–Naur Form rules in the current XML spec are very
clear that whitespace is NOT ALLOWED where you wish it were.
http://www.w3.org/TR/2006/REC-xml-20060816/
http://en.wikipedia.org/wiki/Backus%E2%80%93Naur_Form
The only rules you need to read are right here:
2.1 Well-Formed XML Documents
[Definition: A textual object is a well-formed XML document if:]
1. Taken as a whole, it matches the production labeled document.
2. It meets all the well-formedness constraints given in this specification.
3. Each of the parsed entities which is referenced directly or
indirectly within the document is well-formed.
Document
[1] document ::= prolog element Misc*
[3] S ::= (#x20 | #x9 | #xD | #xA)+
[22] prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?
[23] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
[24] VersionInfo ::= S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')
[25] Eq ::= S? '=' S?
[26] VersionNum ::= '1.0'
[27] Misc ::= Comment | PI | S
What you'd be looking for is a way to get some S (3) in front of the
XMLDecl (23) in a valid expansion of document (1). I can't see any way
this can happen: the only place S can appear in the prolog is through
Misc (27), and Misc comes after the optional XMLDecl, never before it.
The text elsewhere in the spec is not really applicable to your
case... there is certainly nothing more normative than the explicit BNF
grammar provided, so you have to follow its rules. You're confused by
single quotes in the BNF? Just assume they're double quotes if that's
less confusing... they mark literal strings, showing the exact
characters you would need to supply at that point in the grammar.
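As a rough illustration of reading those productions literally, here is
a minimal Python sketch that transcribes [3] and [23]-[26] into a regular
expression and checks whether a document entity begins with an XMLDecl.
EncodingDecl [80], EncName [81] and SDDecl [32] are not quoted above and
are filled in from the same spec; the helper names (quoted,
starts_with_xmldecl) are made up for this example. It only checks the
shape of the grammar and is not a parser.

```python
import re


def quoted(body):
    """Match body wrapped in single or double quotes, as the BNF literals allow."""
    return "(?:'" + body + "'|\"" + body + "\")"


S = r"[\x20\x09\x0D\x0A]+"                            # [3]
EQ = "(?:" + S + ")?=(?:" + S + ")?"                  # [25]
VERSION_INFO = S + "version" + EQ + quoted(r"1\.0")   # [24], [26]
# The next two productions are not in the excerpt above (assumption: taken
# from the same edition of the spec).
ENCODING_DECL = S + "encoding" + EQ + quoted(r"[A-Za-z][A-Za-z0-9._-]*")  # [80], [81]
SD_DECL = S + "standalone" + EQ + quoted("(?:yes|no)")                    # [32]

XML_DECL = re.compile(
    r"<\?xml" + VERSION_INFO
    + "(?:" + ENCODING_DECL + ")?"
    + "(?:" + SD_DECL + ")?"
    + "(?:" + S + ")?" + r"\?>"
)


def starts_with_xmldecl(text):
    """True if text begins, at its very first character, with an XMLDecl."""
    return XML_DECL.match(text) is not None


print(starts_with_xmldecl('<?xml version="1.0"?><doc/>'))    # True
print(starts_with_xmldecl('\n<?xml version="1.0"?><doc/>'))  # False
```

With leading whitespace, the prolog has to take the XMLDecl? branch as
empty and consume the whitespace as Misc. What follows cannot then be
read as an ordinary PI either, because PITarget [17] excludes the name
'xml' in any letter case.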
Nothing is wrong with eXpat... it is acting exactly as the spec dictates.
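The same behaviour can be observed without going through PHP. Below is a
minimal sketch using Python's xml.parsers.expat binding, chosen only for
the example (the original report was against PHP's XML Parser module, and
the exact error text depends on the Expat version in use). The helper
name expat_error_code is hypothetical.

```python
import xml.parsers.expat as expat
from xml.parsers.expat import errors


def expat_error_code(data):
    """Return None if Expat accepts data, else the numeric Expat error code."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True marks this as the final chunk
    except expat.ExpatError as exc:
        return exc.code
    return None


# Numeric code for the "declaration not at start" error.
MISPLACED = errors.codes[errors.XML_ERROR_MISPLACED_XML_PI]

with_decl = b'<?xml version="1.0"?><doc/>'
no_decl = b"<doc/>"

print(expat_error_code(with_decl))                       # None
print(expat_error_code(b"\n" + with_decl) == MISPLACED)  # True
print(expat_error_code(b"\n\n" + no_decl))               # None

# Possible workaround on the consuming side. Assumption: the bytes are in
# an ASCII-compatible encoding with no byte order mark, so stripping
# leading ASCII whitespace is safe.
print(expat_error_code((b"\n" + with_decl).lstrip()))    # None
```

Stripping on the consuming side only hides a malformed feed; the cleaner
fix is still for the producer to emit nothing before the declaration.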
Nick
On Mon, Jun 23, 2008 at 11:30 AM, Steve Fogoros <sfogoros at hsc.unt.edu> wrote:
> I thought that too based on sections 2.1 and 2.8 of Extensible Markup
> Language (XML) 1.0 (Second Edition) at
> http://www.w3.org/TR/2000/REC-xml-20001006#NT-document . I was having
> trouble with the single quotes around the XMLDecl declaration. I've
> never seen that in a formal grammar and didn't want to assume it meant
> that nothing comes before the prolog, if it exists.
>
> I looked more closely and found section 2.4 of the specification does
> address my question and I believe states that whitespace is allowed
> before the XML declaration:
>
> Quoted from URL referenced above:
>
> Text consists of intermingled character data and markup. [Definition:
> Markup takes the form of start-tags, end-tags, empty-element tags,
> entity references, character references, comments, CDATA section
> delimiters, document type declarations, processing instructions, XML
> declarations, text declarations, and any white space that is at the top
> level of the document entity (that is, outside the document element and
> not inside any other markup).]
> Nick, do you read this the same way I do? And, in case I haven't
> researched completely, has it been superseded in version 1.1?
>
> Thanks again for validating my assumptions. I think I will pass this on
> to the maintainers of expat
> Steve Fogoros
>
>>>> "Nick MacDonald" <nickmacd at gmail.com> 6/23/2008 7:52 AM >>>
> Perhaps you're not reading the same XML spec I am, because to me it is
> ABSOLUTELY clear that whitespace is not allowed to come before the XML
> declaration:
>
> The primary rule states this:
>
> document ::= prolog element Misc*
>
> Note that a prolog is defined as so:
>
> prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?
>
> Which says the XMLDecl is optional, but if present, it would be defined
> as so:
>
> XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
>
> and since whitespace (the term 'S' as used in this ruleset) does not
> appear anywhere before the '<?xml' literal in XMLDecl, it makes it
> pretty clear it's not allowed at the beginning.
>
> If you don't like this behaviour, you'd be better off lobbying the W3C
> to change the spec, but as it stands, eXpat is quite clearly enforcing
> the well-formedness rules for an XML document.
>
> Note that the rules also make it quite clear that you don't HAVE to
> have an XMLDecl, and thus without it your document can have as much
> initial whitespace as makes you happy... (I have tested this with
> eXpat and it works fully as expected.)
>
> Nick
>
>
> On Fri, Jun 20, 2008 at 1:02 PM, Steve Fogoros <sfogoros at hsc.unt.edu> wrote:
>> I'm using the PHP module XML Parser under PHP version 4.4.0.
>>
>> I get the referenced error due to new lines before the preamble.
>>
>> I've searched the error message and reviewed the w3c spec for
>> information on xml parsing. I haven't found anything that explicitly
>> states what a parser should do about leading white space outside of the
>> xml document. I've also noted that there are many failures of this type
>> reported on WordPress and RSS feed forums. In all cases, the correction
>> seems to be altering the provider application to submit the xml document
>> without any leading characters before the preamble.
>>
>> My question is: does the xml spec explicitly specify that there be
>> nothing other than the preamble at the beginning of a well formed xml
>> doc? Is this something that should/could be addressed in the parser (it
>> sure would eliminate a lot of failed implementations)?
>
> --
> Nick MacDonald
> NickMacD at gmail.com
--
Nick MacDonald
NickMacD at gmail.com