[Expat-discuss] Encoding issues in expat
Subramanian, Binu
binu.subramanian@sciatl.com
Wed Aug 7 23:23:02 2002
Hello Karl,
I have integrated expat into my application running on VC 6.0.
I read in the XML file using expat in VC 6.0, store it in a CString and then
display it in an edit control in a dialog.
I use the SAXInCpp API, which provides a wrapper for expat.
First case: Expat compiled for UTF-8 (without specifying XML_UNICODE in the
build options).
For the example XML file below:
1. The Start Element and End Element functions give me the corresponding tags,
i.e. "GridData", "Identification", etc.
2. The entities corresponding to characters such as ¡ are prefixed with the
character Â, so in my edit control the loaded XML file displays Â¡ instead of ¡.
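
A stray Â in front of ¡ is the usual sign of UTF-8 bytes being shown as if each byte were one character of a single-byte code page such as Windows-1252: U+00A1 (¡) is encoded in UTF-8 as the two bytes C2 A1, and byte C2 on its own is Â. The sketch below is only an illustration of that effect, not code from expat or SAXInCpp; the function names are hypothetical and the decoder assumes well-formed input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: what a single-byte (Latin-1 style) display does.
// Every byte becomes its own character, so C2 A1 turns into "Â¡".
std::vector<unsigned int> misread_as_single_byte(const std::string& bytes) {
    std::vector<unsigned int> out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        out.push_back(static_cast<unsigned char>(bytes[i]));
    }
    return out;
}

// Hypothetical helper: minimal UTF-8 decoder (assumes well-formed input).
// Multi-byte sequences are combined into one code point each.
std::vector<unsigned int> decode_utf8(const std::string& bytes) {
    std::vector<unsigned int> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char b0 = static_cast<unsigned char>(bytes[i]);
        unsigned int cp;
        std::size_t extra;  // number of continuation bytes that follow
        if (b0 < 0x80)                { cp = b0;        extra = 0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; extra = 1; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; extra = 2; }
        else                          { cp = b0 & 0x07; extra = 3; }
        assert(i + extra < bytes.size());  // sequence must not be truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i + k]) & 0x3F);
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

Calling misread_as_single_byte("\xC2\xA1") yields the two values 0xC2 and 0xA1 (Â and ¡), while decode_utf8 on the same bytes yields the single value 0xA1. In a Windows application the conversion would more likely be done with the Win32 function MultiByteToWideChar and the CP_UTF8 code page before the text is handed to the edit control; whether that fits this application is an assumption.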
Second case: Expat compiled for UTF-16 (by specifying XML_UNICODE in the
build options).
For the example XML file below:
1. The Start Element and End Element functions DO NOT give me the corresponding
tags, e.g. "GridData". Instead, only the first character, 'G', is
given during parsing.
2. The entities corresponding to characters such as ¡ are not prefixed with the
character Â in this case.
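
When XML_UNICODE is defined, expat's XML_Char is a 16-bit type and the callbacks receive UTF-16 strings. One plausible explanation for seeing only 'G' is that somewhere (possibly in the wrapper or in the application; this is an assumption) the name is still treated as a narrow char string: in UTF-16LE the letter G is the bytes 47 00, so a byte-oriented string function stops at the zero byte right after 'G'. The sketch below illustrates this with hypothetical helper names and builds the little-endian byte layout explicitly so it does not depend on the host byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: lay out an ASCII string as UTF-16LE bytes,
// followed by a 16-bit zero terminator.
std::vector<char> ascii_to_utf16le_bytes(const std::string& ascii) {
    std::vector<char> bytes;
    for (std::size_t i = 0; i < ascii.size(); ++i) {
        bytes.push_back(ascii[i]);  // low byte: the ASCII value
        bytes.push_back('\0');      // high byte: zero for ASCII
    }
    bytes.push_back('\0');
    bytes.push_back('\0');
    return bytes;
}

// What a narrow-string function sees: it stops at the first zero byte.
std::size_t length_seen_as_narrow(const std::vector<char>& utf16le) {
    assert(!utf16le.empty());
    return std::strlen(&utf16le[0]);
}

// Correct length: count 16-bit code units up to the 16-bit terminator.
std::size_t length_in_code_units(const std::vector<char>& utf16le) {
    std::size_t n = 0;
    while (2 * n + 1 < utf16le.size() &&
           (utf16le[2 * n] != 0 || utf16le[2 * n + 1] != 0)) {
        ++n;
    }
    return n;
}
```

For "GridData", length_seen_as_narrow reports 1 while length_in_code_units reports 8. If this is the cause, the callbacks and the string type that stores their arguments would need to use 16-bit characters consistently throughout.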
What should I do?
-----------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<GridData>
<Identification>
<Version>1.0.0</Version>
<Modified>00:00:00, Saturday, December 30, 1899</Modified>
</Identification>
<Options>
<Option Name = "Comment">This report was generated during UPS
maintenance</Option>
<Option Name = "Sort">Col3, Ascending</Option>
<Option Name = "Show Instances">Vertical</Option>
</Options>
...............
<Values>
<Row Id = "Harini0">
<Cell Status =
"Ok">¡¢£¥¦§©®°±²³
;µ¹¼½¾<CellDescr>This is first
cell</CellDescr></Cell>
<RowDescr>This is first row</RowDescr>
</Row>
-----------------------------------------------------
kr,
Binu
-----Original Message-----
From: Karl Waclawek [mailto:karl@waclawek.net]
Sent: 26 July 2002 18:55
To: Subramanian, Binu; expat-discuss@lists.sourceforge.net
Subject: Re: [Expat-discuss] Encoding issues in expat
> I am working on Win 2000, VC++ 6.0
> I am using expat 1.95.4 version.
> It is compiled for UTF-8 output and I have specified the encoding in the XML
> file as UTF-8.
> Still, when I load the XML file, a character Â is prefixed to the special
> characters (Euro, trademark, etc.).
> What can I do to see the Euro character properly? I have enclosed the
> XML file I am using.
> Any suggestion/help will be useful.
To understand you correctly:
You are reading an XML file using Expat, writing it out again to another file,
based on the callbacks from Expat. Then you view this other file by loading it
into some word processor, is that right?
Well, which word processor do you use?
On Windows, not all editors can display UTF-8 well.
And even if they can, they usually require a BOM (byte order mark)
at the beginning of the file, even for UTF-16.
In any case, the native Unicode encoding on Windows is UTF-16 (LE).
So, first I recommend you compile Expat for UTF-16.
Then I recommend you write a BOM to the output file;
details about BOMs can be found on http://www.unicode.org.
Then you should be able to display it.
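
As a rough sketch of the BOM suggestion: for UTF-16LE the byte order mark U+FEFF is written as the two bytes FF FE, followed by each 16-bit code unit with its low byte first. The function names and the idea of collecting the output in a vector of 16-bit units are assumptions made for illustration, not part of Expat.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: serialize UTF-16 code units as little-endian bytes,
// preceded by the UTF-16LE byte order mark (FF FE).
std::string utf16le_with_bom(const std::vector<unsigned short>& units) {
    std::string out;
    out += static_cast<char>(0xFF);
    out += static_cast<char>(0xFE);
    for (std::size_t i = 0; i < units.size(); ++i) {
        out += static_cast<char>(units[i] & 0xFF);         // low byte first
        out += static_cast<char>((units[i] >> 8) & 0xFF);  // then high byte
    }
    return out;
}

// Hypothetical helper: write the bytes in binary mode so that no newline
// translation corrupts the 16-bit code units. Returns true on success.
bool write_utf16le_file(const char* path,
                        const std::vector<unsigned short>& units) {
    const std::string bytes = utf16le_with_bom(units);
    std::FILE* f = std::fopen(path, "wb");
    if (!f) {
        return false;
    }
    const std::size_t written = std::fwrite(bytes.data(), 1, bytes.size(), f);
    const bool ok = (written == bytes.size());
    return (std::fclose(f) == 0) && ok;
}
```

For the code units 0x00A1 (¡) and 0x20AC (the Euro sign), utf16le_with_bom produces the six bytes FF FE A1 00 AC 20.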
Btw, it seems the file you attached does not contain
the Euro symbol.
Karl