[Expat-discuss] How is SJIS encoding handled in expat?
Agarwal, Saumya
Saumya.Agarwal at netapp.com
Tue Apr 17 17:13:52 CEST 2007
Hi,
I have a scenario in which the data on the server is encoded in SJIS. The client requests this data from the server through an API, and the server sends the output as XML, parsed by the expat parser.
Here are the input and output.
INPUT:
<?xml version='1.0' encoding='SHIFT-JIS' ?>
<!DOCTYPE netapp SYSTEM 'file:/etc/netapp_filer.dtd'>
<netapp xmlns="http://www.netapp.com/filer/admin" version="1.0"><file-inode-info><inode-number>1193746</inode-number><volume-name>vol0</volume-name></file-inode-info></netapp>
OUTPUT:
<?xml version='1.0' encoding='UTF-8' ?>
<!DOCTYPE netapp SYSTEM '/na_admin/netapp_filer.dtd'>
<netapp version='1.1' xmlns='http://www.netapp.com/filer/admin'>
<results status="passed"><volume-name>vol0</volume-name><volume-fsid>1996999850</volume-fsid><volume-uuid>42a93940-4ed9-11db-ba89-00a098032816</volume-uuid><inode-number>1193746</inode-number><number-of-parents>1</number-of-parents><inode-paths><inode-parent-info><inode-path>/vol/vol0/home/新規ワードパッド ドキュメント.doc</inode-path></inode-parent-info></inode-paths></results></netapp>
As seen above, the client declares the document encoding to be SHIFT-JIS. The server returns the correct data (apparently SJIS, since the Japanese characters are represented correctly in the output), but the encoding declared in the output document is UTF-8.
Now, the strange part is that even if the client declares the document encoding to be UTF-8 in the input, the server behaves exactly the same way!
Here are my questions:
1. Does expat support SJIS encoding?
2. If so, how does it know that the data is SJIS-encoded, and when does it call the appropriate handler?
3. Is the output returned by expat the SJIS-encoded data, or does expat convert the data to UTF-8 before returning it?
4. Is there a way for expat to declare to the client that the data is actually SJIS and not UTF-8? We have another parser on the client side (libxml2) that fails with a parsing error when it is given the XML output from expat, because the data is Japanese while the encoding declaration is UTF-8.
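Japanese text by itself does not conflict with an encoding='UTF-8' declaration; a conforming parser such as libxml2 should only reject the document when its bytes are not well-formed UTF-8, which raw Shift_JIS bytes usually are not. A minimal sketch in C, assuming the client can get at the raw response bytes before handing them to libxml2, for checking which case applies. The function name is_valid_utf8 is hypothetical and is not part of the expat or libxml2 APIs:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not an expat/libxml2 API): returns true if
   buf[0..len) is well-formed UTF-8 per RFC 3629, i.e. it rejects
   overlong forms, UTF-16 surrogates (U+D800..U+DFFF), code points
   above U+10FFFF, and truncated sequences. */
bool is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t need = 0;                     /* continuation bytes after c */
        unsigned char lo = 0x80, hi = 0xBF;  /* allowed range for 2nd byte */

        if (c < 0x80) { i++; continue; }     /* plain ASCII */
        else if (c >= 0xC2 && c <= 0xDF) need = 1;
        else if (c == 0xE0) { need = 2; lo = 0xA0; }   /* no overlongs */
        else if (c >= 0xE1 && c <= 0xEC) need = 2;
        else if (c == 0xED) { need = 2; hi = 0x9F; }   /* no surrogates */
        else if (c >= 0xEE && c <= 0xEF) need = 2;
        else if (c == 0xF0) { need = 3; lo = 0x90; }   /* no overlongs */
        else if (c >= 0xF1 && c <= 0xF3) need = 3;
        else if (c == 0xF4) { need = 3; hi = 0x8F; }   /* <= U+10FFFF */
        else return false;  /* 0x80..0xC1 and 0xF5..0xFF never start a sequence */

        if (len - i <= need) return false;  /* sequence cut off at end of buffer */
        if (buf[i + 1] < lo || buf[i + 1] > hi) return false;
        for (size_t k = 2; k <= need; k++)
            if ((buf[i + k] & 0xC0) != 0x80) return false;  /* must be 10xxxxxx */
        i += need + 1;
    }
    return true;
}
```

If the bytes of the <inode-path> content fail this check, the response most likely carries unconverted SJIS bytes under a UTF-8 declaration; if they pass, the Japanese characters are genuinely UTF-8 and the libxml2 error has some other cause.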
Thanks,
Saumya