[Expat-discuss] XML_CharacterDataHandler: can it receive text cut half inside a multibyte character sequence?
karl at waclawek.net
Sat Mar 14 22:05:13 CET 2009
Boris Dušek wrote:
> When expat calls the function set by XML_SetCharacterDataHandler, can
> the function receive a block of text (with parameters const XML_Char
> *s, int len) such that it ends in the middle of a multibyte character?
> (i.e. there is a Unicode character encoded as a sequence of 2-4 bytes,
> and the block's last character, s[len-1], is a character of a
> multibyte sequence that is not a last character of such multibyte
> but it would be great if expat did not end in the middle of a
> multibyte sequence.
Expat should not pass partial characters to the handler, though it can
handle input buffers that end in the middle of a character (unless it is
the last input buffer, of course).
Btw, there is also the UTF-16 version of Expat - libexpatw, returning
UTF-16 encoded content.
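
If you want to verify the UTF-8 case yourself, here is a minimal sketch of
a check you could call from your own character data handler. It is not part
of Expat: utf8_incomplete_tail and on_chars are hypothetical names, the code
assumes a UTF-8 build (XML_Char is char), and it assumes the text is
otherwise well-formed UTF-8.

```c
#include <assert.h>

/* Hypothetical helper, not an Expat function.
   Returns how many bytes at the end of s[0..len) belong to an incomplete
   UTF-8 sequence; 0 means the block ends on a character boundary.
   Assumes the data is otherwise well-formed UTF-8. */
int utf8_incomplete_tail(const char *s, int len)
{
    int i = len;
    int cont = 0;   /* trailing continuation bytes (10xxxxxx) seen so far */
    int need;       /* total length announced by the lead byte */
    unsigned char lead;

    /* Walk back over at most 3 continuation bytes. */
    while (i > 0 && cont < 3 && ((unsigned char)s[i - 1] & 0xC0) == 0x80) {
        i--;
        cont++;
    }
    if (i == 0)
        return cont;    /* no lead byte inside this block */

    lead = (unsigned char)s[i - 1];
    if (lead < 0x80)
        need = 1;                       /* ASCII */
    else if ((lead & 0xE0) == 0xC0)
        need = 2;
    else if ((lead & 0xF0) == 0xE0)
        need = 3;
    else if ((lead & 0xF8) == 0xF0)
        need = 4;
    else
        need = 1;                       /* malformed; outside this sketch */

    return (cont + 1 < need) ? cont + 1 : 0;
}

/* Hypothetical handler with the shape of a character data handler in a
   UTF-8 build: (void *userData, const XML_Char *s, int len). */
void on_chars(void *userData, const char *s, int len)
{
    (void)userData;
    /* Fires only if a block ever ends inside a multibyte sequence. */
    assert(utf8_incomplete_tail(s, len) == 0);
}
```

Registering a handler like on_chars with XML_SetCharacterDataHandler and
feeding XML_Parse with buffers deliberately cut inside a multibyte
character should, per the above, never trip the assert.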