[Expat-discuss] Problems with utf8_toUtf8

F J Franklin F.J.Franklin@sheffield.ac.uk
Wed Aug 7 00:06:03 2002


On Tue, 6 Aug 2002, Marta Padilla wrote:
> I'm still working on adapting Expat to the AS/400. The code behaves
> differently in the following function:
> 
> utf8_toUtf8
> 
>   if (fromLim - *fromP > toLim - *toP) {
>     /* Avoid copying partial characters. */
>     for (fromLim = *fromP + (toLim - *toP); fromLim > *fromP; fromLim--)
>       if (((unsigned char)fromLim[-1] & 0xc0) != 0x80)
> 
> --> WHY 0xc0 and 0x80??

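This has nothing to do with ASCII. This line checks that the UTF-8 byte
"fromLim[-1]" is not a trailing (or 'continuation') byte in a multi-byte
sequence. In UTF-8 a single-byte character has the form 0xxxxxxx, the
first byte of a multi-byte sequence has the form 11xxxxxx, and every
continuation byte has the form 10xxxxxx. Masking with 0xc0 (binary
11000000) keeps only the top two bits, and the result equals 0x80 (binary
10000000) exactly when the byte is a continuation byte. A character
expressed in UTF-8 has between 1 and 6 bytes, inclusive, under the
original definition (RFC 3629 later restricted this to at most 4), and all
but the first satisfy "(b & 0xc0) == 0x80".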

Don't know about the function, sorry.

Regards, Frank

Francis James Franklin
F.J.Franklin@shef.ac.uk
