◐ Shell
clean mode source ↗

Message 121568 - Python tracker

On Fri, Nov 19, 2010 at 3:06 PM, STINNER Victor <report@bugs.python.org> wrote:
> .. Whereas PyUnicode_FromFormatV() converts the format string
> (bytes) to unicode (characters). If you would like a comparaison in C, it's
> like printf()+mbstowcs() in the same function.
>

I see.  So it is really the

        else
            *s++ = *f;

that surreptitiously widens the characters.

..
> I choosed to use ASCII instead of UTF-8, because an UTF-8 decoder is long (210
> lines) and complex (see PyUnicode_DecodeUTF8Stateful()), whereas ASCII decode
> is just: "unicode_char = (Py_UNICODE)byte;" + an if before to check that 0 <=
> byte <= 127).

I don't think we need 210 lines to replace "*s++ = *f" with proper
UTF-8 logic.  Even if we do, the code can be shared with
PyUnicode_DecodeUTF8 and a UTF-8 iterator may be a welcome addition to
Python C API.