95 changes: 48 additions & 47 deletions Doc/c-api/unicode.rst
@@ -1393,77 +1393,78 @@ Character Map Codecs
This codec is special in that it can be used to implement many different codecs
(and this is in fact what was done to obtain most of the standard codecs
included in the :mod:`encodings` package). The codec uses mappings to encode and
decode characters.

Decoding mappings must map byte values (integers in the range 0 to 255) to
Unicode strings, integers (which are then interpreted as Unicode ordinals) or
``None`` (meaning "undefined mapping" and causing an error).

Encoding mappings must map Unicode ordinals (integers) to bytes objects,
integers in the range 0 to 255 (which are then interpreted as Latin-1 ordinals)
or ``None`` (meaning "undefined mapping" and causing an error).

The mapping objects provided need only support the :meth:`__getitem__` mapping
interface.

If a character lookup fails with a :exc:`LookupError`, the character is treated
as an undefined mapping and causes an error, exactly as if it had been mapped to
``None``. Because of this, mappings must contain an entry for every character
the codec is expected to handle.
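The mapping rules above can be sketched at the Python level. This is a minimal illustration, assuming Python 3 and assuming that ``codecs.charmap_decode`` and ``codecs.charmap_encode`` (the helpers used by the :mod:`encodings` package) follow the same mapping rules as the C functions; the two dictionaries are invented for the example.

```python
import codecs

# Hypothetical decoding map: byte value -> str, int (Unicode ordinal) or None.
decoding_map = {
    0x41: "a",      # a string value is used as-is
    0x42: 0x03B2,   # an int is read as a Unicode ordinal (Greek small beta)
    0x43: None,     # None marks the byte as an undefined mapping
}

# Hypothetical encoding map: Unicode ordinal -> int (byte value), bytes or None.
encoding_map = {
    ord("a"): 0x41,   # an int is read as a byte value
    0x03B2: b"\x42",  # a bytes value is used as-is
}

# Both helpers return a (result, number of input items consumed) pair.
text, consumed = codecs.charmap_decode(b"\x41\x42", "strict", decoding_map)
data, _ = codecs.charmap_encode(text, "strict", encoding_map)

# A byte mapped to None is an undefined mapping and raises in strict mode.
try:
    codecs.charmap_decode(b"\x43", "strict", decoding_map)
    none_is_error = False
except UnicodeDecodeError:
    none_is_error = True
```

Because the two maps are inverses of each other on the bytes they cover, decoding and then encoding returns the original bytes.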

These are the mapping codec APIs:

.. c:function:: PyObject* PyUnicode_DecodeCharmap(const char *s, Py_ssize_t size, \
PyObject *mapping, const char *errors)

Create a Unicode object by decoding *size* bytes of the encoded string *s* using
the given *mapping* object. Return *NULL* if an exception was raised by the
codec. If *mapping* is *NULL*, Latin-1 decoding will be done. Otherwise it can be
a dictionary keyed by byte value, or a Unicode string, which is treated as a
lookup table. Byte values greater than or equal to the length of the string and
U+FFFE "characters" are treated as "undefined mapping".
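The lookup-table form can be tried from Python. The sketch below assumes that ``codecs.charmap_decode`` forwards to this function and that passing ``None`` as the mapping corresponds to a *NULL* *mapping*; the four-character table and the ``is_undefined`` helper are made up for illustration.

```python
import codecs

# Hypothetical lookup table: byte value i decodes to table[i].
# Index 3 holds U+FFFE, which marks that byte as undefined.
table = "xyz\ufffe"

text, consumed = codecs.charmap_decode(b"\x00\x01\x02", "strict", table)


def is_undefined(byte_value):
    """Return True if decoding this single byte fails in strict mode."""
    try:
        codecs.charmap_decode(bytes([byte_value]), "strict", table)
    except UnicodeDecodeError:
        return True
    return False


# With no mapping at all, the bytes are decoded as Latin-1.
latin1_text, _ = codecs.charmap_decode(b"caf\xe9", "strict", None)
```

Here ``is_undefined(3)`` hits the U+FFFE entry and ``is_undefined(4)`` falls past the end of the table, so both report an undefined mapping.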


.. c:function:: PyObject* PyUnicode_AsCharmapString(PyObject *unicode, PyObject *mapping)

Encode a Unicode object using the given *mapping* object and return the result
as a Python bytes object. Error handling is "strict". Return *NULL* if an
exception was raised by the codec.
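Strict encoding through a mapping behaves as follows in Python. This is a sketch only, assuming ``codecs.charmap_encode`` called with ``"strict"`` matches what this function does; the mapping is hypothetical.

```python
import codecs

# Hypothetical encoding map: Unicode ordinal -> byte value, or None for undefined.
encoding_map = {ord("a"): 0x01, ord("b"): 0x02, ord("?"): None}

data, _ = codecs.charmap_encode("abba", "strict", encoding_map)

# In strict mode the first undefined character aborts the whole call.
try:
    codecs.charmap_encode("ab?", "strict", encoding_map)
    failed_at = None
except UnicodeEncodeError as exc:
    failed_at = exc.start  # index of the offending character
```

In C, the corresponding failure is reported by a *NULL* return with the exception set.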

The following codec API is special in that it maps Unicode to Unicode.


.. c:function:: PyObject* PyUnicode_TranslateCharmap(const Py_UNICODE *s, Py_ssize_t size, \
PyObject *table, const char *errors)

Translate a :c:type:`Py_UNICODE` buffer of the given *size* by applying a
character mapping *table* to it and return the resulting Unicode object. Return
*NULL* when an exception was raised by the codec.

The mapping *table* must map Unicode ordinal integers to Unicode ordinal
integers or ``None`` (causing deletion of the character).

Mapping tables need only provide the :meth:`__getitem__` interface; dictionaries
and sequences work well. Unmapped character ordinals (ones which cause a
:exc:`LookupError`) are left untouched and are copied as-is.
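The same table rules are visible through ``str.translate``, used here as a Python-level stand-in on the assumption that it shares its table handling with the C translate functions. Both tables are invented for the example.

```python
# Dictionary table: ordinal -> ordinal, or None to delete the character.
dict_table = {ord("a"): ord("A"), ord("-"): None}
from_dict = "a-b-a".translate(dict_table)

# Sequence table: the ordinal is used as an index. Ordinals past the end
# raise IndexError (a LookupError), so those characters are copied as-is.
seq_table = [ord("x")] * 3
from_seq = "\x00\x01z".translate(seq_table)
```

In the first call ``b`` has no entry and is left untouched; in the second, ``z`` lies beyond the three-item sequence and is likewise copied unchanged.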

.. deprecated-removed:: 3.3 4.0
Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
:c:func:`PyUnicode_Translate` or the :ref:`generic codec based API
<codec-registry>`.


.. c:function:: PyObject* PyUnicode_EncodeCharmap(const Py_UNICODE *s, Py_ssize_t size, \
PyObject *mapping, const char *errors)

Encode the :c:type:`Py_UNICODE` buffer of the given *size* using the given
*mapping* object and return a Python bytes object. Return *NULL* if an
exception was raised by the codec.

.. deprecated-removed:: 3.3 4.0
Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
:c:func:`PyUnicode_AsCharmapString` or
:c:func:`PyUnicode_AsEncodedString`.


MBCS codecs for Windows