Message 76080 - Python tracker

If you look at the 2.7 code, all that __setitem__ requires of keys and
values is that they are strings; it says nothing about Latin-1 or any
other specific encoding (the Latin-1 handling must be a 3.0 addition
meant to ease the str/unicode transition). That suggests to me that
assuming previous DBs were written in Latin-1 is somewhat bogus, as
people could have passed in a str in any encoding as a DB key or
value.
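
To make the concern concrete, here is a minimal sketch of what happens when bytes written by a 2.x application are read back under a fixed Latin-1 assumption. The key "café" and all variable names are hypothetical, chosen only for illustration; nothing here touches a real DB.

```python
# Hypothetical key, as a 2.x application might have stored it
# under two different encodings.
text = "café"
stored_as_utf8 = text.encode("utf-8")      # b'caf\xc3\xa9'
stored_as_latin1 = text.encode("latin-1")  # b'caf\xe9'

# Latin-1 maps every byte to a code point, so decoding never fails,
# but UTF-8 data read this way silently comes back as mojibake.
misread = stored_as_utf8.decode("latin-1")

# The reverse direction fails loudly: the Latin-1 bytes are not
# well-formed UTF-8.
try:
    stored_as_latin1.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False
```

The point is only that the stored bytes carry no record of their encoding, so any single assumption will be wrong for some existing DBs.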

Thus I think going down the UTF-8 route is the right thing to do for
string arguments. A quick look at _gdbmmodule.c supports this, as it
simply converts its arguments through PyArg_Parse("s#") to get its keys
and thus uses UTF-8 as the default encoding.
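
As a rough Python-level illustration of that conversion, the sketch below mimics what the "s#" format does with a str argument in 3.x (encode it as UTF-8) while passing bytes through untouched. The helper name as_db_bytes is an assumption made for this example and is not part of _gdbmmodule.c or any dbm module.

```python
def as_db_bytes(arg):
    """Hypothetical helper approximating the "s#" conversion:
    str is encoded as UTF-8, bytes-like input is kept as-is."""
    if isinstance(arg, str):
        return arg.encode("utf-8")
    return bytes(arg)

key_bytes = as_db_bytes("café")            # str argument becomes UTF-8 bytes
raw_bytes = as_db_bytes(b"caf\xe9")        # bytes argument is left unchanged
round_trip = key_bytes.decode("utf-8")     # decodes back to the original str
```

With this behavior a str key always round-trips through UTF-8, and callers holding data in some other encoding can still pass bytes explicitly.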