test_dbm_dumb fails due to what appears to be a character encoding issue
on Mac OS X:
Majestix:Python-3.0rc3 martina$ DYLD_FRAMEWORK_PATH=/Users/martina/Downloads/Python-3.0rc3: ./python.exe -E -bb ./Lib/test/regrtest.py -l test_dbm_dumb
test_dbm_dumb
Exception UnicodeEncodeError: UnicodeEncodeError('charmap', "'ü',
(3072, 1)\n", 2, 3, 'character maps to <undefined>') in <bound method
_Database.close of <dbm.dumb._Database object at 0x6a2510>> ignored
Exception UnicodeEncodeError: UnicodeEncodeError('charmap', "'ü',
(3072, 1)\n", 2, 3, 'character maps to <undefined>') in <bound method
_Database.close of <dbm.dumb._Database object at 0x6a2510>> ignored
Exception UnicodeEncodeError: UnicodeEncodeError('charmap', "'ü',
(3072, 1)\n", 2, 3, 'character maps to <undefined>') in <bound method
_Database.close of <dbm.dumb._Database object at 0x6a2510>> ignored
Exception UnicodeEncodeError: UnicodeEncodeError('charmap', "'ü',
(3072, 1)\n", 2, 3, 'character maps to <undefined>') in <bound method
_Database.close of <dbm.dumb._Database object at 0x6a2510>> ignored
Exception UnicodeEncodeError: UnicodeEncodeError('charmap', "'ü',
(3072, 1)\n", 2, 3, 'character maps to <undefined>') in <bound method
_Database.close of <dbm.dumb._Database object at 0x6a2550>> ignored
Exception UnicodeEncodeError: UnicodeEncodeError('charmap', "'ü',
(3072, 1)\n", 2, 3, 'character maps to <undefined>') in <bound method
_Database.close of <dbm.dumb._Database object at 0x6a2550>> ignored
test test_dbm_dumb failed -- errors occurred; run in verbose mode for
details
1 test failed:
test_dbm_dumb
Example of verbose output (other testcases are similar):
======================================================================
ERROR: test_dumbdbm_creation (test.test_dbm_dumb.DumbDBMTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/martina/Downloads/Python-3.0rc3/Lib/test/test_dbm_dumb.py", line 41, in test_dumbdbm_creation
    f.close()
  File "/Users/martina/Downloads/Python-3.0rc3/Lib/dbm/dumb.py", line 228, in close
    self._commit()
  File "/Users/martina/Downloads/Python-3.0rc3/Lib/dbm/dumb.py", line 116, in _commit
    f.write("%r, %r\n" % (key.decode('Latin-1'), pos_and_siz_pair))
  File "./Lib/io.py", line 1491, in write
    b = encoder.encode(s)
  File "./Lib/encodings/mac_roman.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xbc' in position 2: character maps to <undefined>
The Mac Roman encoding comes into play because _commit opens _dirfile
without explicitly specifying an encoding. io.open then falls back to
locale.getpreferredencoding(), which returns mac-roman on this system:
Majestix:Python-3.0rc3 martina$ DYLD_FRAMEWORK_PATH=/Users/martina/Downloads/Python-3.0rc3: ./python.exe -c "import locale;print(locale.getpreferredencoding())"
mac-roman
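The encoding failure can probably be reproduced without the database at all. The snippet below is only a sketch: it assumes the offending key is 'ü' encoded as UTF-8 (inferred from the '\xbc' at position 2 in the traceback, not taken from the test source), and it builds the same kind of line that _commit writes.

```python
# Assumption: the key is 'ü' stored as UTF-8 bytes, i.e. b'\xc3\xbc'.
key = "\u00fc".encode("utf-8")

# Same formatting expression as in the traceback (dumb.py, _commit).
line = "%r, %r\n" % (key.decode("Latin-1"), (3072, 1))

# Encoding with Mac Roman fails, because U+00BC has no Mac Roman mapping.
try:
    line.encode("mac-roman")
    mac_roman_error = None
except UnicodeEncodeError as exc:
    mac_roman_error = exc

# Latin-1 covers every code point below 256, so the same line encodes fine.
latin1_bytes = line.encode("Latin-1")
```

If the assumption about the key holds, the error position matches the one reported above (position 2, character '\xbc').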
Two issues:
- since dumb.py already handles the key encoding explicitly (Latin-1),
shouldn't it specify the encoding for _dirfile as well? (It could use a
binary file instead, but that might cause new line-ending problems...)
- is mac-roman really the appropriate choice for
locale.getpreferredencoding? This is on Mac OS X 10.5, not Mac OS 9...
The preferred encoding for Mac OS X should be utf-8, not some legacy
encoding...
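For the first point, a minimal sketch of what an explicit encoding could look like. The helpers write_index and read_index are hypothetical names made up for illustration; they are not functions in dumb.py, and this is not a proposed patch, just the idea of passing encoding="Latin-1" when opening the directory file so the locale default is never consulted.

```python
import ast
import io
import os
import tempfile


def write_index(path, index):
    """Hypothetical helper: write key/position pairs with an explicit encoding."""
    with io.open(path, "w", encoding="Latin-1") as f:
        for key, pos_and_siz_pair in index.items():
            # Same line format as in the traceback above.
            f.write("%r, %r\n" % (key.decode("Latin-1"), pos_and_siz_pair))


def read_index(path):
    """Hypothetical helper: read the pairs back using the same encoding."""
    index = {}
    with io.open(path, "r", encoding="Latin-1") as f:
        for line in f:
            key, pos_and_siz_pair = ast.literal_eval(line.rstrip())
            index[key.encode("Latin-1")] = pos_and_siz_pair
    return index


# Round trip with a non-ASCII key; the result does not depend on the locale.
index = {"\u00fc".encode("utf-8"): (3072, 1), b"a": (0, 7)}
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.dir")
    write_index(path, index)
    restored = read_index(path)
```

Opening the file in text mode keeps the platform's newline translation, which avoids the line-ending concern that a binary file would raise.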
Seems to be related to r67310, which was intended to fix issue #3799:
http://svn.python.org/view/python/branches/py3k/Lib/dbm/dumb.py?rev=67310&r1=63662&r2=67310