Issue 10552: Tools/unicode/gencodec.py error - Python tracker
Created on 2010-11-27 20:29 by belopolsky, last changed 2022-04-11 14:57 by admin.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| issue10552.diff | belopolsky, 2010-11-27 21:15 | |
| issue10552a.diff | belopolsky, 2010-11-29 18:36 | |
| 10552-remove-apple-files.txt | akuchling, 2013-11-10 18:24 | Remove problematic mapping files before parsing |
| 10552-remove-apple-files-v2.txt | martin.panter, 2015-01-13 05:57 | |
| Messages (15) | |||
|---|---|---|---|
| msg122549 | Author: Alexander Belopolsky (belopolsky) |
Date: 2010-11-27 20:29 | |
$ ../../python.exe gencodec.py MAPPINGS/VENDORS/MISC/ build/
converting APL-ISO-IR-68.TXT to build/apl_iso_ir_68.py and build/apl_iso_ir_68.mapping
converting ATARIST.TXT to build/atarist.py and build/atarist.mapping
converting CP1006.TXT to build/cp1006.py and build/cp1006.mapping
converting CP424.TXT to build/cp424.py and build/cp424.mapping
Traceback (most recent call last):
File "gencodec.py", line 421, in <module>
convertdir(*sys.argv[1:])
File "gencodec.py", line 391, in convertdir
pymap(mappathname, map, dirprefix + codefile,name,comments)
File "gencodec.py", line 355, in pymap
code = codegen(name,map,encodingname,comments)
File "gencodec.py", line 268, in codegen
precisions=(4, 2))
File "gencodec.py", line 152, in python_mapdef_code
mappings = sorted(map.items())
TypeError: unorderable types: NoneType() < int()
The script does appear to have been updated for 3.x, because it fails immediately under Python 2.7:
$ python2.7 gencodec.py MAPPINGS/VENDORS/MISC/ build/
Traceback (most recent call last):
File "gencodec.py", line 35, in <module>
UNI_UNDEFINED = chr(0xFFFE)
ValueError: chr() arg not in range(256)
|
|||
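The first traceback can be reproduced in isolation. This is a minimal sketch, assuming the map handed to `sorted()` contains a `None` key next to integer keys, as the error message suggests; `sample_map` and `try_sort` are hypothetical names and not part of gencodec.py. The exact wording of the `TypeError` message differs between Python 3 releases.

```python
# Hypothetical mapping: a None key stands in for an undefined code,
# next to ordinary integer keys.
sample_map = {0x41: 0x0041, None: 0x80, 0x42: 0x0042}


def try_sort(mapping):
    """Return the sorted items, or the TypeError raised while sorting."""
    try:
        return sorted(mapping.items())
    except TypeError as exc:
        # Python 3 refuses to order None against int.
        return exc


result = try_sort(sample_map)
print(type(result).__name__, result)
```

Python 2 ordered `None` before every integer, so the same call succeeded there.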
| msg122559 | Author: Alexander Belopolsky (belopolsky) |
Date: 2010-11-27 21:15 | |
Attached patch addresses the issue by using -1 instead of None for missing codes. Comparison of generated encoding files to those in Lib/encodings shows only whitespace changes except one which appears to be a change on the unicode.org side:
diff -b build/koi8_u.py ../../Lib/encodings/koi8_u.py
1c1
< """ Python Character Mapping Codec koi8_u generated from 'MAPPINGS/VENDORS/MISC/KOI8-U.TXT' with gencodec.py.
---
> """ Python Character Mapping Codec koi8_u generated from 'python-mappings/KOI8-U.TXT' with gencodec.py.
221c221
< '\u0491' # 0xAD -> CYRILLIC SMALL LETTER GHE WITH UPTURN
---
> '\u0491' # 0xAD -> CYRILLIC SMALL LETTER UKRAINIAN GHE WITH UPTURN
237c237
< '\u0490' # 0xBD -> CYRILLIC CAPITAL LETTER GHE WITH UPTURN
---
> '\u0490' # 0xBD -> CYRILLIC CAPITAL LETTER UKRAINIAN GHE WITH UPTURN
308d307
<
|
|||
| msg122565 | Author: Marc-Andre Lemburg (lemburg) |
Date: 2010-11-27 22:09 | |
Alexander Belopolsky wrote:
> Alexander Belopolsky <belopolsky@users.sourceforge.net> added the comment:
>
> Attached patch addresses the issue by using -1 instead of None for missing codes. Comparison of generated encoding files to those in Lib/encodings shows only whitespace changes except one which appears to be a change on the unicode.org side:
Please use a global constant instead of the literal -1, e.g. MISSING_CODE. Thanks.
> diff -b build/koi8_u.py ../../Lib/encodings/koi8_u.py
> 1c1
> < """ Python Character Mapping Codec koi8_u generated from 'MAPPINGS/VENDORS/MISC/KOI8-U.TXT' with gencodec.py.
> ---
> > """ Python Character Mapping Codec koi8_u generated from 'python-mappings/KOI8-U.TXT' with gencodec.py.
> 221c221
> < '\u0491' # 0xAD -> CYRILLIC SMALL LETTER GHE WITH UPTURN
> ---
> > '\u0491' # 0xAD -> CYRILLIC SMALL LETTER UKRAINIAN GHE WITH UPTURN
> 237c237
> < '\u0490' # 0xBD -> CYRILLIC CAPITAL LETTER GHE WITH UPTURN
> ---
> > '\u0490' # 0xBD -> CYRILLIC CAPITAL LETTER UKRAINIAN GHE WITH UPTURN
> 308d307
> <
That's just a comment and doesn't change the semantics of the codec.
|
|||
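The suggestion above, a named integer sentinel in place of `None`, can be sketched as follows. The name `MISSING_CODE` and the value -1 come from the thread; `sample_map` is a hypothetical stand-in for the real mapping, and the snippet is an illustration, not the committed patch.

```python
# Placeholder for a missing code point (name and value taken from the thread).
MISSING_CODE = -1

# Hypothetical mapping in which the undefined entry is keyed by the sentinel
# instead of None, so every key is an int.
sample_map = {0x41: 0x0041, MISSING_CODE: 0x80, 0x42: 0x0042}

# All keys are now mutually comparable; the sentinel sorts first because
# real code values are never negative.
ordered = sorted(sample_map.items())
print(ordered)
```

A named constant also keeps later checks readable, for example `if code == MISSING_CODE:` rather than a bare `-1`.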
| msg122585 | Author: Alexander Belopolsky (belopolsky) |
Date: 2010-11-27 23:02 | |
Attached patch uses MISSING_CODE as Marc-Andre suggested. There are still errors, apparently because parsecodes() may return either an int or a tuple. I think only mac encodings are affected, so I would like to commit the current patch before tackling this issue.
$ ../../python.exe gencodec.py MAPPINGS/VENDORS/APPLE/ build/ mac_
converting ARABIC.TXT to build/mac_arabic.py and build/mac_arabic.mapping
converting CELTIC.TXT to build/mac_celtic.py and build/mac_celtic.mapping
converting CENTEURO.TXT to build/mac_centeuro.py and build/mac_centeuro.mapping
converting CHINSIMP.TXT to build/mac_chinsimp.py and build/mac_chinsimp.mapping
Traceback (most recent call last):
File "gencodec.py", line 424, in <module>
convertdir(*sys.argv[1:])
File "gencodec.py", line 394, in convertdir
pymap(mappathname, map, dirprefix + codefile,name,comments)
File "gencodec.py", line 358, in pymap
code = codegen(name,map,encodingname,comments)
File "gencodec.py", line 271, in codegen
precisions=(4, 2))
File "gencodec.py", line 155, in python_mapdef_code
mappings = sorted(map.items())
TypeError: unorderable types: tuple() < int()
|
|||
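The second traceback has the same root cause with a different pair of types. The sketch below assumes a mapping whose keys are sometimes a single code (an int) and sometimes a sequence of codes (a tuple), which is what the remark about `parsecodes()` implies. `as_tuple` and `sample_map` are hypothetical, and normalizing through a sort key is only one possible workaround, not the fix adopted in this issue.

```python
def as_tuple(code):
    """Normalize an int or a tuple of ints to a tuple (hypothetical helper)."""
    return code if isinstance(code, tuple) else (code,)


# Hypothetical mapping mixing single codes and multi-code sequences.
sample_map = {0x41: 0x80, (0x42, 0x0301): 0x81, 0x40: 0x82}

try:
    sorted(sample_map.items())
    mixed_sort_ok = True
except TypeError:
    # Python 3 cannot order a tuple against an int.
    mixed_sort_ok = False

# Sorting on normalized keys gives every item a comparable tuple key.
normalized = sorted(sample_map.items(), key=lambda item: as_tuple(item[0]))
print(mixed_sort_ok, normalized)
```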
| msg122586 | Author: Alexander Belopolsky (belopolsky) |
Date: 2010-11-27 23:03 | |
Please ignore Makefile changes in the patch. |
|||
| msg122829 | Author: Alexander Belopolsky (belopolsky) |
Date: 2010-11-29 16:57 | |
Martin, I believe you were the last to update the unicode database. (See r85371.) Did you use python2.x to generate it, or do you have your own private copy of these tools? I noticed that genwincodecs.bat refers to c:\python26\python in the 2.7 branch and c:\python30\python in py3k. Could this be an indication that these tools are out of date? What is the plan for maintaining these tools? Should fixes be made in 2.7 and the 3.x version be generated by 2to3? Or should fixes go to py3k and be backported to 2.7 when they don't add new features? |
|||
| msg122837 | Author: Marc-Andre Lemburg (lemburg) |
Date: 2010-11-29 18:21 | |
gencodec.py is only rarely used, namely when adding new codecs based on Unicode mapping files. It is not run regularly on the files from ftp.unicode.org and only updated on demand. AFAIK, it was last used on Python2 and never on Python3, hence the errors you find with it.
BTW: You appear to have a comma appended to the constant, that doesn't belong there:
+# Placeholder for a missing codepoint
+MISSING_CODE = -1,
+
Perhaps that's causing the second error you are seeing.
|
|||
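The stray comma matters because a trailing comma turns the right-hand side into a one-element tuple. A short illustration follows; the name `MISSING_CODE_WITH_COMMA` is hypothetical and exists only to contrast the two forms.

```python
MISSING_CODE_WITH_COMMA = -1,  # trailing comma: this is the tuple (-1,)
MISSING_CODE = -1              # plain integer, as intended

print(type(MISSING_CODE_WITH_COMMA).__name__)
print(type(MISSING_CODE).__name__)
```

With the tuple form, a sentinel stored among integer codes would itself produce a tuple-versus-int comparison during sorting.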
| msg122842 | Author: Alexander Belopolsky (belopolsky) |
Date: 2010-11-29 18:36 | |
On Mon, Nov 29, 2010 at 1:21 PM, Marc-Andre Lemburg <report@bugs.python.org> wrote:
..
> BTW: You appear to have a comma appended to the constant, that doesn't
> belong there:
>
> +# Placeholder for a missing codepoint
> +MISSING_CODE = -1,
> +
>
> Perhaps that's causing the second error you are seeing.
No, that comma was a left-over from the attempt to fix the mac_chinsimp error. The trace that I reported was generated with MISSING_CODE = -1. I am replacing the patch.
Is it ok to commit a partial fix? It may take longer to fix the mac error.
|
|||
| msg122843 | Author: Marc-Andre Lemburg (lemburg) |
Date: 2010-11-29 18:37 | |
Alexander Belopolsky wrote:
> Alexander Belopolsky <belopolsky@users.sourceforge.net> added the comment:
>
> On Mon, Nov 29, 2010 at 1:21 PM, Marc-Andre Lemburg <report@bugs.python.org> wrote:
> ..
>> BTW: You appear to have a comma appended to the constant, that doesn't
>> belong there:
>>
>> +# Placeholder for a missing codepoint
>> +MISSING_CODE = -1,
>> +
>>
>> Perhaps that's causing the second error you are seeing.
>
> No, that comma was a left-over from the attempt to fix the
> mac_chinsimp error. The trace that I reported was generated with
> MISSING_CODE = -1. I am replacing the patch.
>
> Is it ok to commit a partial fix? It may take longer to fix the mac error.
Sure, we won't need that script anytime soon and if we do, we can just as well use the Python2 version.
|
|||
| msg122850 | Author: Alexander Belopolsky (belopolsky) |
Date: 2010-11-29 18:52 | |
On Mon, Nov 29, 2010 at 1:38 PM, Marc-Andre Lemburg <report@bugs.python.org> wrote:
..
> Sure, we won't need that script anytime soon and if we do, we
> can just as well use the Python2 version.
That may not be true. I compared the 2.7 and py3k versions and the latter has some new features:
* unidata_version changed from 5.2.0 to 6.0.0
* Unihan data is read from zip file
* added processing of DerivedCoreProperties
These changes don't affect gencodec.py, but it may be inconvenient to run makeunicodedata.py and gencodec.py using different versions of Python.
I'll check that all non-mac encodings are correctly generated before committing.
|
|||
| msg122858 | Author: Martin v. Löwis (loewis) |
Date: 2010-11-29 19:48 | |
> These changes don't affect gencodec.py, but it may be inconvenient to
> run makeunicodedata.py and gencodec.py using different versions of
> Python.
As MAL explains: these are completely unrelated, independent tools, and gencodec isn't run more than once per decade (or so). I only ever run makeunicodedata, and I have been using Python 3 to run it.
The mappings are not supposed to ever change once produced. In particular, new versions of Unicode cannot affect them, since the existing characters all map fine to existing code points, which will not change their meaning per Unicode stability criteria.
|
|||
| msg122916 | Author: Alexander Belopolsky (belopolsky) |
Date: 2010-11-30 16:57 | |
Committed in revision 86891. Keeping open to address Mac issue. |
|||
| msg202543 | Author: A.M. Kuchling (akuchling) |
Date: 2013-11-10 18:24 | |
For the Mac issue, we could just delete the mapping files before processing them. I've attached a patch that modifies the Makefile. |
|||
| msg233902 | Author: Martin Panter (martin.panter) |
Date: 2015-01-13 05:57 | |
Here is a new version of Kuchling’s patch. I restored some mapping files which do not give any errors (including the mac_turkish codec, which is actually documented), and removed both readme files. |
|||
| msg406955 | Author: Irit Katriel (iritkatriel) |
Date: 2021-11-24 19:59 | |
I don't think Martin's patch has been applied. Is it needed? |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:09 | admin | set | github: 54761 |
| 2021-11-24 21:35:33 | vstinner | set | nosy: - vstinner |
| 2021-11-24 19:59:51 | iritkatriel | set | nosy: + iritkatriel; messages: + msg406955 |
| 2015-01-13 05:57:30 | martin.panter | set | files: + 10552-remove-apple-files-v2.txt; versions: + Python 3.4; nosy: + martin.panter, vstinner; messages: + msg233902; components: + Unicode |
| 2014-12-31 16:22:37 | akuchling | set | nosy: - akuchling |
| 2014-06-29 23:08:51 | belopolsky | set | nosy: + ronaldoussoren, ned.deily, hynek |
| 2014-06-29 23:07:44 | belopolsky | set | assignee: belopolsky -> |
| 2013-11-10 18:24:50 | akuchling | set | files: + 10552-remove-apple-files.txt; nosy: + akuchling; messages: + msg202543 |
| 2010-12-30 22:14:16 | georg.brandl | unlink | issue7962 dependencies |
| 2010-11-30 16:57:48 | belopolsky | set | nosy: lemburg, loewis, belopolsky, ezio.melotti; messages: + msg122916; priority: normal -> low; assignee: belopolsky; components: + macOS; stage: commit review -> needs patch |
| 2010-11-29 20:22:31 | belopolsky | unlink | issue10575 dependencies |
| 2010-11-29 19:48:38 | loewis | set | messages: + msg122858 |
| 2010-11-29 18:52:32 | belopolsky | set | messages: + msg122850 |
| 2010-11-29 18:37:58 | lemburg | set | messages: + msg122843 |
| 2010-11-29 18:36:58 | belopolsky | set | files: - issue10552a.diff |
| 2010-11-29 18:36:46 | belopolsky | set | files: + issue10552a.diff; messages: + msg122842 |
| 2010-11-29 18:21:55 | lemburg | set | messages: + msg122837 |
| 2010-11-29 16:57:45 | belopolsky | set | messages: + msg122829 |
| 2010-11-29 16:45:33 | belopolsky | link | issue10575 dependencies |
| 2010-11-27 23:03:04 | belopolsky | set | messages: + msg122586 |
| 2010-11-27 23:02:25 | belopolsky | set | files: + issue10552a.diff; messages: + msg122585 |
| 2010-11-27 22:16:02 | ezio.melotti | set | nosy: + ezio.melotti |
| 2010-11-27 22:09:48 | lemburg | set | messages: + msg122565 |
| 2010-11-27 21:15:09 | belopolsky | set | files: + issue10552.diff; nosy: + loewis; keywords: + patch |
| 2010-11-27 20:31:17 | belopolsky | link | issue7962 dependencies |
| 2010-11-27 20:29:09 | belopolsky | create | |
