Issue 10785: parser: store the filename as an unicode object
Created on 2010-12-28 02:40 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files

| File name | Uploaded | Description |
|---|---|---|
| parser_filename_obj-3.patch | vstinner, 2011-01-05 04:26 | |

Messages (9)

msg124755 - Author: STINNER Victor (vstinner) - Date: 2010-12-28 02:40

The Python parser stores the filename as a byte string, but it decodes the filename when reporting an error, because most Python functions now use Unicode strings. Instead of decoding the filename at error time, which may itself raise a new error, I propose to decode the filename once, when the parser object is created, and to store only the Unicode filename. This issue would prepare the last part of full Unicode support (#3080).
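
The trade-off described above can be sketched in Python. This is a minimal illustration, not CPython's C code: `ParserStateSketch`, `LazyParserStateSketch` and their methods are hypothetical, and a strict UTF-8 decode stands in for the filesystem encoding so that a decoding failure is easy to trigger. The eager version rejects an undecodable name when the object is created; the lazy version only fails later, while it is formatting an error message.

```python
class ParserStateSketch:
    """Hypothetical parser state that stores the filename as str (eager decoding)."""

    def __init__(self, raw_filename: bytes, encoding: str = "utf-8") -> None:
        # Assumption: strict UTF-8 stands in for the filesystem encoding.
        # An undecodable name raises UnicodeDecodeError here, at creation time.
        self.filename: str = raw_filename.decode(encoding)

    def format_error(self, lineno: int, message: str) -> str:
        # Error reporting only formats an existing str; no decoding can fail here.
        return f'File "{self.filename}", line {lineno}: {message}'


class LazyParserStateSketch:
    """Hypothetical contrast: keeps bytes and decodes only when reporting an error."""

    def __init__(self, raw_filename: bytes, encoding: str = "utf-8") -> None:
        self.raw_filename = raw_filename
        self.encoding = encoding

    def format_error(self, lineno: int, message: str) -> str:
        # Decoding happens while the original error is being reported, so this
        # call can raise a UnicodeDecodeError of its own.
        name = self.raw_filename.decode(self.encoding)
        return f'File "{name}", line {lineno}: {message}'


print(ParserStateSketch(b"spam.py").format_error(3, "invalid syntax"))
# prints: File "spam.py", line 3: invalid syntax

try:
    ParserStateSketch(b"\xff.py")
except UnicodeDecodeError as exc:
    print("rejected at creation:", exc.reason)  # prints: rejected at creation: invalid start byte
```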

msg124823 - Author: Alexander Belopolsky (belopolsky) - Date: 2010-12-28 22:14

I like the idea, but I don't like the trend of the parser code continuing to diverge from pgen. I understand that most of the Python runtime is not available to pgen, but maybe a more elegant solution than changing the type conditional on PGEN can be found. For example, maybe the filename could be decoded from the filesystem (FS) encoding to UTF-8?
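
The alternative suggested above keeps a byte string but fixes its encoding: decode the raw name with the filesystem encoding, then re-encode it as UTF-8. A minimal Python sketch, assuming a hypothetical helper `transcode_filename` and an explicitly passed encoding (Latin-1 in the example, so the output does not depend on the host's actual filesystem encoding):

```python
def transcode_filename(raw: bytes, fs_encoding: str) -> bytes:
    """Hypothetical helper: re-encode a filename from fs_encoding to UTF-8 bytes."""
    # Decode with the filesystem encoding, then re-encode as UTF-8, so the
    # result can still be stored as a plain byte string (a char * in C).
    return raw.decode(fs_encoding).encode("utf-8")


latin1_name = "café.py".encode("latin-1")          # b'caf\xe9.py'
print(transcode_filename(latin1_name, "latin-1"))  # prints: b'caf\xc3\xa9.py'
```

Because the stored value stays a byte string, the field would not need a different type in the parser and in pgen, which is the point of the suggestion; the decoding step itself can still fail on a name that is invalid in the source encoding.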

msg124826 - Author: STINNER Victor (vstinner) - Date: 2010-12-28 23:15

> maybe a more elegant solution than changing the type conditional
> on PGEN can be found

In pgen, the filename is only used to display the following warning, in `indenterror()`:

`<filename>: inconsistent use of tabs and spaces in indentation`

In practice, this warning never occurs on Grammar/Grammar: this file doesn't use indentation at all, only continuation lines. A better solution may be to simply drop the filename for pgen. Anyway, pgen only compiles *one* file (Grammar/Grammar), so we don't need the input filename ;-)

msg124827 - Author: STINNER Victor (vstinner) - Date: 2010-12-28 23:16

When testing my patch, I found and fixed two bugs in pgen:

- r87557: PGEN was not defined when compiling pgenmain.c and printgrammar.c
- r87558: a pgen error was ignored by "make Parser/pgen.stamp" (when executing pgen to compile the grammar)

msg124828 - Author: STINNER Victor (vstinner) - Date: 2010-12-28 23:32

Version 2 of the patch:

- remove the filename attribute from the perrdetail and tok_state structures in PGEN mode, and add a comment explaining why
- rename filename_obj to filename
- indenterror() no longer prints the input filename in PGEN mode
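
The last change in version 2 can be pictured with a small Python sketch. It is hypothetical and not the C implementation of `indenterror()`: `indent_warning` is an invented name, `filename=None` stands for PGEN mode (where, per the patch, no filename is stored), and the exact C output format may differ. The warning text is the one quoted in msg124826.

```python
from typing import Optional

WARNING = "inconsistent use of tabs and spaces in indentation"


def indent_warning(filename: Optional[str]) -> str:
    """Hypothetical sketch of the indentation warning text.

    filename is None models PGEN mode, where no input filename is kept.
    """
    if filename is None:
        return WARNING
    return f"{filename}: {WARNING}"


print(indent_warning("spam.py"))  # prints: spam.py: inconsistent use of tabs and spaces in indentation
print(indent_warning(None))       # prints: inconsistent use of tabs and spaces in indentation
```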

msg125302 - Author: STINNER Victor (vstinner) - Date: 2011-01-04 11:02

err_clear() should set err->filename to NULL.
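
A plausible reason for this remark, stated here as an assumption: an error-detail record may be reused after it is cleared, so leaving the old filename in place could let a later report carry a stale name (and, in C, a stale pointer or reference). The Python sketch below models the idea with a hypothetical `ErrDetailSketch` dataclass; its fields are loosely inspired by the parser's perrdetail structure, but the names and layout are assumptions, not CPython's definitions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ErrDetailSketch:
    """Hypothetical model of a parser error-detail record (not CPython's struct)."""

    error: int = 0
    lineno: int = 0
    offset: int = 0
    filename: Optional[str] = None

    def clear(self) -> None:
        # Reset every field, including filename, so a reused record cannot
        # carry the previous error's filename into the next report.
        self.error = 0
        self.lineno = 0
        self.offset = 0
        self.filename = None


err = ErrDetailSketch(error=1, lineno=10, offset=4, filename="spam.py")
err.clear()
print(err)  # prints: ErrDetailSketch(error=0, lineno=0, offset=0, filename=None)
```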

msg125409 - Author: STINNER Victor (vstinner) - Date: 2011-01-05 04:26

Version 3 of the patch, which also fixes #9319.

msg130937 - Author: STINNER Victor (vstinner) - Date: 2011-03-15 01:02

@Benjamin: You told me that you don't want two versions of pgen, but I don't remember why. As my work on #3080 is mostly done, I now plan to patch the Python parser to store the filename as Unicode. So could you please review the patch attached to this issue?

msg132990 - Author: Roundup Robot (python-dev) - Date: 2011-04-04 23:48

New changeset 6e9dc970ac0e by Victor Stinner in branch 'default':
Issue #10785: Store the filename as Unicode in the Python parser.
http://hg.python.org/cpython/rev/6e9dc970ac0e

History

| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:57:10 | admin | set | github: 54994 |
| 2011-04-04 23:56:24 | vstinner | set | status: open -> closed; resolution: fixed |
| 2011-04-04 23:48:20 | python-dev | set | nosy: + python-dev; messages: + msg132990 |
| 2011-03-15 01:02:57 | vstinner | set | nosy: belopolsky, vstinner, benjamin.peterson; messages: + msg130937 |
| 2011-01-06 13:03:32 | pitrou | set | nosy: + benjamin.peterson |
| 2011-01-05 04:26:52 | vstinner | set | files: - parser_filename_obj-2.patch; nosy: belopolsky, vstinner |
| 2011-01-05 04:26:50 | vstinner | set | files: - parser_filename_obj.patch; nosy: belopolsky, vstinner |
| 2011-01-05 04:26:45 | vstinner | set | files: + parser_filename_obj-3.patch; nosy: belopolsky, vstinner; messages: + msg125409 |
| 2011-01-04 11:02:42 | vstinner | set | nosy: belopolsky, vstinner; messages: + msg125302; versions: - Python 3.2 |
| 2010-12-28 23:32:43 | vstinner | set | files: + parser_filename_obj-2.patch; nosy: belopolsky, vstinner; messages: + msg124828 |
| 2010-12-28 23:16:39 | vstinner | set | nosy: belopolsky, vstinner; messages: + msg124827 |
| 2010-12-28 23:15:11 | vstinner | set | nosy: belopolsky, vstinner; messages: + msg124826 |
| 2010-12-28 22:14:19 | belopolsky | set | nosy: + belopolsky; messages: + msg124823 |
| 2010-12-28 02:50:16 | vstinner | set | files: + parser_filename_obj.patch |
| 2010-12-28 02:49:34 | vstinner | set | files: - parse_filename_obj.patch |
| 2010-12-28 02:40:20 | vstinner | create | |

