Issue 46555: Unicode-mangled names refer inconsistently to constants
Created on 2022-01-27 21:57 by Kodiologist, last changed 2022-04-11 14:59 by admin.
| Messages (8) | |||
|---|---|---|---|
| msg411930 - (view) | Author: (Kodiologist) * | Date: 2022-01-27 21:57 | |
I'm not sure if this is a bug, but it certainly surprised me. Most reserved words, when Unicode-mangled, as in "𝕕𝕖𝕗", act like ordinary identifiers (see e.g. bpo-46520). `True`, `False`, and `None` are weird in that Unicode-mangled versions of them refer to those same constants initially, but can take on their own identity as variables if assigned to:
Python 3.9.7 (default, Sep 10 2021, 14:59:43)
[GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 𝕋𝕣𝕦𝕖
True
>>> True = 0
  File "<stdin>", line 1
    True = 0
    ^
SyntaxError: cannot assign to True
>>> 𝕋𝕣𝕦𝕖 = 0
>>> True
True
>>> 𝕋𝕣𝕦𝕖
0
I think that `𝕋𝕣𝕦𝕖 = 1` should probably be forbidden. The fact that `𝕋𝕣𝕦𝕖` doesn't always mean the same thing as `True` seems to break the rule in PEP 3131 that "comparison of identifiers is based on NFKC". |
|||
| msg412070 - (view) | Author: Carl Friedrich Bolz-Tereick (Carl.Friedrich.Bolz) * | Date: 2022-01-29 11:42 | |
hah, this is "great":
>>> 𝕋𝕣𝕦𝕖 = 1
>>> globals()
{'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <class '_frozen_importlib.BuiltinImporter'>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, 'True': 1}
The problem is that the lexer assumes that anything that is not ASCII cannot be a keyword and lexes 𝕋𝕣𝕦𝕖 as an identifier.
|
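A minimal sketch of the mismatch described above, using only the standard `unicodedata` and `keyword` modules. The variable names are illustrative, and the mangled spelling is written with escapes so the snippet survives re-encoding. It shows that the raw spelling is not an ASCII keyword, while its NFKC normalization is the string "True".

```python
import keyword
import unicodedata

# U+1D54B, U+1D563, U+1D566, U+1D556: the double-struck spelling of "True".
mangled = "\U0001D54B\U0001D563\U0001D566\U0001D556"

# PEP 3131 specifies that identifiers are compared after NFKC normalization.
normalized = unicodedata.normalize("NFKC", mangled)

# The raw spelling is a valid identifier but does not match the keyword list;
# the normalized spelling does.
raw_is_keyword = keyword.iskeyword(mangled)
normalized_is_keyword = keyword.iskeyword(normalized)

print(normalized, mangled.isidentifier(), raw_is_keyword, normalized_is_keyword)
```

`keyword.iskeyword` is a plain string lookup, so it only approximates what the lexer does; it is used here to make the ASCII-only comparison visible.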
|||
| msg412071 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2022-01-29 11:53 | |
True is a keyword which is compiled to an expression whose value is True; 𝕋𝕣𝕦𝕖 is an identifier which refers to the builtin variable "True", which has the value True by default. You can change the value of a builtin variable, but the value of the expression True is always True. I do not see a problem here. Don't use 𝕋𝕣𝕦𝕖 if your intention is not to use a variable. |
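The keyword-versus-identifier distinction can be inspected with `ast.parse`. This is a sketch assuming a release build of CPython 3.8 or later (debug builds run extra AST validation and may reject the second parse); the variable names are illustrative.

```python
import ast

# Double-struck spelling of "True", written with escapes.
MANGLED_TRUE = "\U0001D54B\U0001D563\U0001D566\U0001D556"

# The ASCII keyword parses to a constant node.
keyword_node = ast.parse("True", mode="eval").body

# The mangled spelling parses to a name lookup whose id has already been
# NFKC-normalized to "True".
mangled_node = ast.parse(MANGLED_TRUE, mode="eval").body

print(type(keyword_node).__name__)
print(type(mangled_node).__name__, mangled_node.id)
```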
|||
| msg412150 - (view) | Author: (Kodiologist) * | Date: 2022-01-30 14:47 | |
> the builtin variable "True"
Is the existence of this entity, as separate from the constant `True`, documented anywhere? constants.rst doesn't seem to acknowledge it. Indeed, is its existence a feature, or is it a CPython quirk? |
|||
| msg412167 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2022-01-30 18:15 | |
https://docs.python.org/3/library/constants.html#built-in-constants |
|||
| msg412169 - (view) | Author: Carl Friedrich Bolz-Tereick (Carl.Friedrich.Bolz) * | Date: 2022-01-30 18:58 | |
Ok, I can definitely agree with Serhiy's point of view: "True" is a keyword that always evaluates to the object that you get when you call bool(1). There is usually no name "True" and directly assigning to it is forbidden. But there are various other ways to assign a name "True". One is e.g. globals()["True"] = 5, another one (discussed in this issue) is using identifiers that NFKC-normalize to the string "True".
|
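Both routes can be tried against an explicit namespace dict instead of the real `globals()`. This is a sketch assuming a release build of CPython that behaves like the 3.9.7 session in the report; `namespace` and the result names are illustrative.

```python
# Double-struck spelling of "True", written with escapes.
MANGLED_TRUE = "\U0001D54B\U0001D563\U0001D566\U0001D556"

namespace = {}

# Route 1: write the key directly, analogous to globals()["True"] = 5.
namespace["True"] = 5

# The keyword ignores the namespace entry and still yields the constant.
keyword_result = eval("True", namespace)

# Route 2: assign through an identifier that NFKC-normalizes to "True".
exec(MANGLED_TRUE + " = 0", namespace)
name_result = namespace["True"]

# Reading through the mangled identifier is an ordinary name lookup.
mangled_result = eval(MANGLED_TRUE, namespace)

print(keyword_result, name_result, mangled_result)
```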
|||
| msg412170 - (view) | Author: Eryk Sun (eryksun) * | Date: 2022-01-30 19:09 | |
Why was it decided to not raise a syntax error when the NFKC normalization of a non-ASCII token matches a keyword? I don't see a use for cases such as `𝕚𝕗 = 1` and `𝕚𝕗 + 1`. It seems the cost in terms of confusion far outweighs any potential benefit. |
|||
| msg412226 - (view) | Author: James Gerity (SnoopJeDi) | Date: 2022-02-01 00:41 | |
> Why was it decided to not raise a syntax error...
I'm not sure such a decision was ever made; the error happens before normalization is applied. I.e. the parser is doing two things here: (1) validating the syntax against the grammar and (2) building the AST. Normalization happens after (1), and `𝕋𝕣𝕦𝕖 = 0` is valid syntax because the grammar is NOT defined in terms of normalized identifiers; it's describing the valid (but confusing!) assignment that Carl described. I agree that this doesn't seem like a bug, but it IS my new favorite quirk of identifier normalization. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:59:55 | admin | set | github: 90713 |
| 2022-02-01 00:41:18 | SnoopJeDi | set | messages: + msg412226 |
| 2022-01-30 19:09:28 | eryksun | set | nosy: + eryksun messages: + msg412170 |
| 2022-01-30 18:58:53 | Carl.Friedrich.Bolz | set | messages: + msg412169 |
| 2022-01-30 18:15:52 | serhiy.storchaka | set | messages: + msg412167 |
| 2022-01-30 14:47:35 | Kodiologist | set | messages: + msg412150 |
| 2022-01-29 17:56:53 | jack1142 | set | nosy: + jack1142 |
| 2022-01-29 11:53:33 | serhiy.storchaka | set | nosy: + serhiy.storchaka messages: + msg412071 |
| 2022-01-29 11:42:21 | Carl.Friedrich.Bolz | set | nosy: + Carl.Friedrich.Bolz messages: + msg412070 |
| 2022-01-29 03:39:07 | SnoopJeDi | set | nosy: + SnoopJeDi |
| 2022-01-27 21:57:22 | Kodiologist | create | |

