String annotations [PEP 563] #4390

gvanrossum

~~This is unfinished work by @ambv.~~ UPDATE: this is ready for review now.

I'm adding it here because patching and reviewing are easier (for me anyway) when it's in PR form. Also, @serhiy-storchaka your eye would be appreciated, esp. for the hairy AST unparsing code in C. (Also, if you had to do this from scratch, would it be easier to unparse the CST instead?)

gvanrossum

I'm guessing there are still crasher bugs in here... E.g.

>>> from __future__ import string_annotations
>>> def f(a: list[str]): pass
... 
Segmentation fault: 11

serhiy-storchaka

There are crashes because the work is unfinished. Some parts are still not implemented (in particular subscripting). Error checking is minimal, if it exists at all.

All concatenation takes quadratic time. I think it is worth implementing a simple accumulator that uses an overallocated array and makes concatenation linear in time. Or you can reuse the existing _PyBytesWriter or _PyUnicodeWriter.
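The difference between repeated concatenation and an accumulator can be sketched in Python. This is only an illustration of the pattern, not the C code under review: the `Accumulator` class and both helper functions are hypothetical names, and `io.StringIO` stands in for a writer such as _PyUnicodeWriter.

```python
import io


def join_by_concatenation(parts):
    """Naive approach: each step may copy everything built so far."""
    result = ""
    for part in parts:
        result = result + part
    return result


class Accumulator:
    """Hypothetical writer: append pieces, build the final string once."""

    def __init__(self):
        self._buffer = io.StringIO()

    def write(self, text):
        self._buffer.write(text)

    def finish(self):
        return self._buffer.getvalue()


def join_with_accumulator(parts):
    writer = Accumulator()
    for part in parts:
        writer.write(part)
    return writer.finish()


pieces = ["List", "[", "int", "]"]
print(join_by_concatenation(pieces))   # List[int]
print(join_with_accumulator(pieces))   # List[int]
```

Both functions produce the same text; the accumulator version avoids re-copying the growing prefix on every step.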

Yes, maybe unparsing the CST could be much simpler. But we don't have feature flags at this stage.

serhiy-storchaka

If we unparse the AST, I would move the code into a separate file. There will be a lot of code, comparable in size to compile.c.

ambv

Alright, I'm going to:

- change the future name back to annotations (yay!)
- stick to AST unparsing
- move this work to a separate file
- change concatenation to use the accumulator pattern (reusing _PyBytesWriter would be great)
- finish the functionality

gvanrossum

Sounds great! I am pretty happy to see where this is going. I'd like to see it in a solid state by the time beta 1 comes around.

ambv

All comments from Serhiy's review have been acted upon, and the import has been renamed back to "annotations" as mentioned above. The only missing piece in the implementation is f-string support, but my battery will die soon so I wanted to push this out for you to look at.

Shouldn't segfault anymore, trying to use f-strings raises an exception instead.

ambv

Hm, the clang Travis CI build is failing due to invalid whitespace. make patchcheck is complaining about Include/code.h, suggesting the following diff:
https://gist.github.com/ambv/9f0874d56c7cf0d5ce98dfef38de0ce6

This diff is suggesting a lot of changes but not on the single line that I modified ¯\_(ツ)_/¯

ambv

I added a commit that fixes the whitespace according to the generated patch above so that I can see Travis CI passing. We can decide what to do with it later.

AppVeyor is failing because we need to modify PCbuild/* but I don't have access to a Windows box.

NEWS entry is not there yet since Blurb is tied to BPO issues and I'm wondering whether creating dummy issues for PEP work isn't redundant? I asked @larryhastings, we can also deal with this later.

ambv

Changes:

- Rebased on top of latest master
- Added a NEWS entry (manually since this is a PEP; the bot doesn't recognize this, therefore the skip news label has to stay)
- Moved the changes to PCbuild to the respective implementation commit ("Implement unparsing...")

This is ready for another review pass, @serhiy-storchaka. The only bit left is f-strings, which is going to be a bit tedious, so I'm holding off on it until after a new round of feedback :-)

ilevkivskyi

How do we organize the updates to typing.get_type_hints() to work with "doubly quoted" strings? I mean this should still work with the __future__ import:

from __future__ import annotations
from typing import List, get_type_hints

def f() -> List['int']:
    ...

assert get_type_hints(f)['return'] == List[int]

I suppose this can be part of the same PR, since this is not necessary in the backported version on PyPI. Also my updates to typing following PEP 560 should not have merge conflicts.

ambv

@ilevkivskyi, I want to fix get_type_hints() separately since there's really no reason why accepting "List['int']" shouldn't be supported even today. Same with fixing the self-class reference as you suggested on python-dev (I think I'd do it with a ChainMap though), fixing the forward ref cache conflicts, etc.

ilevkivskyi

@ambv OK, I am fine with this as well.

serhiy-storchaka

It will take some time to review such a large change, but there is one comment I can make now.

The unparser adds parentheses for grouping subexpressions. They are added even if not strictly needed, e.g. in a + (b * (c ** d)). The problem is not that redundant parentheses make an expression less readable. The problem is that they increase stack consumption when the expression is parsed again. It is possible that the original expression can be parsed, but parsing the unparsed expression will fail or even crash.
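To make "not strictly needed" concrete, here is a small check using the standard `ast` module. It is only a sketch: the helper name `same_tree` is made up for this illustration. Parentheses that merely restate the default operator precedence leave the parsed tree unchanged, so an unparser is free to drop them.

```python
import ast


def same_tree(left, right):
    """Return True if two expressions parse to structurally equal ASTs."""
    left_dump = ast.dump(ast.parse(left, mode="eval"))
    right_dump = ast.dump(ast.parse(right, mode="eval"))
    return left_dump == right_dump


# Parentheses that only restate the default precedence do not change the tree.
print(same_tree("a + (b * (c ** d))", "a + b * c ** d"))  # True

# Parentheses that override precedence produce a different tree.
print(same_tree("(a + b) * c", "a + b * c"))  # False
```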

I already encountered a similar problem when I worked on the parser of plural form expressions in gettext.py. A C-like syntax is parsed and converted to Python syntax, and the result is evaluated. I minimized the use of parentheses. If the subexpression operator has higher priority than the operator of the outer expression, parentheses are not added.
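For readers unfamiliar with that gettext code, the conversion can be tried directly. This sketch assumes `gettext.c2py`, an undocumented helper in CPython's gettext module that turns a C-style plural expression into a Python function; the two plural rules below are ordinary examples chosen for this illustration.

```python
import gettext

# English-style rule: singular for exactly one item, plural otherwise.
english = gettext.c2py("n != 1")

# A rule using C's ternary operator and &&, in the style of gettext plural forms.
three_forms = gettext.c2py("n%10==1 && n%100!=11 ? 0 : n != 0 ? 1 : 2")

print([english(n) for n in (0, 1, 2)])               # [1, 0, 1]
print([three_forms(n) for n in (0, 1, 2, 11, 21)])   # [2, 0, 1, 1, 0]
```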

This is not a blocker, and we can solve this problem later, but you can think about it while I'm reviewing.

gvanrossum

Good observation. Also, mypy doesn't like redundant parentheses in type expressions. (Though it won't ever encounter these, since it parses the source, so maybe it doesn't matter.)


ambv

ast_unparse.c is a close translation of the relevant parts of Tools/unparse.py. I didn't want to create an entirely new thing from scratch as I'd miss edge cases that way for sure.

Tools/unparse.py uses parens liberally as this is the simplest way to ensure the order of operations is preserved. To know if it's safe to omit a paren, a sub-expression would need to know where it's being emitted (e.g. the super-expression). That's way more complicated than what is already done. So, Tools/unparse.py makes no effort to omit parens when they aren't needed.

This, as Guido points out, produces types that mypy is unable to parse (like "Dict[(str, int)]"). That won't fly for us so my C implementation allows for omitting parens under certain circumstances already (like the tuple index in the previous example). There are many tests around this. The goal was to not put any spurious parens in typical expressions used in typing. I didn't focus on minimizing parens in expressions which aren't valid types per PEP 484.

Two enhancements that are definitely possible:

- math operation ordering; and
- omitting comma-catching parens if there is no comma in the inner expression (like in comprehensions, dict literals, etc.)

I'll look into this next week. I am worried that such a cleanup effort is likely to lead to bugs (expressions that end up semantically different from their original form). A spurious pair of parens is way less harmful than that.

Speaking of harm, Serhiy, do you have a legit example where we can get to a stack overflow due to spurious parens? While this is theoretically possible, I think we'd have to maliciously create an expression pathological enough to trigger this condition. Any other parse errors or crashes that spurious parens induce?

Serhiy, also, thank you for spending your time on reviewing this. I appreciate it. Your first round of review was already super helpful! Let me know if I can do anything to make review easier for you.
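The tuple-index case mentioned above can be observed from Python on an interpreter that includes this feature (CPython 3.7 or later). This is a small sketch, not one of the PR's tests: the function name `lookup` is hypothetical, and the source is passed through `exec` only so that the `__future__` import sits at the top of its own compilation unit.

```python
source = '''
from __future__ import annotations
from typing import Dict

def lookup(table: Dict[str, int]) -> int:
    return 0
'''

namespace = {}
exec(source, namespace)
stored = namespace["lookup"].__annotations__

# The annotation is kept as text, without parens around the tuple index.
print(stored["table"])   # Dict[str, int]
print(stored["return"])  # int
```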


serhiy-storchaka

Yes, in the case of gettext, one of the purposes was to guard against malicious input. In the case of annotations that is less likely. The other purpose is speeding up compilation.

This problem can be solved by assigning a numerical priority level to expressions and omitting parentheses only if the priority level of the sub-expression is higher than the level of its super-expression (the sub-expression, rather than the super-expression, is responsible for adding parentheses).
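A minimal Python sketch of that priority-level scheme follows. It is an illustration under stated assumptions, not the C implementation: the `PRIORITY` table, the `unparse` and `roundtrip` names, and the restriction to names and five binary operators are all choices made for this example. Each sub-expression receives the minimum level its context requires and wraps itself in parentheses only when its own level is lower.

```python
import ast

# Hypothetical priority table for this sketch; a higher number binds tighter.
PRIORITY = {ast.Add: 1, ast.Sub: 1, ast.Mult: 2, ast.Div: 2, ast.Pow: 3}
SYMBOL = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/", ast.Pow: "**"}


def unparse(node, required_level=0):
    """Render a tree of names and binary operators with minimal parentheses."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.BinOp):
        level = PRIORITY[type(node.op)]
        if isinstance(node.op, ast.Pow):
            # ** is right-associative: the left operand needs the stricter level.
            left = unparse(node.left, level + 1)
            right = unparse(node.right, level)
        else:
            # The other operators here are left-associative.
            left = unparse(node.left, level)
            right = unparse(node.right, level + 1)
        text = f"{left} {SYMBOL[type(node.op)]} {right}"
        # The sub-expression itself decides whether it needs parentheses.
        return f"({text})" if level < required_level else text
    raise NotImplementedError(type(node).__name__)


def roundtrip(expression):
    return unparse(ast.parse(expression, mode="eval").body)


print(roundtrip("a + (b * (c ** d))"))  # a + b * c ** d
print(roundtrip("(a + b) * c"))         # (a + b) * c
print(roundtrip("a - (b - c)"))         # a - (b - c)
print(roundtrip("(a ** b) ** c"))       # (a ** b) ** c
```

Passing the required level down, instead of having the parent inspect its children, keeps the decision local to the sub-expression, which is the division of responsibility described above.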

I have two more questions.

1. Is performance important here (I'm afraid it is)? A Python implementation would be simpler and more reliable, but much slower.
2. Do we need to support the full Python expression syntax, arithmetic operations in particular? Or would it be enough to support only the small subset used in type annotations: names, attributes, indexing, tuples, what else?

gvanrossum

1. The string will be embedded in code objects. I don't want to have to call out to Python code when creating code objects.
2. Yes it should support anything that's currently legal in an expression, not just what we think are valid type expressions. Otherwise it's not backwards compatible. (Also, otherwise the whole thing about string literals was bogus.)
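Both points can be seen from Python on an interpreter that ships this feature (CPython 3.7 or later). The sketch below uses made-up names (`f`, `not_defined_anywhere`) and runs the source through `exec` only so that the `__future__` import is the first statement of its compilation unit. Arbitrary expressions, not just valid types, are stored as text on the function and are not evaluated at definition time.

```python
source = '''
from __future__ import annotations

def f(x: 1 + 2 * 3, y: not_defined_anywhere.attribute):
    pass
'''

namespace = {}
exec(source, namespace)  # no NameError: the annotations are not evaluated here
stored = namespace["f"].__annotations__

for name, text in stored.items():
    print(name, type(text).__name__, text)

# The stored text is still a valid expression and can be evaluated on demand.
print(eval(stored["x"]))  # 7
```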

serhiy-storchaka

The comments I added are mostly about style (PEP 7) and suggestions for cleaning up the code, but there are several errors.

gvanrossum

The readability issue will occasionally come up when people are starting to debug annotations. There's also the hypothetical issue that if someone were to generate stub files based on these, the redundant parentheses may elicit complaints from either mypy or pytype, both of which employ a limited syntax for annotations. That said we can always improve this later, so prioritize as you see fit.

ambv

As I said in my comment on Nov 23rd, to the best of my knowledge, the current state of the diff already omits all cases of spurious parens that occur in valid type annotations.

serhiy-storchaka

Don't spend your time fighting the extra parens if this distracts you from higher-priority tasks. If you don't solve this problem in your PR, I'm going to do it after it is merged. I suppose this will not add too much complexity. Your code already avoids producing the extra parens in many cases. This is enough for the initial implementation.

One more consideration. Would it help to introduce macros like the VISIT macro in symtable.c and compile.c? They should call the specified function with explicit and implicit arguments, check the result, and return a failure if the call failed. Most functions could then be just a short sequence of invocations of these macros. This technique is used in many places in the CPython sources.

ambv

> Would it help to introduce macros like the VISIT macro in symtable.c and compile.c?

I was thinking about this when I was originally writing the code, but I wasn't sure whether macros are reserved for special usage, so I avoided them. If you'd like, I can refactor the file to use a macro instead. You're right, that should shorten it quite a bit.

ambv

Alright, this is fully rebased without conflicts and all comments from previous code review are addressed. Things left to do:

- remove special handling of strings
- add support for f-strings
- (maybe?) refactor using a VISIT-style macro

ambv

Special handling of strings removed. I plan to add the missing f-string support in the first week of January so that the implementation is hopefully mergeable in 3.7.0a4.

ambv

Alright, @serhiy-storchaka, this is complete now, including f-string support! I realize it's pretty last minute, sorry for that.

A piece of useless statistics: this pull request was implemented in full during intercontinental flights. There's something very tranquil about sitting in one place for 10+ hours with no distractions.

1st1

Overall looks good. The code in Python/ast_unparse.c looks fine; I didn't see any refleaks or unchecked return values. I think we can go ahead and merge this one. We'll have plenty of time to catch any bugs during the beta/rc period.

bedevere-bot added the awaiting merge label Nov 14, 2017

the-knights-who-say-ni added the CLA signed label Nov 14, 2017

serhiy-storchaka reviewed Nov 14, 2017

ambv force-pushed the string_annotations branch from b49dfa1 to c56d5bd Compare November 18, 2017 22:48

ambv changed the title ~~[WIP] String annotations~~ Nov 18, 2017

ambv self-assigned this Nov 18, 2017

ambv added the skip issue label Nov 18, 2017

ambv force-pushed the string_annotations branch from c56d5bd to 1e71626 Compare November 18, 2017 22:57

gvanrossum requested a review from a team as a code owner November 18, 2017 23:28

ambv added the skip news label Nov 18, 2017

ambv force-pushed the string_annotations branch from 4cdb25d to 98231af Compare November 20, 2017 19:39

ambv added skip news and removed skip news labels Nov 20, 2017

serhiy-storchaka reviewed Nov 27, 2017

ambv and others added 4 commits December 30, 2017 21:29

Document from __future__ import annotations (PEP 563)

1a87ea0

Plumbing for from __future__ import annotations (PEP 563) …

863d3e9

The string form is recovered by unparsing the AST.

Implement unparsing the AST back to string form …

bc3ad37

This is required for PEP 563 and as such only implements a part of the unparsing process that covers expressions.

Response to Serhiy's review

7f88115

ambv force-pushed the string_annotations branch from 98231af to 7f88115 Compare December 31, 2017 08:41

Strings are no longer treated special …

08f8600

See: https://www.python.org/dev/peps/pep-0563/#passing-string-literals-in-annotations-verbatim-to-annotations

f-string support

de9e5b0

auvipy approved these changes Jan 23, 2018

bedevere-bot added awaiting core review and removed awaiting merge labels Jan 23, 2018

emilyemorehouse reviewed Jan 25, 2018

1st1 approved these changes Jan 25, 2018

bedevere-bot added awaiting merge and removed awaiting core review labels Jan 25, 2018

compatibility typo

26c028c

ambv merged commit 95e4d58 into python:master Jan 26, 2018

bedevere-bot removed the awaiting merge label Jan 26, 2018

ts826848 mentioned this pull request Nov 1, 2018

Code using postponed evaluation of annotations (PEP 563) is rejected with a SyntaxError Nuitka/Nuitka#188

Closed

zkrolikowski-vl mentioned this pull request Jan 25, 2021

Consider using typing extensions for older Python versions VirtusLab/pandas-stubs#7

Closed

guevara mentioned this pull request Jun 18, 2021

为什么 Python 的 Type Hint 没有流行起来 guevara/read-it-later#7969

Open

ambv deleted the string_annotations branch July 12, 2021 11:24

MikeWallis42 mentioned this pull request Feb 14, 2023

fixes #141 developer typing for dbt_args astronomer/astronomer-cosmos#142

Closed

Conversation

gvanrossum commented Nov 14, 2017 (edited by ambv)
gvanrossum commented Nov 14, 2017
serhiy-storchaka left a comment
serhiy-storchaka commented Nov 14, 2017
ambv commented Nov 18, 2017
gvanrossum commented Nov 18, 2017
ambv commented Nov 18, 2017 (edited)
ambv commented Nov 18, 2017
ambv commented Nov 18, 2017
ambv commented Nov 20, 2017 (edited)
ilevkivskyi commented Nov 22, 2017
ambv commented Nov 22, 2017
ilevkivskyi commented Nov 22, 2017
serhiy-storchaka commented Nov 22, 2017
gvanrossum commented Nov 22, 2017 via email
ambv commented Nov 23, 2017 via email
serhiy-storchaka commented Nov 23, 2017
gvanrossum commented Nov 23, 2017 via email
serhiy-storchaka left a comment
gvanrossum commented Nov 27, 2017 via email
ambv commented Nov 27, 2017
serhiy-storchaka commented Nov 27, 2017
ambv commented Nov 27, 2017
ambv commented Dec 31, 2017
ambv commented Dec 31, 2017
ambv commented Jan 16, 2018
1st1 left a comment (edited)


10 participants
