
bpo-33338: [lib2to3] Synchronize token.py and tokenize.py with stdlib#6572

Closed
ambv wants to merge 3 commits into python:main from ambv:tokenizeSync

ambv commented Apr 23, 2018

(This is step 1 of BPO-33337; see that issue for the larger context.)

lib2to3's token.py and tokenize.py were initially copies of the respective
files from the standard library. They were copied to allow Python 3 to read
Python 2's grammar.

Since 2006, lib2to3 has grown into a widely used Concrete Syntax Tree parser,
including for parsing Python 3 code. Support for the Python 3 grammar was added
along the way but, sadly, lib2to3's token.py and tokenize.py diverged from the
ones in the standard library.

This change brings them back together, reducing the differences to the bare
minimum that lib2to3 actually requires. Before this change, almost every line
in lib2to3/pgen2/tokenize.py differed from tokenize.py. After this change, the
diff between the two files is only 200 lines long and consists entirely of
relevant Python 2 compatibility code.

Merging the implementations brings numerous fixes to the lib2to3 tokenizer:

  • docstrings made as similar as possible
  • ported TokenInfo
  • ported tokenize.tokenize() and tokenize.open()
  • removed Python 2-only implementation cruft
  • made Unicode identifier handling the same
  • made string prefix handling the same
  • added Ellipsis to the Special group
  • Untokenizer backported bugfixes:
      - 5e6db31
      - 9dc3a36
      - 5b8d2c3
      - e411b66
      - BPO-2495
  • detect_encoding tries to figure out a filename and find_cookie uses
    the filename in error messages, if available
  • find_cookie bugfix: BPO-14990
  • BPO-16152: the tokenizer no longer crashes when the stream does not end
    with a newline (\Z, the end-of-string anchor, was added to PseudoExtras)
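To make the ported behavior concrete, here is a minimal sketch against the standard library's `tokenize` module, which this change uses as the reference. lib2to3 itself has since been removed from CPython (in 3.13), so the sketch does not call `lib2to3.pgen2.tokenize`, whose details may differ. It assumes Python 3.8 or later and exercises `detect_encoding`, `tokenize.tokenize()` and its `TokenInfo` results, the Ellipsis operator, a stream with no trailing newline, Unicode identifiers and string prefixes via `generate_tokens`, and `tokenize.open()`. The sample source and temporary file are made up for illustration.

```python
# Sketch against the stdlib tokenize module (Python 3.8+), not lib2to3.pgen2.
import io
import os
import tempfile
import tokenize
from token import ELLIPSIS, ENCODING, ENDMARKER, NAME, OP, STRING

# Illustrative bytes source: a PEP 263 coding cookie, an Ellipsis literal and,
# deliberately, no newline at the end (the BPO-16152 case).
source = b"# -*- coding: latin-1 -*-\nx = ...\nprint(x)"

# detect_encoding() reads at most two lines looking for a BOM or cookie and
# returns the normalized encoding name plus the lines it consumed.
encoding, consumed = tokenize.detect_encoding(io.BytesIO(source).readline)

# tokenize() takes a bytes readline callable and yields TokenInfo named tuples;
# the first one is always an ENCODING token.
tokens = list(tokenize.tokenize(io.BytesIO(source).readline))
first = tokens[0]

# "..." is reported as a generic OP whose exact_type is ELLIPSIS.
dots = next(t for t in tokens if t.string == "...")

# generate_tokens() is the str-based variant; it accepts Unicode identifiers
# and combined string prefixes such as rb"".
text_tokens = list(tokenize.generate_tokens(io.StringIO('π = rb"\\x00"\n').readline))

# tokenize.open() opens a file in text mode using the detected encoding.
with tempfile.NamedTemporaryFile("wb", suffix=".py", delete=False) as f:
    f.write(source)
    path = f.name
try:
    with tokenize.open(path) as fh:
        opened_encoding = fh.encoding
        opened_text = fh.read()
finally:
    os.remove(path)

print(encoding)                                      # iso-8859-1
print(first.type == ENCODING, first.string)          # True iso-8859-1
print(dots.type == OP, dots.exact_type == ELLIPSIS)  # True True
print(tokens[-1].type == ENDMARKER)                  # True
print(opened_encoding)                               # iso-8859-1
```

Note that the `latin-1` cookie comes back as the normalized name `iso-8859-1`, and the missing final newline still produces a complete token stream ending in ENDMARKER rather than an error.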

Improvements to token.py:

  • taken from the current Lib/token.py
  • tokens renumbered to match Lib/token.py
  • __all__ properly defined
  • ASYNC, AWAIT and BACKQUOTE exist under different numbers (100 + old number)
  • ELLIPSIS added
  • ENCODING added
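The shape of the synchronized `token` module can be checked against the standard library's `Lib/token.py`, which this change takes as its source. The sketch below assumes Python 3.8 or later and only inspects the stdlib module; it does not reproduce lib2to3's relocated ASYNC, AWAIT and BACKQUOTE numbers. It avoids printing raw numeric values because the Python documentation does not guarantee that token numbers stay the same across versions.

```python
# Sketch against the stdlib token module (Python 3.8+), not lib2to3.pgen2.token.
import token
import tokenize

# Every token name is exported through __all__, next to the helper functions.
exported = set(token.__all__)
print({"ELLIPSIS", "ENCODING", "tok_name", "ISTERMINAL"} <= exported)  # True

# tok_name maps numeric token values back to their names.
print(token.tok_name[token.ELLIPSIS], token.tok_name[token.ENCODING])  # ELLIPSIS ENCODING

# Terminal tokens are numbered below NT_OFFSET; grammar (nonterminal)
# symbols are numbered from NT_OFFSET upward.
print(token.ISTERMINAL(token.ELLIPSIS), token.ISNONTERMINAL(token.NT_OFFSET))  # True True

# tokenize re-exports the same constants, so both modules agree on numbering.
print(tokenize.ELLIPSIS == token.ELLIPSIS, tokenize.ENCODING == token.ENCODING)  # True True
```

Keeping one numbering shared by `token` and `tokenize` is what makes a renumbered lib2to3 token.py interchangeable with the stdlib constants.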

https://bugs.python.org/issue33338
