bpo-40334: Refactor peg_generator to receive a Tokens file when building c code by pablogsal · Pull Request #19745 · python/cpython
https://bugs.python.org/issue40334
This PR does the following:
- Fix a bunch of (very minor) mypy stuff that was missing.
- Separate the C parser and the Python parser in pegen main (because both receive different arguments). Thread down all these changes to the generator build module.
- Add a new option to the C parser command line to receive the
Tokensfile. - Thread down the
Tokensfile and add code to parse it and calculate the required token information. - Use the new tokens in the c_generator (and simplify some code that was hardcoding some token names).
- Update the build files (Makefile and the Windows one) to use the new option.
- Run black over the source.
This is how the command line looks now with the sub-parsers:
Main CL
~/github/python/master/Tools/peg_generator [bpo-40334](https://bugs.python.org/issue40334)-use-tokens
❯ python -m pegen
usage: pegen [-h] [-q] [-v] [--skip-actions] {c,python} ...
Experimental PEG-like parser generator
positional arguments:
{c,python} target language for the generated code
c Generate C code for inclusion into CPython
python Generate Python code
optional arguments:
-h, --help show this help message and exit
-q, --quiet Don't print the parsed grammar
-v, --verbose Print timing stats; repeat for more debug output
C subparser
~/github/python/master/Tools/peg_generator [bpo-40334](https://bugs.python.org/issue40334)-use-tokens
❯ python -m pegen c -h
usage: pegen c [-h] [--compile-extension] [-o OUT] [--optimized] [--skip-actions] grammar_filename tokens_filename
positional arguments:
grammar_filename Grammar description
tokens_filename Tokens description
optional arguments:
-h, --help show this help message and exit
-o OUT, --output OUT Where to write the generated parser
--compile-extension Compile generated C code into an extension module
--optimized Compile the extension in optimized mode
--skip-actions Suppress code emission for rule actions
Python subparser
~/github/python/master/Tools/peg_generator [bpo-40334](https://bugs.python.org/issue40334)-use-tokens
❯ python -m pegen python -h
usage: pegen python [-h] [-o OUT] grammar_filename
positional arguments:
grammar_filename Grammar description
optional arguments:
-h, --help show this help message and exit
-o OUT, --output OUT Where to write the generated parser
--skip-actions Suppress code emission for rule actions
Comment on lines +64 to 66
| if name in self.non_exact_tokens: | ||
| name = name.lower() | ||
| return f"{name}_var", f"_PyPegen_{name}_token(p)" |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we should special-case NAME here and call _PyPegen_name_token for it and then call _PyPegen_expect_token for all the others (which is what the "named" functions already do anyway). This way we could get rid of these "named" expect functions (e.g. _PyPegen_indent_token).
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, I think I still like to have the abstraction in case we need to intercede something like with the NAME token in the future (this also allows us to not have "hardcoded" names in the generator) but if you think is better to re-add that in the future and eliminate the wrappers I can totally do it as I don't feel strongly about it.
What do you think?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I still like to have the abstraction in case we need to intercede something like with the NAME token in the future
It feels, that that's only the case with STRINGs. I don't really expect to do anything more complicated that just returning the token with all the whitespace ones (INDENT, DEDENT etc.) and I'd re-add those functions for async and await, if it were ever needed.
With that said, I really don't feel strongly about either one, so it's your call! 😃
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I am going to merge this and we can revisit this in further refactors of this code. I have some improvements in mind to avoid checking for sub-strings (like var.startswith("name") when distinguishing the types and we can explore doing this in that PR. 😉
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I experimented with your suggestion and the main blocker is that _PyPegen_lookahead is called with these functions and it imposes that they may take the parser as the only argument and there is not a simple way to make _PyPegen_lookahead allow to forward the arguments :(
Edit: I am continuing experimenting with this idea....
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't think of this problem. Thanks a lot for doing the investigation!
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks!
Hi! The buildbot AMD64 Fedora Stable 3.x has failed when building commit 5b9f498.
Unrelated failure:
1 test failed:
test_concurrent_futures