◐ Shell
reader mode source ↗
Skip to content

Implement more ast features#6986

Merged
youknowone merged 10 commits into
RustPython:mainfrom
youknowone:ast
Feb 5, 2026
Merged

Implement more ast features#6986
youknowone merged 10 commits into
RustPython:mainfrom
youknowone:ast

Conversation

@youknowone

@youknowone youknowone commented Feb 3, 2026

Copy link
Copy Markdown
Member

Summary by CodeRabbit

  • New Features

    • Enhanced AST module with support for function type nodes and improved validation
    • Added feature version support to the compile() function for better Python compatibility
    • Improved documentation coverage for Python's internal modules and objects
  • Bug Fixes

    • Fixed float integer detection to use exact checks instead of epsilon-based comparisons
    • Enhanced string handling for robustness against invalid character codes
  • Documentation

    • Expanded built-in module documentation with comprehensive entries for internal Python classes and functions

@coderabbitai

coderabbitai Bot commented Feb 3, 2026

Copy link
Copy Markdown
Contributor
📝 Walkthrough

Walkthrough

This PR refactors the AST system to add comprehensive validation and representation helpers, changes how builtins are stored from PyDictRef to PyObjectRef throughout the frame and function system, expands documentation entries, and enhances string handling for f-strings and template strings with normalization and escape-sequence warnings.

Changes

Cohort / File(s) Summary
Builtins Type System
crates/vm/src/builtins/frame.rs, crates/vm/src/frame.rs, crates/vm/src/builtins/function.rs, crates/vm/src/vm/mod.rs, crates/vm/src/suggestion.rs
Changed builtins storage from PyDictRef to PyObjectRef in Frame and ExecutingFrame; updated PyFunction construction to use frame's builtins directly; modified run_code_obj to dynamically resolve builtins from module.builtins; adjusted load_global_or_builtin to handle both dict and non-dict builtins with fallback paths.
AST Core Implementation
crates/vm/src/stdlib/ast/pyast.rs, crates/vm/src/stdlib/ast/python.rs, crates/vm/src/stdlib/ast/validate.rs, crates/vm/src/stdlib/ast/repr.rs
Introduced impl_base_node! macro for AST node declaration; added comprehensive AST validation module; implemented AST representation helpers with depth-aware truncation; refactored extend_module_nodes to register AST class and populate field types, singletons, and match args; enhanced NodeAst with reduce, replace, and custom initialization logic.
AST Node Type Handling
crates/vm/src/stdlib/ast/basic.rs, crates/vm/src/stdlib/ast/constant.rs, crates/vm/src/stdlib/ast/elif_else_clause.rs, crates/vm/src/stdlib/ast/expression.rs, crates/vm/src/stdlib/ast/module.rs, crates/vm/src/stdlib/ast/operator.rs, crates/vm/src/stdlib/ast/other.rs
Updated string handling to use interned strings in basic.rs and other.rs; reworked frozenset constant representation; adjusted elif/else range computation and clause construction; enhanced expression handling with NodeExprTemplateStr support and dict key/value validation; removed node location additions in module nodes; added _instance early-return paths in operator nodes.
AST Parameters & Patterns
crates/vm/src/stdlib/ast/parameter.rs, crates/vm/src/stdlib/ast/pattern.rs
Updated merge_positional/keyword_parameter_defaults to accept VM, validate argument counts, and return PyResult; enhanced pattern matching to explicitly type kwd attributes/patterns and validate length consistency between kwd_attrs and kwd_patterns.
AST String & Statement Handling
crates/vm/src/stdlib/ast/string.rs, crates/vm/src/stdlib/ast/statement.rs
Introduced f-string/template-string normalization with adjacent constant merging; added escape-sequence validation and warnings; changed TStringInterpolation.format_spec from TemplateStr to JoinedStr; enhanced function/class definition range computation; improved ImportFrom level validation and Pass statement location handling.
Compile & Parser Integration
crates/vm/src/stdlib/ast.rs, crates/vm/src/stdlib/builtins.rs
Expanded parse entry points with optimize, target_version, and type_comments parameters; added ModFunctionType exposure; introduced parse_func_type and wrap_interactive under parser feature; added AST validation in compile(); enhanced error propagation for parse errors; exposed new compile flags (PY_CF_SOURCE_IS_UTF8, PY_CF_IGNORE_COOKIE, PY_CF_ALLOW_TOP_LEVEL_AWAIT).
Documentation & Spelling
.cspell.dict/cpython.txt, crates/doc/generate.py, crates/doc/src/data.inc.rs
Added binop and hexdigit to spelling dictionary; updated doc generation to use strip() instead of inspect.cleandoc(); improved is_child_of module resolution for C modules; massively expanded DB documentation map with Python internal modules, builtin types, and class descriptions.
Numeric & Type Handling
crates/literal/src/float.rs, crates/vm/src/builtins/type.rs
Changed is_integer check in float.rs from epsilon-based tolerance to exact finite check (v.is_finite() && v.fract() == 0.0); refactored PyType.annotations resolution with stricter HEAPTYPE guards and non-dict attribute skipping.
Utility & Platform-Specific
crates/vm/src/stdlib/ctypes/function.rs, crates/vm/src/stdlib/ctypes/simple.rs, crates/vm/src/stdlib/os.rs, extra_tests/snippets/stdlib_types.py
Replaced direct wchar-to-char casts with guarded u32::try_from conversions in ctypes; fixed OS encoding detection to use byte constants; updated test snippet to add Pass statement and remove explicit location metadata.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant VM
    participant Frame
    participant Module
    participant Builtins

    Client->>VM: run_code_obj(code_obj, scope)
    VM->>Module: resolve builtins from __builtins__
    alt Module has PyModule __builtins__
        Module->>Builtins: dict()
    else Module has direct builtins
        Module->>Builtins: use as-is
    else No __builtins__
        VM->>Builtins: use vm.builtins.dict()
    end
    VM->>Frame: new(builtins: PyObjectRef)
    Frame->>Client: Frame ready (builtins: PyObjectRef)
    Client->>Frame: load_global_or_builtin(name)
    alt builtins is dict
        Frame->>Builtins: fast path dict lookup
    else builtins is non-dict
        Frame->>Builtins: generic path getattr
    end
    Builtins->>Client: value or error
Loading
sequenceDiagram
    participant Compiler
    participant Parser
    participant Validator
    participant AST_Module

    Compiler->>Parser: parse(source, mode, feature_version)
    Parser->>Parser: apply target_version defaults
    Parser->>AST_Module: emit AST nodes
    alt mode == "single"
        Validator->>AST_Module: wrap_interactive(Module → Interactive)
    end
    Validator->>Validator: validate_mod(module)
    Validator->>Validator: check context consistency, body non-empty, patterns
    alt validation fails
        Validator->>Compiler: PyResult error
    else validation succeeds
        Validator->>Compiler: validated AST
    end
    Compiler->>Compiler: compile to code object
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~70 minutes

Possibly related PRs

Suggested reviewers

  • ShaharNaveh
  • coolreader18

Poem

🐰 Builtins dance from dict to object, so flexible and free,
AST nodes validate their forms, a symphony of three,
F-strings merge their secrets, escape sequences show their way,
A rabbit hops through Frame's new paths, bringing order to the fray! ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.72% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The PR title 'Implement more ast features' is generic and vague, failing to convey specific details about the substantial changes made across the AST module and related components. Consider using a more specific title that highlights the primary changes, such as 'Add Python AST validation, interning, and builtins refactoring' or similar to better reflect the scope of modifications.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented Feb 3, 2026

Copy link
Copy Markdown
Contributor

Code has been automatically formatted

The code in this PR has been formatted using:

  • cargo fmt --all
    Please pull the latest changes before pushing again:
git pull origin ast

@youknowone youknowone force-pushed the ast branch 6 times, most recently from 395cffb to 99862cd Compare February 4, 2026 14:45
@github-actions

github-actions Bot commented Feb 5, 2026

Copy link
Copy Markdown
Contributor

📦 Library Dependencies

The following Lib/ modules were modified. Here are their dependencies:

[x] lib: cpython/Lib/ast.py
[x] lib: cpython/Lib/_ast_unparse.py
[x] test: cpython/Lib/test/test_unparse.py
[x] test: cpython/Lib/test/test_type_comments.py (TODO: 15)

dependencies:

  • ast

dependent tests: (48 tests)

  • ast: test_ast test_builtin test_compile test_dis test_fstring test_future_stmt test_site test_ssl test_type_comments test_ucn test_unparse
    • annotationlib: test_functools test_inspect test_reprlib test_typing
      • inspect: test_abc test_argparse test_asyncgen test_code test_collections test_coroutines test_decimal test_enum test_generators test_grammar test_ntpath test_operator test_patma test_posixpath test_signal test_traceback test_types test_unittest test_yield_from test_zipimport
    • dbm.dumb: test_dbm_dumb
    • inspect:
      • bdb: test_bdb
      • dataclasses: test__colorize test_genericalias test_pprint test_regrtest test_zoneinfo
      • importlib.metadata: test_importlib
      • rlcompleter: test_rlcompleter
      • trace: test_trace
      • xmlrpc.server: test_docxmlrpc test_xmlrpc
    • pyclbr: test_pyclbr

[x] test: cpython/Lib/test/test_builtin.py (TODO: 25)

dependencies:

dependent tests: (no tests depend on builtin)

[ ] test: cpython/Lib/test/test_funcattrs.py (TODO: 5)

dependencies:

dependent tests: (no tests depend on funcattrs)

[ ] lib: cpython/Lib/traceback.py
[ ] test: cpython/Lib/test/test_traceback.py (TODO: 48)

dependencies:

  • traceback (native: collections.abc, itertools, sys)
    • codeop
    • io (native: _io, _thread, errno, sys)
    • tokenize (native: _tokenize, builtins, itertools, sys)
    • _colorize, contextlib, keyword, linecache, textwrap, warnings

dependent tests: (32 tests)

  • traceback: test_asyncio test_builtin test_code_module test_contextlib test_contextlib_async test_coroutines test_importlib test_listcomps test_pyexpat test_setcomps test_socket test_ssl test_subprocess test_threadedtempfile test_threading test_traceback test_unittest test_with test_zipimport
    • code:
      • sqlite3.main: test_sqlite3
    • concurrent.futures.process: test_concurrent_futures
    • logging.config: test_logging
    • py_compile: test_argparse test_cmd_line_script test_compileall test_importlib test_multiprocessing_main_handling test_py_compile test_runpy
    • pydoc: test_enum
      • xmlrpc.server: test_docxmlrpc test_xmlrpc

Legend:

  • [+] path exists in CPython
  • [x] up-to-date, [ ] outdated

@youknowone youknowone marked this pull request as ready for review February 5, 2026 04:38

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hide comment

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@crates/vm/src/frame.rs`:
- Around line 1209-1234: The LOAD_BUILD_CLASS handler currently assumes
__builtins__ is a mapping and calls get_item, which causes an uncaught TypeError
when __builtins__ is a module; change the else branch of the
Instruction::LoadBuildClass handling (the branch after
self.builtins.downcast_ref::<PyDict>()) to detect module-style builtins
(downcast_ref::<PyModule>() or equivalent) and call get_attr(identifier!(vm,
__build_class__), vm) for modules, translating AttributeError into a NameError
the same way you handle KeyError for mappings; keep the existing get_item path
for dict/mapping objects. Apply the same fix to the slow-path global/builtin
name lookup routine (the name-lookup code referenced in the review) so it uses
get_attr for module builtins and preserves existing error translation logic for
mapping builtins.

In `@crates/vm/src/stdlib/ast/string.rs`:
- Around line 50-134: The merge logic currently keeps the first constant's
TextRange when collapsing adjacent string literals, so location info becomes the
first-span-only; update normalize_joined_str_parts and
normalize_template_str_parts so that when you append a new constant (in the
branches handling JoinedStrPart::Constant and TemplateStrPart::Constant) you
also extend the pending range to cover both spans (e.g., set pending.2 =
TextRange::new(pending.start(), constant.range.end()) or use the project's
helper like TextRange::cover/join to set the pending range to span from the
original pending start to constant.range.end), and ensure
push_joined_str_literal and push_template_str_literal use that extended range
when building Constant::new_str; apply the identical change for both joined and
template paths so the merged literal range extends to the last fragment.

In `@crates/vm/src/stdlib/ast/validate.rs`:
- Around line 349-354: The NamedExpr handler only checks that named.target is an
ast::Expr::Name but doesn't validate it in Store context; after the existing
type check for ast::Expr::Named (and the matches!(&*named.target,
ast::Expr::Name(_)) check), call validate_expr(vm, &named.target,
ast::ExprContext::Store) to validate the target as an assignment target and
propagate any error, then validate the value with validate_expr(vm,
&named.value, ast::ExprContext::Load) as already done; keep the existing type
error message if the target isn't a Name.
🧹 Nitpick comments (3)
crates/vm/src/stdlib/ast.rs (2)

398-470: Consider handling edge cases in func_type parsing.

The parse_func_type function has good structure but a few observations:

  1. Lines 405-406: optimize and target_version are unused (explicitly ignored with let _ =). This is intentional but worth noting.

  2. Line 437: When extracting right, the offset split_at + 2 assumes -> is always 2 bytes, which is correct for ASCII but the comment/documentation should clarify this.

  3. Lines 457-461: The match arm for other expression types (other => vec![other]) is a catch-all that may accept invalid func_type syntax silently.

💡 Optional: Add validation for unexpected expression types
     let argtypes: Vec<ast::Expr> = match arg_expr {
         ast::Expr::Tuple(tup) => tup.elts,
         ast::Expr::Name(_) | ast::Expr::Subscript(_) | ast::Expr::Attribute(_) => vec![arg_expr],
-        other => vec![other],
+        // CPython accepts any expression as a type annotation
+        other => vec![other],
     };

472-503: Minor: redundant assignment on Line 487.

The line let tag = if tag.is_empty() { "" } else { tag }; is a no-op since tag already equals itself in both branches.

♻️ Simplify redundant conditional
         let tag = rest.trim_start();
-        let tag = if tag.is_empty() { "" } else { tag };
         let node = NodeAst
crates/vm/src/stdlib/ast/python.rs (1)

433-513: Redundant setup in module_exec.

The module_exec function (lines 440-477) repeats the same setup that extend_class already performs in NodeAst. This appears to be setting up the AST class that's registered in the module, which may be a different type object than the one defined via #[pyattr].

If this duplication is intentional (to handle both the pyattr-registered type and the module-registered type), consider adding a comment explaining why.

💡 Add clarifying comment
     pub(crate) fn module_exec(
         vm: &VirtualMachine,
         module: &Py<crate::builtins::PyModule>,
     ) -> PyResult<()> {
         __module_exec(vm, module);
         super::super::pyast::extend_module_nodes(vm, module);

+        // The AST class registered in extend_module_nodes needs its attributes
+        // and methods configured separately from the #[pyattr] NodeAst type.
         let ast_type = module
             .get_attr("AST", vm)?

@youknowone youknowone changed the title ast Feb 5, 2026
Hide details View details @youknowone youknowone merged commit 684e880 into RustPython:main Feb 5, 2026
14 checks passed
@youknowone youknowone deleted the ast branch February 5, 2026 07:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant