[update_lib] todo shows last updated date by youknowone · Pull Request #7053 · RustPython/RustPython
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@scripts/update_lib/deps.py`:
- Around line 1034-1053: The function _count_path_diff currently returns 0 when
path types differ (file vs directory); change it to detect mismatched types and
count the lines on both sides instead of returning 0: if one side is a file and
the other a dir, sum the file's line count (use safe_read_text to read +
splitlines length) and the total lines of all *.py files under the directory
(iterate dir.rglob("*.py") and use safe_read_text for each); keep using
_count_file_diff for the file-file case and the existing directory-directory
aggregation logic in _count_path_diff.
🧹 Nitpick comments (3)
scripts/update_lib/deps.py (2)
1056-1084:subprocess.runforgit loglackscwd— relies on implicit working directory.Both
get_module_last_updatedandget_test_last_updatedrungit logwithout specifyingcwd, while_get_cpython_version(line 186) explicitly usescwd=cpython_prefix. If the script is ever invoked from a directory other than the repo root, the git commands will fail or return wrong results. Consider passingcwdexplicitly for robustness.Suggested fix for `get_module_last_updated`
result = subprocess.run( ["git", "log", "-1", "--format=%cd", "--date=short", "--"] + local_paths, capture_output=True, text=True, timeout=10, + cwd=lib_prefix if not pathlib.Path(local_paths[0]).is_absolute() else None, )
1104-1125: Duplicate pattern withget_module_last_updated— consider extracting a shared helper.The git-log-last-date subprocess call is duplicated between
get_module_last_updated(line 1074) andget_test_last_updated(line 1115). A small shared helper like_git_last_commit_date(paths: list[str]) -> str | Nonewould reduce duplication.scripts/update_lib/cmd_todo.py (1)
578-599: Performance: N subprocess calls forgit log+ N difflib passes per run.Each lib item triggers a
git logsubprocess (viaget_module_last_updated) and potentially a full file diff (viaget_module_diff_stat). With hundreds of modules, this could noticeably slow down the default (non-verbose)todocommand. Consider either:
- Batching git log queries (e.g., one
git logcall with all paths, then parse per-path dates), or- Adding a
--no-metaflag or making metadata opt-in via--verbose(similar to how test metadata is already verbose-only).Not a blocker for a dev tool, but worth considering if users report slowness.
Comment on lines +1034 to +1053
| def _count_path_diff(path_a: pathlib.Path, path_b: pathlib.Path) -> int: | ||
| """Count changed lines between two paths (file or directory, *.py only).""" | ||
| if path_a.is_file() and path_b.is_file(): | ||
| return _count_file_diff(path_a, path_b) | ||
| if path_a.is_dir() and path_b.is_dir(): | ||
| total = 0 | ||
| a_files = {f.relative_to(path_a) for f in path_a.rglob("*.py")} | ||
| b_files = {f.relative_to(path_b) for f in path_b.rglob("*.py")} | ||
| for rel in a_files & b_files: | ||
| total += _count_file_diff(path_a / rel, path_b / rel) | ||
| for rel in a_files - b_files: | ||
| content = safe_read_text(path_a / rel) | ||
| if content: | ||
| total += len(content.splitlines()) | ||
| for rel in b_files - a_files: | ||
| content = safe_read_text(path_b / rel) | ||
| if content: | ||
| total += len(content.splitlines()) | ||
| return total | ||
| return 0 |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mismatched path types (file vs. directory) silently return 0.
If a module changed from a single file to a package (or vice versa) between CPython versions, _count_path_diff returns 0 instead of reflecting the actual diff. This is an edge case but could hide a meaningful change in the todo output.
Consider counting all lines from both sides when types don't match, similar to how you handle files only present on one side in the directory case.
🤖 Prompt for AI Agents
In `@scripts/update_lib/deps.py` around lines 1034 - 1053, The function
_count_path_diff currently returns 0 when path types differ (file vs directory);
change it to detect mismatched types and count the lines on both sides instead
of returning 0: if one side is a file and the other a dir, sum the file's line
count (use safe_read_text to read + splitlines length) and the total lines of
all *.py files under the directory (iterate dir.rglob("*.py") and use
safe_read_text for each); keep using _count_file_diff for the file-file case and
the existing directory-directory aggregation logic in _count_path_diff.