◐ Shell
clean mode source ↗

GH-80789: Bundle ``ensurepip`` wheels at build time by AA-Turner · Pull Request #109130 · python/cpython

Conversation

@AA-Turner

@AA-Turner AA-Turner commented

Sep 8, 2023

edited by github-actions Bot

Loading

Co-authored-by: Sviatoslav Sydorenko <wk@sydorenko.org.ua>
Co-authored-by: Sviatoslav Sydorenko <wk@sydorenko.org.ua>

pfmoore

@AA-Turner

0aaf135 was an attempt to integrate this into the Makefile/PCBuild more 'properly', but I'm not confident in making wider-scale changes here (said commit didn't work, so I've reverted it).

zware

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to need something for buildbots to avoid breaking them all, but I like the direction here.

pradyunsg

@ned-deily

Adding @ambv and @Yhg1s for review of impact on release process and supply chain security.

# Conflicts:
#	.cirrus.yml

pradyunsg

hugovk

pythoninfo_script:
- cd build
- make pythoninfo
bundle_ensurepip_script:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bundle_ensurepip_script:
# Download wheels for the venv step
bundle_ensurepip_script:

Is this the reason for adding it here?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anything using venv or ensurepip during testing needs this provisioning step. If there's no uses, it'd be unnecessary.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The actual reason to include it is that Cirrus CI failed without this and passed with it. I don't know why Free BSD requires the venv step though, as all other tests pass normally -- only documentation and hypothesis (which use venvs) seem to have an explicit venv step.

A

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed-by: Hugo van Kemenade <hugovk@users.noreply.github.com>

@webknjaz

This is going to need something for buildbots to avoid breaking them all, but I like the direction here.

I identified the place needing to be changed in #12791 (comment).

@AA-Turner FYI

@AA-Turner

I identified the place needing to be changed

I tried to explore a different approach of fully integrating this into make all in 0aaf135, which I think may be a better approach as it doesn't require downsteam distributors to change anything in their processes. I defer to those with much more experience with the build systems, though.

A

@webknjaz

I tried to explore a different approach of fully integrating this into make all

I tried doing that in my PR and was getting weird behavior, I wasn't sure why. Maybe I was misusing something in the build machinery or was just confused...

If that works, it's probably fine. OTOH, you may want to take into account that many distributions prefer their build envs to be disconnected from the internet so it might still be wise to provide them with means to pre-provision the wheel separately.

Tagging @encukou @hroncok @mgorny for downstream opinions.

@webknjaz

This PR doesn't attempt to switch to the one-project model, as I found the diff was too large to reasonably review.

Would #109245 help?

@mgorny

If that works, it's probably fine. OTOH, you may want to take into account that many distributions prefer their build envs to be disconnected from the internet so it might still be wise to provide them with means to pre-provision the wheel separately.

In Gentoo we aren't actually using the wheels originally bundled with CPython itself but providing the newest versions of the them separately, so as long as that continues working, I suppose that's fine with us. However, it is is paramount that:

  1. The build process and test suite fully respect --with-wheel-pkg-dir and do not attempt to download anything.
  2. This is entirely restricted to CPython build-time, and in particularly creating venv doesn't attempt any fetching.

However, I can imagine it could be mildly annoying to users who aren't used to the new workflow that having a complete git clone would no longer suffice to actually build CPython offline.

@hroncok

Same for Fedora. With an addition that we do sometimes keep the bundled wheels for too new or too old Pythons, so having a way to pre-populate the wheels instead of downloaidng them would still be needed.

@pradyunsg

@AA-Turner could you resolve the conflicts with this PR?

I think it would be good to have this be in Python 3.13's cycle as early as possible, to allow any potential redistributors who care about the details of ensurepip to have time to adjust things as well as provide us feedback on this as early as feasible.

webknjaz

spec = importlib.util.spec_from_file_location("ensurepip", ENSURE_PIP_INIT)
ensurepip = importlib.util.module_from_spec(spec)
spec.loader.exec_module(ensurepip)
return ensurepip._PROJECTS

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AA-Turner I believe this would need to be updated after #109245, and it'll probably let you work with simpler structures.

webknjaz

whl = response.read()
except URLError as exc:
print_error(f"Failed to download {wheel_url!r}: {exc}")
errors = 1

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this meant to be a counter?

webknjaz

continue
else:
print_error(f"An invalid '{name}' wheel exists.")
os.remove(wheel_path)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to use pathlib's method? Seeing that it's already used everywhere..

webknjaz

from urllib.error import URLError
from urllib.request import urlopen

HOST = 'https://files.pythonhosted.org'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we name this something like a PYPI_CDN_URL for maintainability?

webknjaz


def print_notice(message: str) -> None:
if GITHUB_ACTIONS:
print(f"::notice::{message}", end="\n\n")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this also works if output to stderr FYI.

webknjaz


try:
projects = _get_projects()
except (AttributeError, TypeError):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's an abstraction leak here: it's hard to guess from looking at the _get_projects() function, what in it might trigger these exceptions. I'd recommend processing them inside that function and making obvious such places where the exceptions may occur. Instead, I'd convert both of the exceptions into something like an ImportError and handle just that. This would contribute to transparency of how that function works.

webknjaz

return 1

errors = 0
for name, version, checksum in projects:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW with #109245, looping will stop making sense here. So maybe you could start working with just projects[0] already?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, let's keep them separate.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was kinda assuming that the simplification PR would get merged first, in which case, this one would have to adapt..

webknjaz


* Wheels for :mod:`ensurepip` are no longer bundled in the CPython source
tree. Distributors should bundle these as part of the build process by
running :file:`Tools/build/bundle_ensurepip_wheels.py`.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would probably be clearer:

running :file:`Tools/build/bundle_ensurepip_wheels.py`.
running :file:`Tools/build/bundle_ensurepip_wheels.py` with no arguments.

webknjaz

# ensure the pip wheel exists
pip_filename = os.path.join(test.support.STDLIB_DIR, 'ensurepip', '_bundled',
f'pip-{ensurepip._PIP_VERSION}-py3-none-any.whl')
if not os.path.isfile(pip_filename):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like if it didn't exist, there should be some code to clean it up after testing so that the dummy file doesn't get forgotten on disk. Tests should avoid side effects...

webknjaz

import bundle_ensurepip_wheels as bew


# Disable fancy GitHub actions output during the tests

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a test for enabled GHA, just for those helpers, then. It'd be sad not to get coverage there.

webknjaz



def _is_valid_wheel(content: bytes, *, checksum: str) -> bool:
return checksum == sha256(content, usedforsecurity=False).hexdigest()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pradyunsg

Closing this per #80789 (comment). Thanks @AA-Turner for filing this, even though we're not going ahead with this (despite me implying that earlier in the discussion).

Reviewers

@webknjaz webknjaz webknjaz left review comments

@hugovk hugovk hugovk left review comments

@pradyunsg pradyunsg pradyunsg approved these changes

@zware zware zware left review comments

@pfmoore pfmoore pfmoore approved these changes

@ezio-melotti ezio-melotti Awaiting requested review from ezio-melotti ezio-melotti is a code owner

Labels