◐ Shell
reader mode source ↗
Skip to content

bpo-46337: Urllib.parse scheme-specific behavior without reliance on URL scheme#30520

Closed
oldaccountdeadname wants to merge 23 commits into
python:mainfrom
oldaccountdeadname:urllib-custom-schemes
Closed

bpo-46337: Urllib.parse scheme-specific behavior without reliance on URL scheme#30520
oldaccountdeadname wants to merge 23 commits into
python:mainfrom
oldaccountdeadname:urllib-custom-schemes

Conversation

@oldaccountdeadname

@oldaccountdeadname oldaccountdeadname commented Jan 10, 2022

Copy link
Copy Markdown

See bpo-46337. Basically, this allows code like this:

>>> urljoin("custom-scheme://a.host/loc", "..", "custom-scheme://a.host/", classes=[urllib.parse.SchemeClass.RELATIVE, urllib.parse.SchemeClass.NETLOC])
'custom-scheme://a.host/'

https://bugs.python.org/issue46337

Some features in urllib are dependent on schemes, (i.e., preserving the
netloc in url joining). Prior to this patch, this was governed by the
uses_* lists (uses_relative, uses_netloc, uses_params) which hard code
these attributes for certain schemes. Providing an enum interface and a
'constructor' that allows overrides makes this mechanism a bit more
flexible for future modifications.
This allows the callers of urljoin and urlparse to add guaranteed scheme
classes to the url regardless of the actual scheme, which may not be in
the default uses_* lists of schemes. This call-time behavior is done
through an optional parameter that preserves backwards compatibility.

A test case is added for this, and requires the change present in
test_urlparse.checkJoin.
@github-actions

Copy link
Copy Markdown

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions Bot added the stale Stale PR or inactive for long period of time. label Feb 10, 2022
@orsenthil orsenthil self-requested a review February 11, 2022 05:57

@MaxwellDupre MaxwellDupre left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hide comment

Could you add info to docs Please?
Otherwise how will users know how to use or that it exists?

@oldaccountdeadname

oldaccountdeadname commented Feb 26, 2022 via email

Copy link
Copy Markdown
Author

This functionality was exposed in 53c6ccc.
@oldaccountdeadname

Copy link
Copy Markdown
Author

Ah, CI was failing due to my editor inserting tabs - should be fixed. The docs commit is now f9b59dd.

It looks like CI expects this when building documentation.
@oldaccountdeadname

Copy link
Copy Markdown
Author

@orsenthil - sorry for the extra email/ping, but would you be able to give this a review when you've got some spare time? It's been sitting stale for roughly a month now. thanks!

@orsenthil

Copy link
Copy Markdown
Member

@lincolnauster - sure, I will review.

@orsenthil orsenthil self-assigned this Mar 12, 2022
urljoin will not treat `..` as moving up one directory rather than
moving up one file, thus causing the doctests to fail due to a missing
trailing slash. Both changes are of the form:

http://example.org/post/x -> http://example.org/post/x/

Additionally, the my-protocol example's expected output had the wrong
scheme.
@oldaccountdeadname

Copy link
Copy Markdown
Author

sigh - code was right, docs were wrong.

Stupid question, but I couldn't find this in the devguide. How do I actually run the doctests on my local checkout? I was performing them manually in a shell but evidently I accidentally corrected them while typing them in.

I'm running this in Doc/:

gmake doctest PYTHON=../python SPHINXOPTS="-j24 --keep-going" 

...and I'm getting a ton of errors from code I haven't touched. (A bunch of stuff in the ast module, _tkinter wasn't found, etc.) Is this a misconfiguration on my part? How can I actually run doctests?

Also, I pushed a fix for the current cause of failure. Would it be helpful to squash/rebase it into the original doc commit (f9b59dd) and force-push? Thanks so much!

oldaccountdeadname and others added 2 commits March 13, 2022 01:09
Minor style things:
+ _scheme_classes' docstring's summary was made explicit.
+ _scheme_classes was prepended with and followed by two newlines.

@ethanfurman ethanfurman left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hide comment

I think SchemeFlag works better than SchemeClass. Either way, use an enum.Flag for it, and consider using __repr__ similar to the one in re.RegexFlag.

@bedevere-bot

Copy link
Copy Markdown

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

60 hidden items Load more…
@oldaccountdeadname

Copy link
Copy Markdown
Author

I have made the requested changes; please review again.

Again, thanks so much for the thorough review, and please tell me if there's anything missing!

@bedevere-bot

Copy link
Copy Markdown

Thanks for making the requested changes!

@ethanfurman: please review the changes made to this pull request.

@orsenthil

Copy link
Copy Markdown
Member

(copying from #90495) - we can continue discussion here as lot of context is preserved here.


Hi @lincolnauster , I was -1 and was thinking much on introducing a flag with the enum in the parse module.

urlparse(url, scheme='', allow_fragments=True, flags=SchemeFlag(0)):

This API signature is going to confuse people and will be huge blocker for further adoption and change (even if the default arguments are specified). I was thinking how best to mitigate that.

  1. Did we ever consider not going to flag as as parameter?
  2. Any other default for the flag rather than an enum?

If design design required, we can bring it to a wider forum too.

@orsenthil orsenthil requested a review from gpshead April 14, 2022 22:43
@oldaccountdeadname

oldaccountdeadname commented Apr 14, 2022 via email

Copy link
Copy Markdown
Author

@gpshead gpshead left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hide comment

I agree with Senthil about the API growing something unobvious. If we add a parameter, it should be keyword only and well named to indicate that it is unusual to provide it. I'd also set its default value to None rather than introducing the casual user to a new concept (SchemeFlag).

The name flags= is probably not sufficient. Something more wordy like behavior_overrides= would communicate better that these are not necessary in most cases as reasonable default behaviors exist. Similarly the name SchemeFlags doesn't necessarily communicate the right thing when seen in code as it technically has nothing to do with a scheme itself. Perhaps something like ParseBehaviors?

@bedevere-bot

Copy link
Copy Markdown

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@oldaccountdeadname

Copy link
Copy Markdown
Author

Hi, sorry for getting back to this late, and thanks for all the feedback.

I agree with Senthil about the API growing something unobvious. If we add a parameter, it should be keyword only and well named to indicate that it is unusual to provide it. I'd also set its default value to None rather than introducing the casual user to a new concept (SchemeFlag).

The name flags= is probably not sufficient. Something more wordy like behavior_overrides= would communicate better that these are not necessary in most cases as reasonable default behaviors exist. Similarly the name SchemeFlags doesn't necessarily communicate the right thing when seen in code as it technically has nothing to do with a scheme itself. Perhaps something like ParseBehaviors?

Agreed, I'll push these API clarifications shortly!

@ethanfurman

Copy link
Copy Markdown
Member

Okay, I finally had some time to do a thorough review of the module and, as much as I love enums, I don't think this is the right solution to this problem:

  • allow_fragments should be part of the enum, but that would just be confusing at this point
  • specifying NETLOC or RELATIVE to urlparse() does nothing

I like the attempt to unify the three class lists, but I don't think this approach is the best choice for the urlparse module as it exists.

Given that we already have an allow_fragments flag, I think the best path forward is to add an allow_params flag, with a default of False -- or add a separate universal parsing function, as has been suggested earlier.

@lincolnauster Are you interested in converting this PR to one of the two above choices?

@ethanfurman ethanfurman left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hide comment

See previous comment.

@oldaccountdeadname

oldaccountdeadname commented Jun 12, 2022 via email

Copy link
Copy Markdown
Author

@oldaccountdeadname

oldaccountdeadname commented Oct 11, 2022 via email

Copy link
Copy Markdown
Author

@oldaccountdeadname

oldaccountdeadname commented Oct 11, 2022 via email

Copy link
Copy Markdown
Author

@orsenthil

Copy link
Copy Markdown
Member

I think this change is stale now. We really do not want to introduce additional complexity to the parsing here, at least many changes in one go. I am closing this request, and we can reopen it to add individual changes (minor refactors in a planned manner) if we want a generic solution. Current diff can introduce complexity that many reviewers have observed above.

Thank you for your contribution.

@orsenthil orsenthil closed this Apr 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

awaiting changes stale Stale PR or inactive for long period of time.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants