◐ Shell
reader mode source ↗
Skip to content

bpo-36338: urllib.urlparse rejects invalid IPv6 addresses#16780

Closed
vstinner wants to merge 1 commit into
python:mainfrom
vstinner:urlparse_ipv6
Closed

bpo-36338: urllib.urlparse rejects invalid IPv6 addresses#16780
vstinner wants to merge 1 commit into
python:mainfrom
vstinner:urlparse_ipv6

Conversation

@vstinner

@vstinner vstinner commented Oct 14, 2019

Copy link
Copy Markdown
Member

The urllib.urlparse module now rejects invalid IPv6 addresses and
invalid port numbers when parsing an URL.

https://bugs.python.org/issue36338

@vstinner

Copy link
Copy Markdown
Member Author

@corona10: I wrote the scope test differently for make it more readable.

@corona10 corona10 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hide comment

@corona10: I wrote the scope test differently for make it more readable.

@vstinner Awesome!

@vstinner

Copy link
Copy Markdown
Member Author

@vadmium, @zooba, @tirkarthi: Would you mind to review this change? What do you think of my approach using ipaddress and excluding some characters from the IPv6 scope ("zone identifier")?

@vstinner

Copy link
Copy Markdown
Member Author

I modified my PR to also fix https://bugs.python.org/issue33342 : "urllib IPv6 parsing fails with special characters in passwords".

@vstinner

Copy link
Copy Markdown
Member Author

@orsenthil: Would you mind to review this change? Any idea for the allowed characters in an IPv6 scope?

@orsenthil

Copy link
Copy Markdown
Member

@vstinner - Sure, I will review. I will refer to the RFC for the valid characters for the IPv6 scope.

@vstinner

Copy link
Copy Markdown
Member Author

I tried to allow [ and ] in the user:password part, but then the URL parser is confused by the URL:

http://[::1%sc[o]pe]

It reads it as IPv6 ::1%sc[o.

* bpo-36338: The urllib.urlparse module now rejects invalid IPv6
  addresses and invalid port numbers when parsing an URL.
* bpo-33342: Fix urlparse() for IPv6 address with user:password when
  user and/or password contain "[" and/or "]" characters.
@vstinner

Copy link
Copy Markdown
Member Author

I rebased my PR to fix the merge conflict.

@orsenthil: Ping for review.

@vstinner

Copy link
Copy Markdown
Member Author

Firefox doesn't seem to accept % in the IPv6 part of an URL. When I type the following URL, it opens Google with the URL as a search...

http://[::1%1]:8000/

@vstinner

Copy link
Copy Markdown
Member Author

Firefox doesn't seem to accept % in the IPv6 part of an URL. When I type the following URL, it opens Google with the URL as a search...

Same behavior in Chromium.

@orsenthil orsenthil left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hide comment

LGTM. Thank you.

@orsenthil

Copy link
Copy Markdown
Member

Ping for review.

Done. Thank you, Victor.

@orsenthil

orsenthil commented Oct 23, 2019

Copy link
Copy Markdown
Member

Firefox doesn't seem to accept % in the IPv6 part of an URL.

Same behavior in Chromium.

I expected these browsers to have percent-encode these and work.

https://bugzilla.mozilla.org/show_bug.cgi?id=700999

Also, "Microsoft Edge (as well as Microsoft Explorer) works well with link local IPV6 addresses."

This is the most relevant information I found

https://en.wikipedia.org/wiki/IPv6_address#Use_of_zone_indices_in_URIs

When used in uniform resource identifiers (URI), the use of the percent sign causes a syntax conflict, therefore it must be escaped via percent-encoding,[11] e.g.:

http://[fe80::1ff:fe23:4567:890a%25eth0]/

@vstinner

Copy link
Copy Markdown
Member Author

The living URL Standard doesn't implement IPv6 scope on purpose:

Support for <zone_id> is intentionally omitted.

This comment points to https://www.w3.org/Bugs/Public/show_bug.cgi?id=27234#c2 which is a comment written by Ryan Sleevi at 2015-08-14:

Yes, we're especially not keen to support these in Chrome and have repeatedly decided not to. The platform-specific nature of <zone_id> makes it difficult to impossible to validate the well-formedness of the URL (see https://tools.ietf.org/html/rfc4007#section-11.2 , as referenced in 6874, to fully appreciate this special hell). Even if we could reliably parse these (from a URL spec standpoint), it then has to be handed 'somewhere', and that opens a new can of worms.

Even 6874 notes how unlikely it is to encounter these in practice

   Thus, URIs including a
   ZoneID are unlikely to be encountered in HTML documents.  However, if
   they do (for example, in a diagnostic script coded in HTML), it would
   be appropriate to treat them exactly as above.

Note that a 'dumb' parser may not be sufficient, as the Security Considerations of 6874 note:

   To limit this risk, implementations MUST NOT allow use of this format
   except for well-defined usages, such as sending to link-local
   addresses under prefix fe80::/10.  At the time of writing, this is
   the only well-defined usage known.

And also

   An HTTP client, proxy, or other intermediary MUST remove any ZoneID
   attached to an outgoing URI, as it has only local significance at the
   sending host.

This requires a transformative rewrite of any URLs going out the wire. That's pretty substantial. Anne, do you recall the bug talking about IP canonicalization (e.g. http://127.0.0.1 vs http://[::127.0.0.1] vs http://012345 and friends?) This is conceptually a similar issue - except it's explicitly required in the context of <zone_id> that the <zone_id> not be emitted.

There's also the issue that zone_id precludes/requires the use of APIs that user agents would otherwise prefer to avoid, in order to 'properly' handle the zone_id interpretation. For example, Chromium on some platforms uses a built in DNS resolver, and so our address lookup functions would need to define and support <zone_id>'s and map them to system concepts. In doing so, you could end up with weird situations where a URL works in Firefox but not Chrome, even though both 'hypothetically' supported <zone_id>'s, because FF may use an OS routine and Chrome may use a built-in routine and they diverge.

Overall, our internal consensus is that <zone_id>'s are bonkers on many grounds - the technical ambiguity (and RFC 6874 doesn't really resolve the ambiguity as much as it fully owns it and just says #YOLOSWAG) - and supporting them would add a lot of complexity for what is explicitly and admittedly a limited value use case.

Firefox feature request https://bugzilla.mozilla.org/show_bug.cgi?id=700999 has been rejected using this comment as well at 2015-08-14.


Currently, only Microsoft Edge supports IPv6 scope: Firefox and Chromium don't.

I suggest to follow Firefox, Chromium and living URL Standard example: don't support IPv6 scope.

My current implementation doesn't implement the RFC 6874 which suggests to use %25 between the IPv6 and the scope. For example address ::1 with scope eth0 should be written ::1%25eth0. This syntax is hard to read if you use numeric scopes which are common: ::1 with scope 2 should be written ::1%252 :-(

@vstinner

Copy link
Copy Markdown
Member Author

It is supposed to get the unquoted URL. I relied on tests of urlparse to state this.

This makes me even more uncomfortable to support IPv6 scope: it is not well defined if urlsplit() is expected to be used on a quote or unquoted URL. This is a major difference for RFC 6874 which is tied to quoted characters. Not well defined means: we should not have to dig into tests to reverse engineer the "expected" function behavior. It should be well documented and well tested.

I mean that if someone wants to support IPv6 scope in URL, I suggest to first clarify what urlsplit() expects. IMHO fixing this is out of the scope of fixing https://bugs.python.org/issue36338 security vulnerability.

@vstinner

Copy link
Copy Markdown
Member Author

I failed finding time to finish the PR. I prefer to abandon it.

@vstinner vstinner closed this Sep 21, 2021
@vstinner vstinner deleted the urlparse_ipv6 branch September 21, 2021 21:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants