Issue 24332: urllib.parse should not discard delimiters when associated component is empty

Created on 2015-05-30 18:05 by gdata gmail, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (2)
msg244477 - (view) Author: gdata gmail (gdata gmail) Date: 2015-05-30 18:05
The documentation for urllib.parse (https://docs.python.org/3.0/library/urllib.parse.html) states several times:

"This may result in a slightly different, but equivalent URL, if the URL that was parsed originally had unnecessary delimiters (for example, a ? with an empty query; the RFC states that these are equivalent)."

This is false -- RFC 3986 explicitly states that ? with an empty query is _not_ equivalent to a URL without it.  For example, the following two URLs should be considered different:

http://example.com/?
http://example.com/

https://tools.ietf.org/html/rfc3986#section-6.2.3
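
The report can be illustrated with a minimal sketch, assuming the default behaviour of urlsplit() and urlunsplit(). The helper name roundtrip is hypothetical and only used here. Both URLs parse to identical components, so the parsed result alone cannot record whether the "?" was present:

```python
from urllib.parse import urlsplit, urlunsplit


def roundtrip(url: str) -> str:
    """Parse a URL and reassemble it with the standard library."""
    return urlunsplit(urlsplit(url))


with_delimiter = "http://example.com/?"
without_delimiter = "http://example.com/"

# In both cases the query component is the empty string, so the two
# parsed results compare equal even though RFC 3986 treats the URLs
# as distinct.
same_components = urlsplit(with_delimiter) == urlsplit(without_delimiter)

print(same_components)
print(roundtrip(with_delimiter))
print(roundtrip(without_delimiter))
```

Since the parsed components are equal, reassembling them necessarily yields the same string for both inputs, which is the loss of information described above.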
msg244515 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-05-31 04:34
This is essentially the same as Issue 22852. The title just refers to stripping an empty #fragment, but the netloc and query components are also affected. I have a patch there which needs reviewing, if you are interested. Or, if you have any alternative ideas on how to solve this, they would be welcome too.
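
The same effect for the fragment and netloc components can be sketched as follows, again assuming default urlsplit() behaviour. The sample URLs, the made-up scheme "x", and the helper name indistinguishable are illustrative assumptions, not taken from Issue 22852:

```python
from urllib.parse import urlsplit

# Hypothetical sample pairs; "x" is an arbitrary made-up scheme.
PAIRS = [
    ("http://example.com/#", "http://example.com/"),  # empty fragment
    ("x:///path", "x:/path"),                         # empty netloc
]


def indistinguishable(a: str, b: str) -> bool:
    """Return True if both URLs parse to the same components."""
    return urlsplit(a) == urlsplit(b)


results = [indistinguishable(a, b) for a, b in PAIRS]
print(results)
```

In each pair the delimiter ("#" or "//") is present in one URL and absent in the other, yet the parsed components are the same.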
History
Date                 User           Action  Args
2022-04-11 14:58:17  admin          set     github: 68520
2015-05-31 04:34:30  martin.panter  set     status: open -> closed
                                            superseder: urllib.parse wrongly strips empty #fragment, ?query, //netloc
                                            nosy: + martin.panter
                                            messages: + msg244515
                                            resolution: duplicate
                                            stage: resolved
2015-05-30 23:03:38  ned.deily      set     nosy: + orsenthil
2015-05-30 18:05:17  gdata gmail    create