Issue 5843: Normalization error in urlunparse
Created on 2009-04-25 19:12 by eric.araujo, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Messages (6)
msg86538 - (view)
Author: Éric Araujo (eric.araujo) *
Date: 2009-04-25 19:12
Docstring for urlunparse says:
"""Put a parsed URI back together again. This may result in a
slightly different, but equivalent URI, if the URI that was parsed
originally had redundant delimiters, e.g. a ? with an empty query
(the draft states that these are equivalent)."""
“Draft” here refers to RFC 1808, superseded by 3986. However, RFC 3986
(section 6.2.3) states:
“Normalization should not remove delimiters when their associated
component is empty unless licensed to do so by the scheme
specification. For example, the URI "http://example.com/?" cannot be
assumed to be equivalent to any of the examples above. Likewise, the
presence or absence of delimiters within a userinfo subcomponent is
usually significant to its interpretation. The fragment component is
not subject to any scheme-based normalization; thus, two URIs that
differ only by the suffix "#" are considered different regardless of
the scheme.”
I guess we need some tests here to check compliance.
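A minimal sketch of such a check, assuming Python 3, where the urlparse module lives in urllib.parse. The sample URLs and the helper name roundtrip_changes are illustrative and not part of any existing test suite. The helper only reports which URLs are altered by a parse/unparse round trip; what it prints depends on the Python version in use.

```python
from urllib.parse import urlparse, urlunparse

# Hypothetical sample URLs: the first two end in a bare delimiter,
# the last two carry a non-empty query or fragment.
SAMPLES = [
    "http://a/b/c?",
    "http://a/b/c#",
    "http://a/b/c?x=1",
    "http://a/b/c#frag",
]


def roundtrip_changes(urls):
    """Return {original: rebuilt} for each URL that the round trip alters."""
    changed = {}
    for url in urls:
        rebuilt = urlunparse(urlparse(url))
        if rebuilt != url:
            changed[url] = rebuilt
    return changed


if __name__ == "__main__":
    for original, rebuilt in roundtrip_changes(SAMPLES).items():
        print(f"{original!r} -> {rebuilt!r}")
```

A compliance test along the lines of RFC 3986 section 6.2.3 would assert that this mapping is empty for URLs whose only peculiarity is an empty query or fragment.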
msg86541 - (view)
Author: Éric Araujo (eric.araujo) *
Date: 2009-04-25 19:45
This is indeed a bug. urlunparse should special-case "#" so as not to discard it.
msg110314 - (view)
Author: Senthil Kumaran (orsenthil) *
Date: 2010-07-14 19:09
Currently, urlunparse drops the trailing delimiter in both cases:
>>> import urlparse
>>> obj = urlparse.urlparse('http://a/b/c?')
>>> urlparse.urlunparse(obj)
'http://a/b/c'
>>> obj = urlparse.urlparse('http://a/b/c#')
>>> urlparse.urlunparse(obj)
'http://a/b/c'
If we move away from the current behavior, some urljoin tests will surely fail. We will have to consider those cases too while fixing this.
msg228009 - (view)
Author: Mark Lawrence (BreamoreBoy) *
Date: 2014-09-30 21:45
Slipped under the radar guys?
msg228853 - (view)
Author: Aaron Hill (Aaron1011) *
Date: 2014-10-09 10:21
In order to fix this, I think ParseResult needs two additional fields indicating whether an empty fragment or query string is present. Both ParseResult.fragment and ParseResult.query omit the leading '#' or '?' from their value. This makes it impossible to determine whether the fragment or query string is entirely absent or present but empty.
msg235579 - (view)
Author: Martin Panter (martin.panter) *
Date: 2015-02-09 00:17
Looks like this duplicates Issue 22852, which has a patch, although its author had second thoughts on the implementation.
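The limitation described in msg228853 can be illustrated with a short sketch, assuming Python 3's urllib.parse. The helper empty_delimiters is hypothetical: it is not part of the standard library and is not the approach taken by the patch on Issue 22852. It inspects the raw string to recover the information that the parsed result does not carry.

```python
from urllib.parse import urlparse


def empty_delimiters(url):
    """Hypothetical helper: report whether url carries a bare '?' or '#'."""
    # The fragment starts at the first '#'; the query starts at the
    # first '?' that precedes it.
    before_fragment, hash_sep, fragment = url.partition("#")
    _, query_sep, query = before_fragment.partition("?")
    return {
        "empty_query": query_sep == "?" and query == "",
        "empty_fragment": hash_sep == "#" and fragment == "",
    }


if __name__ == "__main__":
    for url in ("http://a/b/c", "http://a/b/c?", "http://a/b/c#"):
        parsed = urlparse(url)
        # The parsed query and fragment look the same for all three URLs,
        # while the raw-string check tells them apart.
        print(url, repr(parsed.query), repr(parsed.fragment), empty_delimiters(url))
```

Carrying flags like these alongside the parsed components is one way the two extra fields suggested above could be populated; whether they belong on ParseResult itself is a separate design question.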
History
Date                 User           Action  Args
2022-04-11 14:56:48  admin          set     github: 50093
2015-05-31 04:25:46  martin.panter  set     status: open -> closed
                                            superseder: urllib.parse wrongly strips empty #fragment, ?query, //netloc
                                            resolution: duplicate
                                            stage: resolved
2015-02-09 00:17:34  martin.panter  set     nosy: + martin.panter
                                            messages: + msg235579
2014-10-09 10:21:51  Aaron1011      set     nosy: + Aaron1011
                                            messages: + msg228853
2014-09-30 21:45:28  BreamoreBoy    set     nosy: + BreamoreBoy
                                            messages: + msg228009
                                            versions: + Python 3.4, Python 3.5, - Python 3.1, Python 3.2
2010-08-18 00:15:17  dstanek        set     nosy: + dstanek
2010-07-14 19:09:30  orsenthil      set     messages: + msg110314
2010-07-11 14:28:57  eric.araujo    set     assignee: orsenthil
                                            type: behavior
                                            nosy: + orsenthil
                                            title: Possible normalization error in urlparse.urlunparse -> Normalization error in urlunparse
                                            components: + Library (Lib)
                                            versions: + Python 3.1, Python 2.7, Python 3.2