◐ Shell
clean mode source ↗

[Security] bpo-30713: Reject newline in urllib.parse by vstinner · Pull Request #2303 · python/cpython

@vstinner

The splittype(), splitport() and splithost() functions of the
urllib.parse module now reject URLs which contain a newline
character.

The splittype(), splitport() and splithost() functions of the
urllib.parse module now reject URLs which contain a newline
character.

@vstinner vstinner changed the title bpo-30713: Reject newline in urllib.parse [Security] bpo-30713: Reject newline in urllib.parse

Jun 28, 2017

serhiy-storchaka

global _typeprog
if _typeprog is None:
_typeprog = re.compile('([^/:]+):(.*)', re.DOTALL)
_typeprog = re.compile('([^/:\n]+):(.*)', re.DOTALL)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is wrong with '\n' in a type? In any case it is compared with fixed set of supported schemes.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is wrong with '\n' in a type?

It doesn't make sense to me to have a newline in a type, it really looks like an attempt to compromise a server.

_typeprog = re.compile('([^/:\n]+):(.*)', re.DOTALL)

match = _typeprog.match(url)
match = _typeprog.fullmatch(url)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no difference between match() and fullmatch() here.

global _hostprog
if _hostprog is None:
_hostprog = re.compile('//([^/#?]*)(.*)', re.DOTALL)
_hostprog = re.compile('//([^/#?\n]*)(.*)')

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will caused just returning a tuple (None, url) as in the case when the host is not specified. The second item can contain '\n'. This doesn't mean splithost() rejects the url.

If you want to make splithost() rejecting URLs with newlines, check explicitly '\n' in url and raise an exception. But I think this is not the best place of doing such checks.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't mean splithost() rejects the url.

Right now, we don't raise an exception if an URL looks "invalid". So I tried to fit into the current behaviour: return (None, invalid_url). Maybe we should raise an exception instead?

global _portprog
if _portprog is None:
_portprog = re.compile('(.*):([0-9]*)$', re.DOTALL)
_portprog = re.compile('(.*):([0-9]*)')

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same as for splithost(). splithost('example.org\n') will return a tuple ('example.org\n', None).

@vadmium

Sorry @Haypo, I think newlines are a special case in some regular expression implementations, but I don’t remember the details for Python, so it is not clear to me what your code will do. I trust Serhiy has better knowledge with regular expressions :)

@vstinner

@mcepl mcepl mentioned this pull request

Apr 17, 2019