◐ Shell clean mode source ↗

[Security] bpo-30713: Reject newline in urllib.parse by vstinner · Pull Request #2303 · python/cpython

the-knights-who-say-ni

The splittype(), splitport() and splithost() functions of the
urllib.parse module now reject URLs which contain a newline
character.

The splittype(), splitport() and splithost() functions of the
urllib.parse module now reject URLs which contain a newline
character.

vstinner changed the title ~~bpo-30713: Reject newline in urllib.parse~~ [Security] bpo-30713: Reject newline in urllib.parse

		global _typeprog
		if _typeprog is None:
		_typeprog = re.compile('([^/:]+):(.*)', re.DOTALL)
		_typeprog = re.compile('([^/:\n]+):(.*)', re.DOTALL)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is wrong with '\n' in a type? In any case it is compared with fixed set of supported schemes.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is wrong with '\n' in a type?

It doesn't make sense to me to have a newline in a type, it really looks like an attempt to compromise a server.

		_typeprog = re.compile('([^/:\n]+):(.*)', re.DOTALL)

		match = _typeprog.match(url)
		match = _typeprog.fullmatch(url)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no difference between match() and fullmatch() here.

		global _hostprog
		if _hostprog is None:
		_hostprog = re.compile('//([^/#?])(.)', re.DOTALL)
		_hostprog = re.compile('//([^/#?\n])(.)')

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will caused just returning a tuple (None, url) as in the case when the host is not specified. The second item can contain '\n'. This doesn't mean splithost() rejects the url.

If you want to make splithost() rejecting URLs with newlines, check explicitly '\n' in url and raise an exception. But I think this is not the best place of doing such checks.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't mean splithost() rejects the url.

Right now, we don't raise an exception if an URL looks "invalid". So I tried to fit into the current behaviour: return (None, invalid_url). Maybe we should raise an exception instead?

		global _portprog
		if _portprog is None:
		_portprog = re.compile('(.):([0-9])$', re.DOTALL)
		_portprog = re.compile('(.):([0-9])')

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same as for splithost(). splithost('example.org\n') will return a tuple ('example.org\n', None).

Sorry @Haypo, I think newlines are a special case in some regular expression implementations, but I don’t remember the details for Python, so it is not clear to me what your code will do. I trust Serhiy has better knowledge with regular expressions :)

mcepl mentioned this pull request