gh-151497: Avoid huge pre-allocation for oversized tarfile extended headers by iamsharduld · Pull Request #151498 · python/cpython

iamsharduld

…nded headers

tarfile reads a member's extended header (a GNU long name/link or a pax
header) with a single read sized by the header's size field:

    buf = tarfile.fileobj.read(self._block(self.size))

The size is taken from the archive and is not validated, so a ~512-byte
crafted file can claim several gigabytes (or, via base-256 encoding, far
more) and make read() pre-allocate that much memory -- on open/iterate,
before any extraction filter runs.

Read the extended-header data in bounded chunks instead, so an oversized
or truncated header can no longer force a huge allocation. The bytes
returned for valid archives are unchanged.