
gh-151307: Bound zipfile reads for forged compressed sizes by rohitjavvadi · Pull Request #151509 · python/cpython

The forged ZIP from gh-151307 can make ZipExtFile._read2() pass a central-directory-controlled compressed size directly to the underlying file object's read(n). In the local reproducer, a 160-byte archive made the unpatched code call read(2147483647) twice before failing with EOFError.

This change keeps the existing overlap warning behavior for duplicate-name entries, but bounds the size of each low-level read request:

  • seekable sources are clamped to the bytes actually remaining in the archive
  • unknown-length sources are read in bounded chunks
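
The two cases above can be illustrated with a minimal sketch. This is not the code from the patch: `bounded_read` and `CHUNK_SIZE` are hypothetical names chosen for illustration, and the chunk size is an arbitrary assumption.

```python
import io

CHUNK_SIZE = 64 * 1024  # hypothetical cap for sources of unknown length


def bounded_read(fileobj, n, chunk_size=CHUNK_SIZE):
    """Read up to n bytes without passing an untrusted n straight to read()."""
    if n <= 0:
        return b""
    try:
        seekable = fileobj.seekable()
    except AttributeError:
        seekable = False
    if seekable:
        # Seekable source: clamp the request to the bytes actually remaining.
        pos = fileobj.tell()
        end = fileobj.seek(0, io.SEEK_END)
        fileobj.seek(pos)
        return fileobj.read(min(n, max(end - pos, 0)))
    # Unknown length: never ask for more than chunk_size in one call.
    parts = []
    remaining = n
    while remaining > 0:
        data = fileobj.read(min(remaining, chunk_size))
        if not data:
            break  # source ended early; the caller decides how to report it
        parts.append(data)
        remaining -= len(data)
    return b"".join(parts)
```

In this sketch a short result signals truncation to the caller, so a forged size still leads to an error, just without an oversized request reaching the underlying file object.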

After the change, the same 160-byte archive still fails as truncated, but the largest underlying read request is 125 bytes and there are no oversized reads.

Fixes gh-151307.

Testing

  • Before/after local reproducer:
    • before: archive size 160, max read size 2147483647, large reads [2147483647, 2147483647]
    • after: archive size 160, max read size 125, large reads []
  • ./python.exe -m test test_zipfile -m test_forged_compress_size_read_is_bounded -v
  • ./python.exe -m test test_zipfile -v
  • git diff --check
  • make patchcheck
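
For reference, read sizes like the before/after figures above could be collected with a recording wrapper along these lines. This is a rough sketch, not the reproducer used for this PR: `RecordingFile` and `max_read_request` are hypothetical names, and the forged archive bytes themselves are not reproduced here.

```python
import io
import zipfile


class RecordingFile(io.BytesIO):
    """BytesIO that remembers the size argument of every read() call."""

    def __init__(self, data):
        super().__init__(data)
        self.read_sizes = []

    def read(self, size=-1):
        self.read_sizes.append(size)
        return super().read(size)


def max_read_request(archive_bytes, member):
    """Open one member and return the largest read size zipfile asked for."""
    src = RecordingFile(archive_bytes)
    try:
        with zipfile.ZipFile(src) as zf:
            with zf.open(member) as fp:
                fp.read()
    except (EOFError, zipfile.BadZipFile):
        pass  # a truncated or forged archive is expected to fail here
    # Ignore "read everything" requests (None or negative sizes).
    sizes = [s for s in src.read_sizes if isinstance(s, int) and s >= 0]
    return max(sizes, default=0)
```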