gh-151307: Bound zipfile reads for forged compressed sizes by rohitjavvadi · Pull Request #151509 · python/cpython
The forged ZIP from gh-151307 can make ZipExtFile._read2() pass a central-directory-controlled compressed size directly to the underlying file object's read(n). In the local reproducer, a 160-byte archive made the unpatched code call read(2147483647) twice before failing with EOFError.
This keeps the existing overlap warning behavior for duplicate-name entries, but bounds the actual low-level read request:
- seekable sources are clamped to the bytes actually remaining in the archive
- unknown-length sources are read in bounded chunks
After the change, the same 160-byte archive still fails as truncated, but the largest underlying read request is 125 bytes and there are no oversized reads.
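The two bounding strategies listed above can be sketched roughly as follows. This is a minimal illustration, not the code from this patch: `remaining_bytes`, `bounded_read`, `MAX_CHUNK`, and the `RecordingReader` helper are hypothetical names, and the 64-byte chunk size is chosen only to keep the demo small.

```python
import io
import os

MAX_CHUNK = 64  # hypothetical chunk size for sources of unknown length


def remaining_bytes(fileobj):
    """Return the bytes left before EOF for a seekable file, else None."""
    try:
        if not fileobj.seekable():
            return None
        pos = fileobj.tell()
        end = fileobj.seek(0, os.SEEK_END)
        fileobj.seek(pos)  # restore the original position
        return max(end - pos, 0)
    except (AttributeError, OSError, ValueError):
        return None


def bounded_read(fileobj, n, chunk_size=MAX_CHUNK):
    """Read up to n bytes without passing an untrusted n straight to read()."""
    remaining = remaining_bytes(fileobj)
    if remaining is not None:
        # Seekable source: clamp the request to what is actually left.
        return fileobj.read(min(n, remaining))
    # Unknown length: never ask for more than chunk_size at a time.
    chunks = []
    while n > 0:
        data = fileobj.read(min(n, chunk_size))
        if not data:
            break  # EOF reached before n bytes were available
        chunks.append(data)
        n -= len(data)
    return b"".join(chunks)


class RecordingReader(io.BytesIO):
    """Hypothetical helper that records every size passed to read()."""

    def __init__(self, data):
        super().__init__(data)
        self.requests = []

    def read(self, n=-1):
        self.requests.append(n)
        return super().read(n)


class NonSeekableReader(RecordingReader):
    """Same recorder, but reports itself as non-seekable."""

    def seekable(self):
        return False


if __name__ == "__main__":
    forged_size = 2147483647  # size claimed by the forged entry
    seekable = RecordingReader(b"x" * 160)
    seekable.seek(60)
    bounded_read(seekable, forged_size)
    print("seekable requests:", seekable.requests)

    stream = NonSeekableReader(b"x" * 160)
    bounded_read(stream, forged_size)
    print("non-seekable requests:", stream.requests)
```

In both cases the forged size never reaches the underlying `read()`: the seekable source receives a single request for the bytes that remain, and the non-seekable source only ever sees requests of at most `chunk_size` bytes. The caller is still expected to detect the short result and report the archive as truncated.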
Fixes gh-151307.
Testing
- Before/after local reproducer:
  - before: archive size 160, max read size 2147483647, large reads [2147483647, 2147483647]
  - after: archive size 160, max read size 125, large reads []
- ./python.exe -m test test_zipfile -m test_forged_compress_size_read_is_bounded -v
- ./python.exe -m test test_zipfile -v
- git diff --check
- make patchcheck