bpo-39039: tarfile raises descriptive exception from zlib.error#27766
Conversation
* during tarfile parsing, a zlib error indicates invalid data * tarfile.open now raises a descriptive exception from the zlib error * this makes it clear to the user that they may be trying to open a corrupted tar file
ambv
left a comment
There was a problem hiding this comment.
Thanks for working on this. This will need some changes and a test.
Sorry, something went wrong.
|
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
Sorry, something went wrong.
|
@ambv I am having a hard time figuring out how to reproduce this other than just using the file originally attached on the bpo. See my comment on the bpo, and I appreciate if you can offer some help! https://bugs.python.org/issue39039 |
Sorry, something went wrong.
|
@ambv ok, I implemented all the suggested changes and also added a test. However, the test is obviously not good because I want the test to genuinely reproduce the bug condition by passing in a file which actually causes the zlib error to be raised. However, I can't do that because I don't really know how the file uploaded to the bpo can be reproduced. I also haven't gone through with investigating lzma or bz2 files. Partly because I can't figure out how to corrupt files in a way to elicit the problem response. I haven't been able to make lzma or bz2 archives that fit the edge case because I don't know how. I'm going to keep experimenting with this and try to learn more. If you or anyone else has any insight into how the corrupt bpo-uploaded file was created, I would appreciate any advice because I have had no luck. Also, note that the history of what I have tried and what I (don't) understand is written out here in a bit more detail: https://gist.github.com/jdevries3133/acbb5ba2a19093d3bcc214733ef85e5a |
Sorry, something went wrong.
|
Since Ethan is interested in this, I won't be approving or further requesting changes.
In this case leave this for a future improvement. Thanks for trying it out, Jack! |
Sorry, something went wrong.
|
@ambv ok, I all the feedback has been addressed. Łukasz, if you do feel that there are more potential edge cases that I haven't been able to address, maybe you could summarize them on the bpo? The original poster of the bpo did add some additional context for how he made the file, so maybe in combination with that, others can look into further edge cases. For the bot: I have made the requested changes; please review again |
Sorry, something went wrong.
|
Thanks for making the requested changes! @ambv: please review the changes made to this pull request. |
Sorry, something went wrong.
|
This PR is stale because it has been open for 30 days with no activity. |
Sorry, something went wrong.
|
Thanks @jdevries3133 for the PR, and @ambv for merging it 🌮🎉.. I'm working now to backport this PR to: 3.9. |
Sorry, something went wrong.
|
Thanks @jdevries3133 for the PR, and @ambv for merging it 🌮🎉.. I'm working now to backport this PR to: 3.10. |
Sorry, something went wrong.
|
Sorry, @jdevries3133 and @ambv, I could not cleanly backport this to |
Sorry, something went wrong.
|
Sorry @jdevries3133 and @ambv, I had trouble checking out the |
Sorry, something went wrong.
pythonGH-27766) * during tarfile parsing, a zlib error indicates invalid data * tarfile.open now raises a descriptive exception from the zlib error * this makes it clear to the user that they may be trying to open a corrupted tar file (cherry picked from commit b6fe857) Co-authored-by: Jack DeVries <58614260+jdevries3133@users.noreply.github.com>
…pythonGH-27766) * during tarfile parsing, a zlib error indicates invalid data * tarfile.open now raises a descriptive exception from the zlib error * this makes it clear to the user that they may be trying to open a corrupted tar file. (cherry picked from commit b6fe857) Co-authored-by: Jack DeVries <58614260+jdevries3133@users.noreply.github.com>
GH-27766) (GH-28613) * during tarfile parsing, a zlib error indicates invalid data * tarfile.open now raises a descriptive exception from the zlib error * this makes it clear to the user that they may be trying to open a corrupted tar file (cherry picked from commit b6fe857) Co-authored-by: Jack DeVries <58614260+jdevries3133@users.noreply.github.com>
…GH-27766) (GH-28614) * during tarfile parsing, a zlib error indicates invalid data * tarfile.open now raises a descriptive exception from the zlib error * this makes it clear to the user that they may be trying to open a corrupted tar file. (cherry picked from commit b6fe857) Co-authored-by: Jack DeVries <58614260+jdevries3133@users.noreply.github.com>
⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️Hi! The buildbot PPC64LE Fedora Stable 3.9 has failed when building commit 7bff4d3. What do you need to do:
You can take a look at the buildbot page here: https://buildbot.python.org/all/#builders/397/builds/207 Failed tests:
Failed subtests:
Summary of the results of the build (if available): == Tests result: FAILURE then FAILURE == 412 tests OK. 10 slowest tests:
1 test failed: 12 tests skipped: 1 re-run test: Total duration: 13 min 45 sec Click to see traceback logsTraceback (most recent call last):
File "/home/buildbot/buildarea/3.9.cstratak-fedora-stable-ppc64le/build/Lib/test/test_unicodedata.py", line 330, in test_normalization
testdata = open_urlresource(TESTDATAURL, encoding="utf-8",
socket.timeout: timed out
Traceback (most recent call last):
File "/home/buildbot/buildarea/3.9.cstratak-fedora-stable-ppc64le/build/Lib/test/test_unicodedata.py", line 336, in test_normalization
self.fail(f"Could not retrieve {TESTDATAURL}")
AssertionError: Could not retrieve http://www.pythontest.net/unicode/13.0.0/NormalizationTest.txt
|
Sorry, something went wrong.
⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️Hi! The buildbot PPC64LE RHEL7 3.9 has failed when building commit 7bff4d3. What do you need to do:
You can take a look at the buildbot page here: https://buildbot.python.org/all/#builders/219/builds/233 Failed tests:
Failed subtests:
Summary of the results of the build (if available): == Tests result: FAILURE then FAILURE == 411 tests OK. 10 slowest tests:
1 test failed: 12 tests skipped: 2 re-run tests: 1 test run no tests: Total duration: 19 min 6 sec Click to see traceback logsTraceback (most recent call last):
File "/home/buildbot/buildarea/3.9.cstratak-RHEL7-ppc64le/build/Lib/test/test_unicodedata.py", line 336, in test_normalization
self.fail(f"Could not retrieve {TESTDATAURL}")
AssertionError: Could not retrieve http://www.pythontest.net/unicode/13.0.0/NormalizationTest.txt
Traceback (most recent call last):
File "/home/buildbot/buildarea/3.9.cstratak-RHEL7-ppc64le/build/Lib/test/test_unicodedata.py", line 330, in test_normalization
testdata = open_urlresource(TESTDATAURL, encoding="utf-8",
socket.timeout: timed out
|
Sorry, something went wrong.
⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️Hi! The buildbot PPC64LE RHEL7 LTO + PGO 3.9 has failed when building commit 7bff4d3. What do you need to do:
You can take a look at the buildbot page here: https://buildbot.python.org/all/#builders/532/builds/232 Failed tests:
Failed subtests:
Summary of the results of the build (if available): == Tests result: FAILURE then FAILURE == 411 tests OK. 10 slowest tests:
1 test failed: 13 tests skipped: 1 re-run test: Total duration: 17 min 26 sec Click to see traceback logsTraceback (most recent call last):
File "/home/buildbot/buildarea/3.9.cstratak-RHEL7-ppc64le.lto-pgo/build/Lib/test/test_unicodedata.py", line 336, in test_normalization
self.fail(f"Could not retrieve {TESTDATAURL}")
AssertionError: Could not retrieve http://www.pythontest.net/unicode/13.0.0/NormalizationTest.txt
Traceback (most recent call last):
File "/home/buildbot/buildarea/3.9.cstratak-RHEL7-ppc64le.lto-pgo/build/Lib/test/test_unicodedata.py", line 330, in test_normalization
testdata = open_urlresource(TESTDATAURL, encoding="utf-8",
socket.timeout: timed out
|
Sorry, something went wrong.
⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️Hi! The buildbot PPC64LE RHEL7 LTO 3.9 has failed when building commit 7bff4d3. What do you need to do:
You can take a look at the buildbot page here: https://buildbot.python.org/all/#builders/322/builds/233 Failed tests:
Failed subtests:
Summary of the results of the build (if available): == Tests result: FAILURE then FAILURE == 412 tests OK. 10 slowest tests:
1 test failed: 12 tests skipped: 1 re-run test: Total duration: 12 min 13 sec Click to see traceback logsTraceback (most recent call last):
File "/home/buildbot/buildarea/3.9.cstratak-RHEL7-ppc64le.lto/build/Lib/http/client.py", line 596, in _readinto_chunked
n = self._safe_readinto(mvb)
http.client.IncompleteRead: IncompleteRead(51 bytes read, 1893 more expected)
Traceback (most recent call last):
File "/home/buildbot/buildarea/3.9.cstratak-RHEL7-ppc64le.lto/build/Lib/test/test_unicodedata.py", line 330, in test_normalization
testdata = open_urlresource(TESTDATAURL, encoding="utf-8",
socket.timeout: timed out
Traceback (most recent call last):
File "/home/buildbot/buildarea/3.9.cstratak-RHEL7-ppc64le.lto/build/Lib/test/test_unicodedata.py", line 330, in test_normalization
testdata = open_urlresource(TESTDATAURL, encoding="utf-8",
http.client.IncompleteRead: IncompleteRead(0 bytes read)
Traceback (most recent call last):
File "/home/buildbot/buildarea/3.9.cstratak-RHEL7-ppc64le.lto/build/Lib/test/test_unicodedata.py", line 336, in test_normalization
self.fail(f"Could not retrieve {TESTDATAURL}")
AssertionError: Could not retrieve http://www.pythontest.net/unicode/13.0.0/NormalizationTest.txt
|
Sorry, something went wrong.
⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️Hi! The buildbot PPC64LE Fedora Stable LTO + PGO 3.9 has failed when building commit 7bff4d3. What do you need to do:
You can take a look at the buildbot page here: https://buildbot.python.org/all/#builders/46/builds/210 Failed tests:
Failed subtests:
Summary of the results of the build (if available): == Tests result: FAILURE then FAILURE == 411 tests OK. 10 slowest tests:
1 test failed: 13 tests skipped: 1 re-run test: Total duration: 9 min 43 sec Click to see traceback logsTraceback (most recent call last):
File "/home/buildbot/buildarea/3.9.cstratak-fedora-stable-ppc64le.lto-pgo/build/Lib/multiprocessing/resource_tracker.py", line 201, in main
cache[rtype].remove(name)
KeyError: '/psm_812a9b04'
Traceback (most recent call last):
File "/home/buildbot/buildarea/3.9.cstratak-fedora-stable-ppc64le.lto-pgo/build/Lib/test/test_unicodedata.py", line 330, in test_normalization
testdata = open_urlresource(TESTDATAURL, encoding="utf-8",
http.client.IncompleteRead: IncompleteRead(0 bytes read)
Traceback (most recent call last):
File "/home/buildbot/buildarea/3.9.cstratak-fedora-stable-ppc64le.lto-pgo/build/Lib/multiprocessing/resource_tracker.py", line 201, in main
cache[rtype].remove(name)
KeyError: '/psm_2b77cafb'
Traceback (most recent call last):
File "/home/buildbot/buildarea/3.9.cstratak-fedora-stable-ppc64le.lto-pgo/build/Lib/test/test_unicodedata.py", line 336, in test_normalization
self.fail(f"Could not retrieve {TESTDATAURL}")
AssertionError: Could not retrieve http://www.pythontest.net/unicode/13.0.0/NormalizationTest.txt
Traceback (most recent call last):
File "/home/buildbot/buildarea/3.9.cstratak-fedora-stable-ppc64le.lto-pgo/build/Lib/test/test_unicodedata.py", line 330, in test_normalization
testdata = open_urlresource(TESTDATAURL, encoding="utf-8",
socket.timeout: timed out
Traceback (most recent call last):
File "/home/buildbot/buildarea/3.9.cstratak-fedora-stable-ppc64le.lto-pgo/build/Lib/multiprocessing/resource_tracker.py", line 201, in main
cache[rtype].remove(name)
KeyError: '/psm_11f35282'
Traceback (most recent call last):
File "/home/buildbot/buildarea/3.9.cstratak-fedora-stable-ppc64le.lto-pgo/build/Lib/http/client.py", line 596, in _readinto_chunked
n = self._safe_readinto(mvb)
http.client.IncompleteRead: IncompleteRead(873 bytes read, 1780 more expected)
|
Sorry, something went wrong.
corrupted tar file
I think that this small changes achieves a fix for the bpo. Tests still pass,
so that's a good sign that maybe this change does not introduce side-effects,
but I am hoping that someone with knowledge of this library help me think
of some edge cases to try out.
Please note that I am a relative novice developer, and while this patch is
good to the best of my understanding, my understanding is not the best!
https://bugs.python.org/issue39039