◐ Shell
reader mode source ↗
Skip to content

bpo-34572: change _pickle unpickling to use import rather than retrieving from sys.modules#9047

Merged
pitrou merged 7 commits into
python:masterfrom
tjb900:pickle_import_race_fix
Feb 18, 2019
Merged

bpo-34572: change _pickle unpickling to use import rather than retrieving from sys.modules#9047
pitrou merged 7 commits into
python:masterfrom
tjb900:pickle_import_race_fix

Conversation

@tjb900

@tjb900 tjb900 commented Sep 3, 2018

Copy link
Copy Markdown
Contributor

import.c seems to contain a lot of protection to ensure that a partially-initialised module is not returned from import when in a multi-threaded environment. PyImport_Import enjoys these protections, but PyImport_GetModule does not.

https://bugs.python.org/issue34572

@the-knights-who-say-ni

Copy link
Copy Markdown

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Unfortunately our records indicate you have not signed the CLA. For legal reasons we need you to sign this before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

You can check yourself to see if the CLA has been received.

Thanks again for your contribution, we look forward to reviewing it!

@tjb900

tjb900 commented Jan 7, 2019

Copy link
Copy Markdown
Contributor Author

Hi!

Just wondering if there's something I can do to move this along - either that or be told that I'm completely wrong. We are using pure-python unpickling as a workaround for now, but I'd love to have some hope that we can switch back at some point in future.

(I notice the same issue appears to be present in @pitrou/pickle5-backport - which I'm very keen to start using!)

Thanks!

@pganssle

pganssle commented Feb 5, 2019

Copy link
Copy Markdown
Member

@tjb900 I realize that this is a thread safety bug which is notoriously hard to test, but I generally find that pull requests that include tests are much easier to review, because it's easier to get a sense of what behavior it fixes and an assurance that there won't be regressions.

@tjb900

tjb900 commented Feb 5, 2019

Copy link
Copy Markdown
Contributor Author

@pganssle Thanks for checking it out. Test case was attached with the bug report, but I didn't include it in the PR - will look at that ASAP.

@pganssle

pganssle commented Feb 5, 2019

Copy link
Copy Markdown
Member

Looks to me like @ericsnowcurrently added this line when adding PyImport_GetModule in PR #3593, with associated issue [bpo-28411](https://bugs.python.org/issue2841https://bugs.python.org/issue28411).

It was added in several places, so I'm wondering if this thread safety bug is a problem for all the uses of PyImport_GetModule, and if so is this the right place for it? Maybe Eric or @ncoghlan has a bit more information about why this code was added and whether falling back to PyImport_Import is the right fix.

@tjb900

tjb900 commented Feb 6, 2019

Copy link
Copy Markdown
Contributor Author

Hmmm. PyImport_GetModule was added in #3593, but it actually looks to me like that logic (i.e. "check in sys.modules, and only import if it isn't there") dates at least back back to

if (module == NULL) {

I'm still trying to identify at what point this became unsafe (have partially initialised modules always been present in sys.modules?). There is talk of changing the way the locking works in #2646, but the locking itself seems much older even than that.

@ncoghlan

ncoghlan commented Feb 6, 2019

Copy link
Copy Markdown
Contributor

It looks to me like the cpickle logic has always been unsafe, but prior to the introduction of per-module import locks for Python 3.3 in https://bugs.python.org/issue9260 it was likely much harder to hit.

@pganssle

pganssle commented Feb 6, 2019

Copy link
Copy Markdown
Member

So if I understand this correctly, the current version of the code likely works the way it does for performance reasons, but unfortunately that comes at the cost of thread safety, in which case I think the current patch is likely the right approach (at least for now - maybe PyImport_GetModule needs to be re-designed to take into account partially initialized modules).

If it's possible to make a reliable but not terribly resource-intensive test for this, I think that would be a big help, but that may be a lot to ask.

@pganssle

pganssle commented Feb 7, 2019

Copy link
Copy Markdown
Member

@tjb900 Thanks for adding that test. I've verified that this test consistently fails when run without the other changes in this PR, and it runs quickly whether or not it succeeds, so it's a +1 from me!

One last nitpick, though is that I think it would be better to put it in with the other pickle tests so that people looking to add or change tests will be able to find it more easily (for example there are three or four different test modules where strftime is tested and so I make duplicate tests or miss where things are being tested all the time). That said, if you had a good reason for putting it in a separate file, I'm not doctrinaire about the "one module one test file" thing (though I don't actually know if there's an official policy on this from the CPython core dev team).

@bedevere-bot

Copy link
Copy Markdown

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@pablogsal

Copy link
Copy Markdown
Member

One last nitpick, though is that I think it would be better to put it in with the other pickle tests so that people looking to add or change tests will be able to find it more easily (for example there are three or four different test modules where strftime is tested and so I make duplicate tests or miss where things are being tested all the time). That said, if you had a good reason for putting it in a separate file, I'm not doctrinaire about the "one module one test file" thing (though I don't actually know if there's an official policy on this from the CPython core dev team).

I agree with @pganssle in that this test should be placed with the rest of the pickle tests.

@pablogsal pablogsal self-assigned this Feb 7, 2019
@pganssle

pganssle commented Feb 7, 2019

Copy link
Copy Markdown
Member

I haven't tried adapting this to actually use pickle, but I can consistently get a failure due to partially initialized modules on the first try with this configuration. It does use time.sleep, but it only needs to do the import once, which may reduce the side effects.

@tjb900

tjb900 commented Feb 8, 2019

Copy link
Copy Markdown
Contributor Author

This other case of PyImport_GetModule also appears to be making the same, unsafe, optimisation. Should I fix this too, or leave that for a different PR?

Objects/typeobject.c:

    /* Try to fetch cached copy of copyreg from sys.modules first in an
       attempt to avoid the import overhead. Previously this was implemented
       by storing a reference to the cached module in a static variable, but

18 hidden items Load more…
@tjb900

tjb900 commented Feb 8, 2019

Copy link
Copy Markdown
Contributor Author

I have made the requested changes; please review again.

Happy to backport to 3.6 and 3.7 also.

@bedevere-bot

Copy link
Copy Markdown

Thanks for making the requested changes!

@pablogsal: please review the changes made to this pull request.

@pitrou

pitrou commented Feb 8, 2019

Copy link
Copy Markdown
Member

This other case of PyImport_GetModule also appears to be making the same, unsafe, optimisation. Should I fix this too, or leave that for a different PR?

Please make that a separate issue on bugs.python.org. We'll have to discuss its performance implications separately.

@pganssle

Copy link
Copy Markdown
Member

@pablogsal Any thoughts on the new tests for this one?

@pitrou pitrou left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hide comment

LGTM.

@pitrou

pitrou commented Feb 18, 2019

Copy link
Copy Markdown
Member

@pablogsal, I think your comments have been addressed. Can you confirm?

@pablogsal

Copy link
Copy Markdown
Member

This addresses all my concerns!

I like test_unpickle_module_race a lot :)

@pitrou pitrou merged commit 4371c0a into python:master Feb 18, 2019
@miss-islington

Copy link
Copy Markdown
Contributor

Thanks @tjb900 for the PR, and @pitrou for merging it 🌮🎉.. I'm working now to backport this PR to: 3.7.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Feb 18, 2019
…ving from sys.modules (pythonGH-9047)

Fix C implementation of pickle.loads to use importlib's locking mechanisms, and thereby avoid using partially-loaded modules.
(cherry picked from commit 4371c0a)

Co-authored-by: tjb900 <ozburgess@gmail.com>
@bedevere-bot

Copy link
Copy Markdown

GH-11921 is a backport of this pull request to the 3.7 branch.

pitrou pushed a commit that referenced this pull request Feb 18, 2019
…ving from sys.modules (GH-9047) (GH-11921)

Fix C implementation of pickle.loads to use importlib's locking mechanisms, and thereby avoid using partially-loaded modules.
(cherry picked from commit 4371c0a)

Co-authored-by: tjb900 <ozburgess@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants