◐ Shell
reader mode source ↗
Skip to content

Add soft shutdown#958

Merged
lostmsu merged 163 commits into
pythonnet:masterfrom
amos402:soft-shutdown
Oct 8, 2020
Merged

Add soft shutdown#958
lostmsu merged 163 commits into
pythonnet:masterfrom
amos402:soft-shutdown

Conversation

@amos402

@amos402 amos402 commented Sep 17, 2019

Copy link
Copy Markdown
Member

What does this implement/fix? Explain your changes.

An upgraded version for #754
An implementation for #957
Also included some minor fixes.

Still simple and roughly, reset all GC objects by a save state, the old domain objects will be leaking and unaccessible.
Here's my next plan, I think I can replace all the clr objects slots to solid callbacks, that may help the memory leaking.

How to use: use PythonEngine.Initialize(softShutdown: true) for startup.

Updated Details:

  • Changes from above: objects in GC chains will not remove by default, it can be setting by RuntimeState.ShouldRestoreObjects, and modules which not in sys.modules after just call initial CPython will be remove from sys.modules on soft shutdown mode.
  • Multiple fixes about memory leaks and reference count errors.
  • The heap types which represent managed types will degenerate to normal type objects by set their slot callbacks to some function that builtin CPython regardless of normal mode or soft shutdown mode. Before their degenerations, through using tp_travese and tp_clear to release their sub objects, then use SlotsHolder.ReleaseTypeSlots reset their slots.
    Points can be optimize for the above mechanism:
    • tp_travese and tp_clear are not necessary for all time, they will become overhead for CPython's GC, maybe we can just use C# methods instead.
    • Maybe name of SlotsHolder is not suitable because it also handle the memory release.
  • tp_is_gc and some other slots are no need to be override at start, they can inherit from PyType_Type by default.
  • Interop do not holding the delegate objects by calling GetThunk anymore, the SlotsHolder will take care of them.
  • __import__ will restore at shutdown.

Does this close any currently open issues?

#957

Any other comments?

Checklist

Check all those that are applicable and complete.

  • Make sure to include one or more tests for your change
  • If an enhancement PR, please create docs and at best an example
  • Add yourself to AUTHORS
  • Updated the CHANGELOG

@lostmsu

lostmsu commented Sep 17, 2019

Copy link
Copy Markdown
Member

Can you explain how does soft shutdown work in more details?

@amos402

amos402 commented Sep 17, 2019

Copy link
Copy Markdown
Member Author

The base concept is, after Py_Initialize, save all GC objects and modules, and reset them when you shutting down.
Since the domain will be unload, the saved state cannot use any C# data types, also we need a dummy GC for storing the old objects for make sure some dealloc operations can call _PyObject_GC_UnTrack macro without a nullptr error. You can see the details in runtime_state.cs.
After discuss with @mjmvisser, he points some problems that relate restart Python will crash the PyQt. So I try to make a solution for that. I was intend to create two modes, one for reset all GC objects and modules(for now) and one for reset all slots of CLR types after previous mode, but I just saw this issue(#957), so I push it in advance.
Well, although my computer pass all tests, but the CI still not, I still need some fixes for the CI.

@benoithudson

Copy link
Copy Markdown
Contributor

I'm unsure how this relates to #957. Seems to be more about improving the GC handling?

The prototype that @BadSingleton did was, in implementation, fairly simple (took us a while to figure it out of course):

  1. Don't Py_Finalize in Shutdown
  2. On Shutdown, go through the TypeManager and set all the slots to zero (except the gc ones).

@amos402

amos402 commented Sep 18, 2019

Copy link
Copy Markdown
Member Author

Well, Python crash is connect with GC handling, after domain unload, when the GC collect occurred, it will traverse all GC objects, but the memory from old domain has been invalid, any access will result to crash.
If you just take them from GC chains, you can avoid access those invalid memory access to prevent crash. It maybe simple, but lots of memory will leak. So I need to do a lot of work to reduce the leaking, but it still cannot avoid them at all.

13 hidden conversations Load more…
@benoithudson

benoithudson commented Nov 7, 2019

Copy link
Copy Markdown
Contributor

I've summarized my understanding in this document: https://docs.google.com/document/d/1a9OLsdKHXJ6MxHjo0WlVRcfjAiaP5hONNKUKKH6Tr-o

I see two PRs that would be easy to pull out of amos' code and check in right now:

I don't yet understand what the gc chains thing is trying to address. I also don't understand how amos handles the modules. I suspect they're related.

It seems like amos is turning off python gc for the proxy objects. If that works, we should just turn it all off and then we can eliminate the native code page. But that's yet another PR.

There seems to be some refcount fixes as well. If we can get those fixes in separate PRs that makes it easier to review.

* Drop C module dependency when getting _PyObject_NextNotImplemented
* Exception details for SetNoSiteFlag
filmor pushed a commit that referenced this pull request Dec 2, 2019
* Add exception helper
* Fixed refcnt error in ExtensionType.FinalizeObject
* Fixed typename leaking
* Fix refcnt error by using `using`
279 hidden items Load more…
BadSingleton and others added 10 commits September 15, 2020 14:31
Even if netstandard lacks (for now) the necessary APIs for domain
(un)loading, it is still possible to (un)load domains via the Mono C
runtime.
…e pointer"

We Don't support python 2 anymore, but the CI machines may still be
using it to build.
Also adds implicit IntPtr conversion operators to simplify their use.
…mments-3

* Remove compile-time check on NETSTANDARD
* Extract method `ClearCLRData`
@lostmsu

lostmsu commented Sep 29, 2020

Copy link
Copy Markdown
Member

@BadSingleton / @amos402 can you please merge from master?

@codecov-io

codecov-io commented Oct 8, 2020

Copy link
Copy Markdown

Codecov Report

Merging #958 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #958   +/-   ##
=======================================
  Coverage   86.25%   86.25%           
=======================================
  Files           1        1           
  Lines         291      291           
=======================================
  Hits          251      251           
  Misses         40       40           
Flag Coverage Δ
#setup_linux 64.94% <ø> (ø)
#setup_windows 72.50% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.


Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c79be84...8d00e4c. Read the comment docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants