Just to summarise, I'm fairly sure this is exactly what Victor saw: a daemon thread attempts to reacquire the GIL via Py_END_ALLOW_THREADS after the interpreter has been finalised. By then Py_Finalize has freed the threadstate structure, so the threadstate pointer held by the thread is invalid and we crash.
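To make the scenario concrete, here is a minimal Python sketch of the setup. It is my own illustration, not taken from the original report, and the names start_daemon_worker and stop are hypothetical. It relies on time.sleep() releasing the GIL while it blocks and reacquiring it on return. Whether a crash actually occurs depends on the Python version and on timing, so treat this as a description of the shape of the problem rather than a reliable reproducer.

```python
import threading
import time


def start_daemon_worker(stop):
    """Start a daemon thread that repeatedly blocks in a GIL-releasing call.

    `stop` is a threading.Event (hypothetical helper, for illustration only)
    that lets a caller end the loop cleanly.
    """

    def worker():
        while not stop.is_set():
            # time.sleep() drops the GIL while blocked and must take it
            # back when the sleep returns.
            time.sleep(0.01)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


if __name__ == "__main__":
    stop = threading.Event()  # never set here, so the worker outlives main
    start_daemon_worker(stop)
    # The main thread returns immediately. Interpreter finalisation starts
    # while the daemon thread is still inside (or about to leave) its sleep.
```

Because the thread is a daemon, interpreter shutdown does not wait for it, which is what opens the window described above.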
So I see basically two options:
1. Don't (always) free threadstate structures in Py_Finalize, and figure out a way to avoid leaking them if Python is re-initialised in the same process.
2. Ban this behaviour entirely, e.g. have Py_Finalize fail if there are live threads with threadstate objects.
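For option 2, a rough Python-level analogue of the check might look like the sketch below. This is only an assumption about the shape of such a check: the real one would live in C inside Py_Finalize and inspect threadstate structures, whereas this uses threading.enumerate(), which only sees threads known to the threading module. The names live_other_threads and finalize_or_refuse are made up for illustration.

```python
import threading


def live_other_threads():
    """Return every live thread known to `threading`, except the caller."""
    current = threading.current_thread()
    # threading.enumerate() lists only threads that are currently alive,
    # including daemon threads.
    return [t for t in threading.enumerate() if t is not current]


def finalize_or_refuse():
    """Hypothetical stand-in for a Py_Finalize that fails on live threads."""
    blockers = live_other_threads()
    if blockers:
        names = ", ".join(t.name for t in blockers)
        raise RuntimeError("refusing to finalise; live threads: " + names)
    # A real implementation would carry on with finalisation here.
```

With a check like this, the embedding application would be responsible for stopping and joining its threads before finalising.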
The discussion so far assumes that we should support this, i.e. option 1. Any thoughts on that? (I'll have a think about whether this is actually doable!)