Summary of my patch:
* it passes all test_thread*.py tests (tested in pydebug mode)
* it preallocates the thread state in the parent thread, so that an allocation failure can be reported with PyErr_NoMemory() instead of aborting with Py_FatalError()
* PyThreadState_Prealloc() doesn't call _PyGILState_NoteThreadState(), because in the parent thread the thread ident would be the parent's, not the new thread's
* it calls _PyGILState_NoteThreadState() in the new thread to finish initializing the thread state
* it makes Py_InitializeEx() call _PyGILState_Init() before initsite(), because initsite() may create a thread
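The preallocate-then-register split can be illustrated outside CPython with plain POSIX threads. This is a minimal sketch, not the patch itself: thread_state, state_prealloc(), state_note() and start_thread() are hypothetical stand-ins for PyThreadState, PyThreadState_Prealloc(), _PyGILState_NoteThreadState() and the thread-start path. The allocation happens in the parent, so a failure becomes an ordinary error return; the ident is recorded in the child, because pthread_self() called in the parent would return the parent's ident.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical per-thread state (stand-in for PyThreadState). */
typedef struct thread_state {
    pthread_t ident;   /* filled in by the new thread itself */
    int registered;    /* set once the new thread has noted itself */
    int (*func)(int);
    int arg;
    int result;
} thread_state;

/* Runs in the PARENT thread: a failure here can be returned to the
   caller as an error instead of aborting the process. */
thread_state *state_prealloc(int (*func)(int), int arg) {
    thread_state *ts = calloc(1, sizeof *ts);
    if (ts == NULL)
        return NULL;
    ts->func = func;
    ts->arg = arg;
    return ts;
}

/* Runs in the NEW thread: only here does pthread_self() return the
   ident that belongs to this thread state. */
void state_note(thread_state *ts) {
    ts->ident = pthread_self();
    ts->registered = 1;
}

static void *bootstrap(void *p) {
    thread_state *ts = p;
    assert(!ts->registered);  /* the parent must not have registered it */
    state_note(ts);           /* finish initialization in the new thread */
    ts->result = ts->func(ts->arg);
    return NULL;
}

/* Returns 0 on success, -1 if allocation or thread creation fails. */
int start_thread(int (*func)(int), int arg, pthread_t *tid, thread_state **out) {
    thread_state *ts = state_prealloc(func, arg);
    if (ts == NULL) {
        fprintf(stderr, "out of memory\n");  /* error, not a fatal abort */
        return -1;
    }
    if (pthread_create(tid, NULL, bootstrap, ts) != 0) {
        free(ts);
        return -1;
    }
    *out = ts;
    return 0;
}

int square(int x) { return x * x; }
```

After pthread_join(), the ident stored in the state equals the thread id returned by pthread_create() and differs from the parent's pthread_self(), which is exactly why the registration step has to run in the child.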