Issue 25631: Segmentation fault with invalid Unicode command-line arguments in embedded Python

The following embedded application, which calls Py_Main with a "-W X" argument where X is not a valid Unicode string, crashes with a segmentation fault:

```
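#include "Python.h"

int main(void) {
    /* Build a one-character wide string holding 0x110000, which is past U+10FFFF. */
    wchar_t *invalid_str = malloc(2 * sizeof(wchar_t));
    if (invalid_str == NULL)
        return 1;
    invalid_str[0] = 0x110000;
    invalid_str[1] = 0;
    wchar_t *argv[4] = {L"embedded-python", L"-W", invalid_str, NULL};

    /* "-W" receives an argument that is not a valid Unicode string. */
    return Py_Main(3, argv);
}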
```

This segmentation fault is present in Python 3.4, 3.5, and the latest development branch I downloaded, but is not present in Python 3.2. This program is obviously invalid and it may be reasonable to emit a fatal error in this situation, but it should not give a segmentation fault.

I believe the issue is that this code leads to an exception being raised before exceptions are initialized; more specifically, a call to PyExceptionClass_Check() within PyErr_SetObject() reads a NULL pointer. I haven't tested this, but I expect that the problem would not appear when running Python directly, since Python sanitizes the command-line arguments from main(). Nonetheless, even there the possibility of other exceptions being raised early in the initialization sequence remains a potential problem.

The interpreter isn't initialized, so calling PyErr_Format in a release build segfaults when it tries to dereference a NULL PyThreadState. OTOH, a debug build should call PyThreadState_Get, which in this case calls Py_FatalError and aborts the process. Unfortunately 3.5.0+ debug builds don't call PyThreadState_Get due to the fix for issue 25150.

> the possibility of other exceptions being raised early in the 
> initialization sequence remains a potential problem.

PEP 432 proposes a pre-initialization phase that sets a valid Python thread state.
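
For the embedding case reported here, one possible stopgap is for the embedder to reject out-of-range wide characters before calling Py_Main. The following is only a minimal sketch: the helper names `wide_arg_in_unicode_range` and `wide_args_in_unicode_range` are hypothetical and not part of the CPython API, and the only condition checked is that no element exceeds U+10FFFF, which is what the reproducer above violates. It does not address other exceptions raised early in initialization.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper (not a CPython function): returns 1 if every
   element of s is at most U+10FFFF, and 0 otherwise. The cast to
   unsigned long also catches negative values where wchar_t is signed. */
static int wide_arg_in_unicode_range(const wchar_t *s)
{
    for (size_t i = 0; s[i] != L'\0'; i++) {
        if ((unsigned long)s[i] > 0x10FFFFUL)
            return 0;
    }
    return 1;
}

/* Hypothetical helper: applies the check to argv[0] .. argv[argc-1].
   A NULL entry inside that range is treated as invalid. */
static int wide_args_in_unicode_range(int argc, wchar_t **argv)
{
    for (int i = 0; i < argc; i++) {
        if (argv[i] == NULL || !wide_arg_in_unicode_range(argv[i]))
            return 0;
    }
    return 1;
}
```

In the reproducer, the embedder would call `wide_args_in_unicode_range(3, argv)` first and exit with its own error message if it returns 0, instead of passing the arguments on to Py_Main. On platforms where wchar_t is 16 bits wide, no value can exceed U+10FFFF, so the check always passes there.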